Talk to me — speak, type, listen

Your BYOK Voice Interaction Studio for Windows.

Speak into any app. Listen to any text. Keep your keys local. Full control, zero lock-in.

No sign-up. No telemetry. Your keys never leave your PC.

Built for Creators Who Mean Business

Content Creators

  • Produce audio, voiceovers, scripts, podcasts, and video content
  • Voice cloning for consistent brand voice
  • Auto-recording saves MP3 files for direct use
  • TTS as a full production workstation

Professionals

  • For work that lives in text fields: tickets, docs, emails, Notion, IDEs, CRMs
  • Dictate and auto-paste without switching windows
  • Focus tracking restores exact target window
  • Hotkeys work system-wide, even in fullscreen

Teams & Enterprises

  • Key control, provider choice, and predictable reliability
  • Multi-key pools with trust scoring and auto-failover
  • Usage dashboard tracks STT/LLM/TTS volume
  • No vendor lock-in — switch providers anytime

Privacy-First Users

  • Voice workflows without handing keys to a random cloud
  • Zero-Knowledge architecture: keys are stored only on your machine, never on our servers
  • DPAPI encryption at rest (Windows user-scoped)
  • No accounts, no telemetry, no tracking

Bottom line: If you want a toy, there are hundreds. If you want control and reliability, this is the one.

What It Is

Talk to me combines professional Speech-to-Text (STT) and Text-to-Speech (TTS) in one workflow — powered by your own API keys (BYOK).

It is not a dictation toy. It is not a single-provider client. It is a production-grade voice I/O layer that plugs into your entire Windows workflow.

Most voice tools make you work inside their UI. Talk to me works inside yours.

Two Core Modes

Mode 1 — Speech-to-Text

Speak → transcribe → clean → optional AI polish → inject into the active app.

Capture

  • Global hotkey starts from any application
  • WASAPI recording for lowest latency
  • Real-time audio level meter
  • Voice Activity Detection (VAD)

Transcribe & Clean

  • OpenAI Whisper or Deepgram Nova-2
  • 18 languages + auto-detect
  • Smart cleanup: whitespace, caps, punctuation
  • Optional AI-Polish (grammar only, content preserved)
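
The cleanup step can be pictured roughly as follows. This is a minimal sketch, not the app's actual code: the function name `clean_transcript` and the exact rules (collapse whitespace, capitalize sentence starts, add a closing period) are assumptions based only on the bullet above.

```rust
/// Hypothetical cleanup pass (name and rules are assumptions):
/// collapse whitespace, capitalize sentence starts, ensure end punctuation.
pub fn clean_transcript(raw: &str) -> String {
    // Collapse every run of whitespace into one space and trim both ends.
    let collapsed = raw.split_whitespace().collect::<Vec<_>>().join(" ");

    let mut out = String::with_capacity(collapsed.len() + 1);
    let mut capitalize_next = true;
    for ch in collapsed.chars() {
        if ch.is_whitespace() {
            out.push(ch);
            continue;
        }
        if capitalize_next && ch.is_alphabetic() {
            out.extend(ch.to_uppercase());
        } else {
            out.push(ch);
        }
        // Only the character right after sentence punctuation (ignoring
        // spaces) is capitalized, so "3.5" does not trigger a capital.
        capitalize_next = matches!(ch, '.' | '!' | '?');
    }

    // Add a closing period when the transcript ends without punctuation.
    if let Some(last) = out.chars().last() {
        if !matches!(last, '.' | '!' | '?') {
            out.push('.');
        }
    }
    out
}
```

With these rules, a raw transcript such as "  hello   world. this is  a test " becomes "Hello world. This is a test."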

Inject

  • Auto-paste into target via Ctrl+V
  • Focus tracking restores exact window
  • Text always safe in clipboard
  • Persistent modal if no target detected

AI-Polish Providers

  • OpenAI (GPT-4o-mini, GPT-4o, GPT-4.5)
  • Groq, Anthropic, Google Gemini
  • xAI Grok
  • Strict mode: grammar only, no rephrasing

Mode 2 — Text-to-Speech

Select text → synthesize → listen → optionally auto-save audio.

TTS Engine — Voice Workstation

ElevenLabs Models

  • Eleven v3 — Audio Tags (laughter, sigh, etc.)
  • Multilingual v2 — 29 languages, stable
  • Flash v2.5 — ~75ms latency, fastest
  • Turbo v2.5 — quality/speed balance

5 Voice Parameters

  • Stability (0.0–1.0)
  • Similarity Boost (0.0–1.0)
  • Style / Expressiveness (0.0–1.0)
  • Speed (0.7–1.2) + Speaker Boost
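
As an illustration of the ranges listed above, the five parameters could be held in one struct that clamps out-of-range input. The struct and field names are hypothetical. Only the ranges come from the list.

```rust
/// Hypothetical settings struct; field names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
pub struct VoiceSettings {
    pub stability: f32,        // 0.0 to 1.0
    pub similarity_boost: f32, // 0.0 to 1.0
    pub style: f32,            // 0.0 to 1.0
    pub speed: f32,            // 0.7 to 1.2
    pub speaker_boost: bool,
}

impl VoiceSettings {
    /// Returns a copy with every numeric value forced into its listed range.
    pub fn clamped(&self) -> Self {
        Self {
            stability: self.stability.clamp(0.0, 1.0),
            similarity_boost: self.similarity_boost.clamp(0.0, 1.0),
            style: self.style.clamp(0.0, 1.0),
            speed: self.speed.clamp(0.7, 1.2),
            speaker_boost: self.speaker_boost,
        }
    }
}
```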

Smart Chunking

  • Up to 40,000 chars per request
  • Prefetch queue: next chunk while current plays
  • Paragraph → sentence → word splitting
  • Progress display: Chunk X/Y
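
The paragraph, sentence, word fallback can be sketched like this. It is a simplified illustration under stated assumptions: `chunk_text`, `split_sentences`, and `pack` are hypothetical names, each paragraph starts a new chunk, a sentence ends at '.', '!' or '?' followed by whitespace, and a single word longer than the limit is kept whole. The prefetch queue is not shown.

```rust
/// Split `text` into chunks of at most `max_chars` characters, preferring
/// paragraph breaks, then sentence ends, then word boundaries.
/// Assumption: a single word longer than `max_chars` is kept whole.
pub fn chunk_text(text: &str, max_chars: usize) -> Vec<String> {
    let mut chunks = Vec::new();
    for paragraph in text.split("\n\n").map(str::trim).filter(|p| !p.is_empty()) {
        if paragraph.chars().count() <= max_chars {
            chunks.push(paragraph.to_string());
            continue;
        }
        // Paragraph too long: break it into sentences, and break any
        // sentence that is still too long into words.
        let mut pieces: Vec<&str> = Vec::new();
        for sentence in split_sentences(paragraph) {
            if sentence.chars().count() <= max_chars {
                pieces.push(sentence);
            } else {
                pieces.extend(sentence.split_whitespace());
            }
        }
        pack(&pieces, max_chars, &mut chunks);
    }
    chunks
}

/// Cut after '.', '!' or '?' when followed by whitespace or end of text.
fn split_sentences(paragraph: &str) -> Vec<&str> {
    let mut out = Vec::new();
    let mut start = 0;
    let mut chars = paragraph.char_indices().peekable();
    while let Some((i, c)) = chars.next() {
        let ends_sentence = matches!(c, '.' | '!' | '?');
        let boundary = chars.peek().map_or(true, |&(_, next)| next.is_whitespace());
        if ends_sentence && boundary {
            let end = i + c.len_utf8();
            out.push(paragraph[start..end].trim());
            start = end;
        }
    }
    if start < paragraph.len() {
        out.push(paragraph[start..].trim());
    }
    out.retain(|s| !s.is_empty());
    out
}

/// Greedily join pieces with single spaces without exceeding `max_chars`.
fn pack(pieces: &[&str], max_chars: usize, chunks: &mut Vec<String>) {
    let mut current = String::new();
    let mut current_len = 0usize;
    for &piece in pieces {
        let len = piece.chars().count();
        if current_len > 0 && current_len + 1 + len > max_chars {
            chunks.push(std::mem::take(&mut current));
            current_len = 0;
        }
        if current_len > 0 {
            current.push(' ');
            current_len += 1;
        }
        current.push_str(piece);
        current_len += len;
    }
    if !current.is_empty() {
        chunks.push(current);
    }
}
```

A caller would pass the provider's per-request limit as `max_chars` (40,000 in the list above) and could report progress as the chunk index over `chunks.len()`.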

Auto-Recording

  • Saves MP3 files sequentially
  • Custom directory via native picker
  • 4 quality presets (32–192 kbps)
  • Code filter removes code blocks
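
A code filter of this kind might look like the sketch below. It assumes that "code blocks" means Markdown-style blocks delimited by lines starting with three backticks. The function name `strip_code_blocks` is hypothetical, and the real filter may recognize other formats.

```rust
/// Remove Markdown-style fenced code blocks so they are not read aloud.
/// Assumption: a fence is any line that starts with three backticks.
pub fn strip_code_blocks(text: &str) -> String {
    let fence = "`".repeat(3);
    let mut in_code = false;
    let mut kept: Vec<&str> = Vec::new();
    for line in text.lines() {
        if line.trim_start().starts_with(fence.as_str()) {
            // Fence lines toggle the state and are dropped themselves.
            in_code = !in_code;
            continue;
        }
        if !in_code {
            kept.push(line);
        }
    }
    kept.join("\n")
}
```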

Bottom line: Two modes, one studio. STT captures your voice. TTS speaks your text. Both powered by your keys.

BYOK + Multi-Key Pool Architecture

Bring Your Own Keys means you keep control over providers, quotas, and cost. Talk to me is built around a multi-key pool system designed for real-world reliability.

STT Pool (5 slots)

  • OpenAI Whisper, Deepgram
  • Per-key trust scoring & cooldown
  • Auto-rotation on failure
  • Cross-pool migration: OpenAI keys are shared with the LLM pool

LLM Pool (5 slots)

  • OpenAI, Groq, Anthropic, Gemini, xAI
  • OpenAI keys auto-shared with STT pool
  • Highest-trust key selected automatically
  • 10+ models across 5 providers

TTS Pool (5 slots)

  • ElevenLabs, OpenAI TTS
  • Test all keys with one click
  • Response time measurement
  • Enable/disable per key

Trust Scoring

  • Keys start at 50% — must earn trust
  • Failures & rate limits reduce score
  • 60s cooldown on rate limit (429)
  • Full metrics: successes, failures, timestamps
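
The scoring rules above can be sketched as a small data structure. This is an illustration under assumptions, not the shipped implementation: the type names, the step sizes (+0.05, -0.10, -0.40), and the tie-breaking are invented for the example. The 50% starting score, the 60-second cooldown on a 429, the heavier penalty for auth failures, and highest-trust selection come from the page.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Outcome {
    Success,
    Failure,
    RateLimited, // HTTP 429
    AuthFailure,
}

/// Hypothetical pool entry; field names are illustrative.
#[derive(Debug)]
pub struct PooledKey {
    pub label: String,
    pub trust: f64, // 0.0 to 1.0
    pub enabled: bool,
    pub cooldown_until: Option<Instant>,
    pub successes: u32,
    pub failures: u32,
}

impl PooledKey {
    /// New keys start at 50% trust.
    pub fn new(label: &str) -> Self {
        Self {
            label: label.to_string(),
            trust: 0.5,
            enabled: true,
            cooldown_until: None,
            successes: 0,
            failures: 0,
        }
    }

    /// Update score and counters. Step sizes are assumptions.
    pub fn record(&mut self, outcome: Outcome, now: Instant) {
        let delta = match outcome {
            Outcome::Success => 0.05,
            Outcome::Failure | Outcome::RateLimited => -0.10,
            Outcome::AuthFailure => -0.40,
        };
        if outcome == Outcome::Success {
            self.successes += 1;
        } else {
            self.failures += 1;
        }
        if outcome == Outcome::RateLimited {
            // 60s cooldown on a rate limit.
            self.cooldown_until = Some(now + Duration::from_secs(60));
        }
        self.trust = (self.trust + delta).clamp(0.0, 1.0);
    }

    pub fn available(&self, now: Instant) -> bool {
        self.enabled && self.cooldown_until.map_or(true, |until| now >= until)
    }
}

/// Pick the enabled, non-cooling key with the highest trust score.
pub fn select_key(pool: &[PooledKey], now: Instant) -> Option<&PooledKey> {
    pool.iter()
        .filter(|key| key.available(now))
        .max_by(|a, b| a.trust.total_cmp(&b.trust))
}
```

Failover then amounts to recording the outcome of each call and calling `select_key` again: a rate-limited key drops out for its cooldown and the next-best key takes over.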

Bottom line: You're not buying API usage. You're buying an engine that turns your keys into a self-healing voice pipeline.

Zero-Knowledge. Zero-Trust. Zero Drama.

Zero-Knowledge

  • API keys are stored only on your machine and sent only to the providers you choose
  • No cloud storage, no server-side vault
  • No telemetry, no analytics, no tracking
  • No accounts, no registration, no login

DPAPI Encryption

  • Windows Data Protection API (user-scoped)
  • Keys encrypted at rest, decrypted in memory only
  • Bound to your Windows user account
  • Plaintext keys auto-migrated on first load

Zero-Trust

  • Every key independently validated
  • Continuous trust scoring — no implicit trust
  • Auth failures heavily penalized
  • New keys start at 50% — must prove themselves

Verifiable

  • No hidden network calls — check your firewall
  • Settings stored as plain JSON in %APPDATA%
  • Open architecture — nothing obfuscated
  • Your keys are power. Keep them local.

Bottom line: We built a system where your keys never leave your machine. That's not a promise — that's architecture.

Workflow Acceleration

Floating Audio Pill

  • Tiny always-on-top visualizer (168×63px)
  • 24-bar equalizer with voice-reactive animation
  • True per-pixel alpha transparency (DWM API)
  • Multi-monitor: shows where your cursor is

Global Hotkeys

  • 11 dictation hotkey options incl. low-level Ctrl+Win
  • 6 TTS hotkeys (read clipboard aloud)
  • Runtime changes without restart
  • 400ms debounce + zero-swallow behavior
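
A 400ms hotkey debounce could work along these lines. The sketch assumes a leading-edge debounce measured from the last accepted press. The `Debouncer` type is hypothetical, and the actual hotkey handling may differ.

```rust
use std::time::{Duration, Instant};

/// Hypothetical leading-edge debouncer: the first press fires, and further
/// presses are ignored until `window` has passed since the accepted one.
pub struct Debouncer {
    window: Duration,
    last_accepted: Option<Instant>,
}

impl Debouncer {
    pub fn new(window: Duration) -> Self {
        Self { window, last_accepted: None }
    }

    /// Returns true when the press at `now` should trigger the action.
    pub fn accept(&mut self, now: Instant) -> bool {
        match self.last_accepted {
            Some(prev) if now.saturating_duration_since(prev) < self.window => false,
            _ => {
                self.last_accepted = Some(now);
                true
            }
        }
    }
}
```

Passing the timestamp in, rather than reading the clock inside `accept`, keeps the logic easy to test without real delays.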

Focus Tracking

  • Captures target window before dictation
  • Restores exact focus after pipeline
  • Detects own window (prevents self-injection)
  • Auto-paste via simulated Ctrl+V

Smart Clipboard

  • Text always placed in clipboard (safety net)
  • Persistent modal when no target detected
  • Auto-tab-switch when recording in TTS mode
  • Auto-scroll to recording section

Bottom line: Your text goes where you need it. Your workflow stays uninterrupted. That's the point.

Voice Library + Voice Cloning

Voice Library

  • Loads your full ElevenLabs library
  • Custom/cloned voices prioritized
  • Preview URLs for auditioning
  • Direct Voice-ID input
  • Cached for fast access

Voice Cloning

  • Clone directly from the app
  • MP3, WAV, M4A, OGG, FLAC supported
  • Custom voice name assignment
  • Instantly available after cloning
  • Auto-selected after creation

Parameter Tuning

  • Stability: lower is more emotional, higher is more monotone
  • Similarity Boost: match original voice
  • Style: expressiveness control
  • Speed: 0.7× – 1.2×
  • Speaker Boost: enhanced clarity

Bottom line: Your voices, your clones, your parameters. Talk to me is a TTS workstation, not a "read aloud" button.

Designed Like a Pro Tool

Design Language

  • Text-Display-Fields (TDFs) for all status elements
  • Cyan identity accent (#06b6d4)
  • Lucide SVG icon set (no emoji UI)
  • Deep navy background (#0e213b)

Usage Dashboard

  • STT: call count + cumulative minutes
  • LLM: call count + cumulative characters
  • TTS: call count + cumulative characters
  • Persistent across sessions

TTS Status Log

  • Real-time event log with timestamps
  • Synthesis status, playback events, errors
  • Chunk progress for long-text processing
  • Last 50 entries displayed
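
Keeping only the most recent entries is a natural fit for a bounded queue. The sketch below assumes a simple in-memory log. `StatusLog` is a hypothetical name, and timestamps are left to the caller.

```rust
use std::collections::VecDeque;

/// Hypothetical bounded log: keeps only the newest `capacity` entries.
pub struct StatusLog {
    capacity: usize,
    entries: VecDeque<String>,
}

impl StatusLog {
    pub fn new(capacity: usize) -> Self {
        Self { capacity, entries: VecDeque::with_capacity(capacity) }
    }

    /// Append an entry, dropping the oldest one when the log is full.
    pub fn push(&mut self, entry: impl Into<String>) {
        if self.capacity == 0 {
            return;
        }
        if self.entries.len() == self.capacity {
            self.entries.pop_front();
        }
        self.entries.push_back(entry.into());
    }

    /// Entries from oldest to newest.
    pub fn entries(&self) -> impl Iterator<Item = &str> + '_ {
        self.entries.iter().map(String::as_str)
    }
}
```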

Auto-Update

  • Automatic check on startup
  • GitHub Releases integration
  • One-click in-app install
  • Signature-verified updates

Bottom line: No accounts, no telemetry, no vendor lock-in. Just a reliable voice studio that plugs into your workflow.

Get Talk to me

Coming Soon

Pricing details will be announced before launch. Join the waitlist to be notified.

Get Notified

Download

Windows

Native Windows app. Tauri 2.0 (Rust + Svelte). Lightweight, fast, secure.

Download for Windows

BYOK

Bring Your Own Keys

Use your existing API keys from OpenAI, ElevenLabs, Deepgram, Groq, Anthropic, Gemini, xAI.

See Supported Providers

Bottom line: Talk to me is voice I/O for Windows — STT in, TTS out — powered by your keys and secured by design.

Frequently Asked Questions

Do you store my API keys?

No. Keys are stored locally only and encrypted with Windows DPAPI. They never leave your machine.

Do I need an account?

No. There is no registration, no login, and no telemetry.

Which providers do you support?

STT: OpenAI Whisper, Deepgram. LLM Polish: OpenAI, Groq, Anthropic, Gemini, xAI. TTS: ElevenLabs and OpenAI TTS.

Can I use multiple keys?

Yes. Up to 15 keys across three pools. Talk to me auto-rotates and fails over based on trust and cooldown.

What happens if injection fails?

Your text is always placed in the clipboard. If no external target is detected, you get a persistent clipboard notification with an OK button.

Does AI-Polish rewrite my words?

No. Polish is strict: grammar and spelling corrections only, content preserved. It can also be disabled entirely.

Is it only dictation?

No. TTS is a full workstation: models, voices, cloning, tuning, chunking, prefetch, and auto-recording. That's why it's a Voice Interaction Studio, not a dictation tool.

Talk to me exists to turn voice interaction into an engineered process.
