Talk to me — speak, type, listen

Talk to me — User Manual

Version: 0.5.149 (Windows Desktop) / 0.5.157 (Android Hands-Free)
Last Updated: 2026-04-20
This manual covers both the Windows Desktop and Android Hands-Free editions of Talk to me. Sections marked with Windows or Android apply only to that platform. All other sections apply to both.

1. Introduction

Talk to me is a professional dictation, translation, and voice interaction studio available for Windows Desktop and Android. It converts your speech into text, polishes it with AI, translates it into 20+ languages, and reads it back to you — all in real time.

The app follows a strict BYOK (Bring Your Own Key) and Zero-Knowledge / Zero-Trust architecture: your API keys are stored only on your device, and your data goes directly from your device to the AI providers you choose.

Key Features

  • Real-time Dictation: Record your voice and get polished text in seconds.
  • AI-Polish: Automatic grammar correction and filler word removal powered by your choice of AI provider.
  • Live Translation: Translate dictated text into 20+ languages on the fly.
  • Voice Translate (Speech-to-Speech): Your translated text is automatically read aloud in the target language.
  • Text-to-Speech: Convert any text into natural-sounding speech with ElevenLabs, OpenAI TTS, or Deepgram.
  • Live Language Immersion: Speak in your native language, instantly see and hear it in the language you want to master.
  • Word Corrections: Teach the app your names, brands, and terms that speech recognition gets wrong.
  • Encrypted Backup: Export all settings and API keys as a password-protected encrypted file.
  • Multi-Provider Support: Choose from OpenAI, Groq, Anthropic, Google Gemini, xAI Grok, ElevenLabs, Deepgram, and more.

Platform Highlights

Feature Windows Desktop Android Hands-Free
Mini-Player (compact mode)
Global Hotkeys (Ctrl+Win)
Auto-Read (Ctrl+C text extraction)
Notification Listener
MP3 Recording & Save
Floating Pill (Spectrum Analyzer)
Floating Bubble (Overlay)
Auto-Paste (Accessibility)
Auto-Read Messages (from chat apps)
App-level Notification Access

Security Principles

  • Zero-Knowledge: Talk to me never stores, transmits, or has access to your API keys on any server. All keys are stored locally on your device.
  • Zero-Trust: The app never phones home. No analytics, no tracking, no telemetry. Your dictation data flows directly from your device to your chosen AI provider and nowhere else.
  • BYOK: You bring your own API keys from the providers you trust. Talk to me does not resell API access.

2. Getting Started

Windows Installation — Windows Desktop

Talk to me for Windows is available as an EV-signed installer from talktome.studio or via the Microsoft Store.

System Requirements:

  • Windows 10 or later (64-bit)
  • An active internet connection
  • At least one API key from a supported provider

The installer is digitally signed with an Extended Validation (EV) certificate from Certum (mrocon GmbH), so Windows SmartScreen should not show a warning.

Android Installation — Android

Talk to me for Android is available as an APK from talktome.studio or via the Google Play Store.

System Requirements:

  • Android 8.0 or later
  • An active internet connection
  • At least one API key from a supported provider

First Launch

When you open Talk to me for the first time, you will see the License Gate. You have two options:

  1. Enter a License Key to unlock the full app immediately.
  2. Start a 7-Day Free Trial to explore all features without a license key.

After activation or trial start, the app loads and you can begin using it right away — provided you have at least one API key configured (see Key Pool).

Android Quick Start — Your First 5 Minutes

After activating your license (or starting the free trial), the app opens and you will see the main screen — the Cockpit. Don't worry if most buttons appear orange or inactive. That's completely normal! Here is what to do, step by step:

Step 1 — Enable Microphone Access

The large button in the center of the screen reads "Enable Microphone Access". This is the first and most important step.

  1. Tap the Enable Microphone Access button.
  2. A dialog from Talk to me explains why the microphone is needed. Tap OK.
  3. Android then asks: "Allow Talk to me to record audio?" — tap While using the app (or Allow).
  4. Done! The button changes to "Ready — Start Dictation" in green. You can now record your first dictation.

Step 2 — Add Your API Keys

At the bottom of the screen you will see the Key Pool bar — probably showing red labels like STT 0/5, LLM 0/5, TTS 0/5. This means no API keys are configured yet. Without keys, the app cannot connect to AI services.

  1. Tap any of the Key Pool labels (e.g. STT) to open the Key Pool section.
  2. Tap Add Key and paste an API key from your provider (e.g. OpenAI, Deepgram, ElevenLabs).
  3. Tap Save. The label turns green when a valid key is stored.
  4. Repeat for each category you want to use. At minimum, you need an STT key (for dictation). For AI polish, add an LLM key. For text-to-speech, add a TTS key.

See §11 Key Pool for a detailed guide on supported providers and how to obtain API keys.

Step 3 — Optional Features (Cockpit Buttons)

The buttons in the center of the Cockpit control optional features. Each one requires a system permission the first time you enable it. You will see a short explanation dialog from Talk to me, followed by the Android system dialog. Both are normal and safe to confirm.

Button | What it does | Details
Auto-Paste | Automatically pastes your dictated text into whichever app you were using (e.g. WhatsApp, email). No manual copy-paste needed. | §19
Notif Access | Lets the app read incoming notifications so it can auto-read messages to you. | §21
Auto-Read | Reads incoming messages aloud using text-to-speech, great for hands-free use while driving or cooking. | §20
Overlay | Shows a small floating bubble on your screen. Tap it to start/stop dictation from any app without switching back to Talk to me. | §18

You don't need all of these right away. Start with dictation (Step 1 + 2), and enable the extras whenever you're ready. Each feature can be turned on or off at any time.

Free & Paid Tier Overview

Talk to me is a BYOK app (Bring Your Own Key). You use your own API keys from AI providers. Many providers offer generous free tiers — from $200 Deepgram credit to unlimited Gemini usage to free Grok and Groq keys. This means you can use Talk to me for months before any API costs arise.

Tier 1 — Completely Free (no money, no credit card)

What you need | What you get | How to get it
1× Deepgram account (free) | Speech-to-Text dictation (STT) | deepgram.com → Sign up → $200 starter credit
1× Gemini API key (free) | AI Voice Chat (Gemini Live) | aistudio.google.com → Create API Key

What you can do:

  • Dictate with Deepgram Nova-3 (preset “Free”) — no LLM polish, but solid transcription
  • AI Voice Chat via the Gemini Live tab — real-time voice conversation with sub-second latency, 30 voices, 24 languages

How long does it last?

Feature | Credit / Limit | Lasts for
Deepgram STT | $200 starter credit (never expires) | ~43,000 min (~716 hours) transcription
Gemini Live Voice Chat | Free API key (no credit limit) | Unlimited (rate limit: ~10 sessions/min)
Gemini LLM (for Polish) | Free API key | 250 requests/day (Flash model)
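
As a rough cross-check of the Deepgram row above, the transcription time a credit buys is the credit divided by the per-minute price. The sketch below uses the $200 credit and the "from $0.0043/min" STT price quoted in this manual's API Key Cost Overview. The function name is illustrative, and actual Deepgram pricing depends on the model and mode, so treat the result as an estimate only.

```python
def minutes_for_credit(credit_usd: float, rate_usd_per_min: float) -> float:
    """Return how many minutes of transcription a credit covers at a flat rate."""
    if rate_usd_per_min <= 0:
        raise ValueError("rate must be positive")
    return credit_usd / rate_usd_per_min


# $200 credit at the lowest quoted STT price.
minutes = minutes_for_credit(200.0, 0.0043)
print(f"{minutes:,.0f} min (~{minutes / 60:,.0f} hours)")
```

At the lowest quoted price this comes to roughly 46,500 minutes. The ~43,000 minutes in the table corresponds to a slightly higher effective price of about $0.00465 per minute.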

Reality: With these two free accounts you can use Talk to me productively for months. During intensive daily testing, only $19 of $200 Deepgram credit was used after weeks.

Tier 2 — Free with More Power (additional free keys)

What you need | What it adds | Cost
+ 1× xAI account | Grok-3-Mini as LLM for Polish + Translation | Free ($25 starter credit + up to $150/month with data sharing)
+ 1× Groq account | Ultra-fast LLM for Polish (Llama models) | Free (1,000 requests/day, no credit card)

Unlocked presets:

Preset | STT | LLM / Polish | All keys free?
Free | Deepgram Nova-3 | (none) | Yes (1 key)
Free xAI | Deepgram Nova-3 | xAI Grok | Yes (2 keys)
Free Gemini | Deepgram Nova-3 | Google Gemini | Yes (2 keys)
Fast Free | OpenAI Whisper | Groq Llama | Yes (2 keys)
Economy | Deepgram Nova-3 | Groq Llama | Yes (2 keys)
Economy Plus | Deepgram Nova-3 | Groq Llama (Strong Polish) | Yes (2 keys)

Also unlocked:

  • Deepgram Voice Agent with 20+ managed presets (uses your $200 credit, $0.05–0.16/min)
  • Full BYO Voice Agent Presets (e.g. GPT-5.4 + ElevenLabs, if you have the keys)

Tier 3 — Premium Quality (paid keys)

For the absolute best quality, you need paid API keys:

Provider | Used for | Cost | What you get
OpenAI | GPT-5.4 (best LLM for Polish) | Pay-per-use (~$5–15/month) | Perfect grammar, style, translation
ElevenLabs | Scribe v2 (best STT) + TTS | From $5/month (Starter) | Best transcription, premium voices
Anthropic | Claude 4.6 Sonnet (top LLM) | Pay-per-use | Excellent text quality for longer texts

API Key Cost Overview

Provider | Sign up | Starter credit | Ongoing cost | Credit card?
Deepgram | Free | $200 (never expires!) | From $0.0043/min STT | No
Google Gemini | Free | Unlimited (rate-limited) | $0.005–0.018/min (Live Audio) | No
xAI (Grok) | Free | $25 + up to $150/month | From $0.10/1M tokens | No
Groq | Free | Unlimited (rate-limited) | 1,000 requests/day free | No
OpenAI | Free | $5 (expires after 3 months) | From $0.15/1M tokens | Yes (for GPT-5+)
Anthropic | Free | $5 (expires after 30 days) | From $1.00/1M tokens | Yes
ElevenLabs | Free | 10,000 chars/month | From $5/month (Starter) | Yes

Recommended Start (3 minutes, $0 cost)

  1. Create Deepgram account → deepgram.com → Sign up → Copy API Key
  2. Create Gemini API key → aistudio.google.com → “Create API Key” → Copy key
  3. Enter keys in Talk to me → Settings → LLM Key Pool
  4. Go: Dictation tab → preset “Free Gemini” → Dictate with STT + AI Polish. Gemini Live tab → “Start Conversation” → Real-time voice chat with AI.

Optional for even more:

  1. xAI account → x.ai/api → Sign up → API Key → Enter in Key Pool → preset “Free xAI”
  2. Groq account → console.groq.com → Sign up → API Key → presets “Economy” / “Economy Plus” / “Fast Free”

Feature Availability by Tier

Feature | Tier 1 (free) | Tier 2 (free+) | Tier 3 (premium)
Speech dictation (STT) | ✓ Deepgram | ✓ Deepgram + Whisper | ✓ + ElevenLabs Scribe v2
AI Polish (grammar) | - | ✓ Grok/Gemini/Groq | ✓ + GPT-5.4 / Claude 4.6
Real-time translation | - | ✓ (all LLM providers) | ✓ (best quality)
Gemini Live Voice Chat | ✓ (unlimited) | ✓ (unlimited) | ✓ (unlimited)
Deepgram Voice Agent | - | ✓ (from $200 credit) | ✓ (all presets)
BYO Voice Agent Presets | - | ✓ (with xAI/Groq keys) | ✓ (+ ElevenLabs/OpenAI TTS)
Available presets | 2 | 6+ dictation + 20+ Voice Agent | All (30+)

All prices and free tier conditions are set by the respective providers and may change. Last updated: April 2026.

3. License Activation

The License Gate

On first launch (or after trial expiration), the License Gate is displayed. It shows:

  • The Talk to me wordmark
  • A text field for your license key (format: TTM-XXXX-XXXX-XXXX-XXXX)
  • Your Machine ID (a unique device identifier, needed for activation)
  • An Activate button
  • A Start 7-Day Free Trial button (if no trial has been used)
  • Links to Buy a License and the Customer Portal
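
If you want to catch typing mistakes before pasting a key into the License Gate, a simple layout check is enough. The sketch below tests only the visual format TTM-XXXX-XXXX-XXXX-XXXX. It assumes each X is a letter or digit, which this manual does not specify, and it says nothing about whether a key is actually valid; the names are illustrative.

```python
import re

# Assumption: each X is a letter or digit; the manual only shows the layout.
LICENSE_KEY_PATTERN = re.compile(r"TTM(-[A-Z0-9]{4}){4}")


def looks_like_license_key(text: str) -> bool:
    """Check the layout TTM-XXXX-XXXX-XXXX-XXXX, ignoring case and outer spaces."""
    return LICENSE_KEY_PATTERN.fullmatch(text.strip().upper()) is not None


print(looks_like_license_key("TTM-AB12-CD34-EF56-GH78"))
print(looks_like_license_key("TTM-AB12-CD34-EF56"))
```

The first call prints True and the second prints False, because the second key is missing its last group.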

Activating a License

  1. Enter your license key in the text field.
  2. Tap/click Activate.
  3. The app verifies your key online and activates it for this device.
  4. Once activated, you will not see the License Gate again unless you deactivate or your license expires.

The Free Trial

  • Tap/click Start 7-Day Free Trial to unlock all features for 7 days.
  • A banner at the top of the app shows how many trial days remain.
  • After 7 days, the trial expires and the License Gate reappears.

License Modal

Once inside the app, you can view your license status by clicking the License button (shield icon). The License Modal shows:

  • Status: Active, Trial, Grace Period, or Expired
  • Product: Your license product name
  • Plan: Yearly or Lifetime
  • Expires: Expiration date (or "Lifetime")
  • Devices: Number of active devices / maximum allowed
  • Key: Your license key (partially masked)
  • Machine ID: Your device's unique identifier

From this modal you can:

  • Deactivate Device — releases the license from this device so you can use it on another
  • Close — return to the app

4. App Overview

The app is organized into three main tabs and several supporting sections:

Navigation

At the top of the screen, three tabs let you switch between the app's primary modes:

  • Speech-to-Text — Record your voice and get polished, translated text
  • Text-to-Speech — Convert written text into spoken audio
  • AI Voice Chat — Have real-time voice conversations with AI (see §12)

Interface Layout

Below the tabs, the main interface is arranged vertically:

  1. Quick-Override Controls — Language selectors for input and output
  2. Action Buttons — Quick access to platform features
  3. Status Indicator — Shows the current state (Ready, Recording, Transcribing, etc.)
  4. Pipeline Display — Visual progress of your dictation through the processing stages
  5. Result Area — Your transcribed/translated text
  6. TTS Panel (Text-to-Speech tab only) — Text input and playback controls
  7. AI Voice Chat Panel (AI Voice Chat tab only) — Voice/persona selection, conversation controls, live transcript (see §12)
  8. Key Pool — Manage your API keys
  9. Settings — All configuration options

Action Buttons

Windows Desktop action buttons:

  • Voice Translate — Toggle speech-to-speech translation
  • Notification Listener — Toggle notification readout
  • Auto-Read — Toggle Ctrl+C text-to-speech
  • Record TTS Readings — Toggle MP3 recording of TTS output
  • Save Recordings — Open recordings folder

Android action buttons:

  • License — Open license modal
  • Voice Translate — Toggle speech-to-speech translation
  • Overlay — Start/stop the Floating Bubble
  • Auto-Paste — Open Accessibility settings
  • Auto-Read — Toggle auto-read messages
  • Notif Access — Open notification listener settings

The Info Button

In the header, the Info button opens the App Info modal, which displays:

  • A link to talktome.studio
  • The support email (tap/click to copy)
  • The current app version
  • Number of detected microphones

5. Speech-to-Text

The Speech-to-Text tab is the primary mode of Talk to me. Here, you record your voice and receive polished, optionally translated text.

Recording a Dictation

  1. Ensure the status shows Ready — Start Dictation (green).
  2. Click/tap the large Start Dictation button.
  3. The button turns red and shows Stop Recording. Speak clearly.
  4. While recording, the app shows the recording duration in seconds, an audio level meter for the input volume, and the currently active STT provider and language.
  5. Click/tap the button again to Stop Recording.

Windows: You can also start/stop recording using the global hotkey Ctrl+Win (no need to focus the app window).

What Happens After Recording

After you stop recording, the app processes your audio through the Pipeline (see §7 The Pipeline):

  1. Capture — Audio recording is finalized
  2. STT — Your audio is transcribed by the selected provider
  3. Post-Processing — The raw text is cleaned up (word corrections applied)
  4. Polish / Translation — If enabled, AI corrects grammar or translates the text
  5. Inject — The final text is placed in your clipboard

Windows: The text is automatically pasted into the previously focused window via simulated Ctrl+V (Smart Clipboard Injection).

Android: If Auto-Paste is enabled, the text is automatically inserted into the active text field via the Accessibility Service.

The Result Area

After processing, your text appears in the result area. A hint confirms the text has been copied to your clipboard and is ready to paste.

Recording Signals (Audio Cues)

Talk to me gives you acoustic and visual signals when the microphone is actually recording, so no words are lost.

Acoustic Signals

  • Start beep (short high blip): "Microphone is live, you can speak now."
  • Stop beep (short low blip): "Recording ended."

Both beeps can be toggled on/off in the settings and their volume can be adjusted (default: 100%).

Visual Signals

  • Idle/Standby: Microphone icon is orange — recording inactive.
  • Recording active: Microphone icon is green — every spoken word is being captured.

Note: Start Beep on Speakerphones

Some audio devices suppress the start beep. This is not a bug but a hardware characteristic:

Device Type | Beep Audible? | Recommendation
Speakers + separate microphone | ✅ Yes | -
Headset with separate mic + speaker | ✅ Yes | -
USB speakerphone (Jabra Speak2, Logitech P710e etc.) | ⚠️ Possibly not | Use headset or external speakers
Bluetooth headset in Hands-Free profile | ⚠️ Possibly not | Wired headset as alternative

Important: If you change the default audio device, restart Talk to me so the beep plays on the new device.

6. Text-to-Speech

The Text-to-Speech tab lets you convert any written text into natural-sounding speech.

Basic Usage

  1. Switch to the Text-to-Speech tab.
  2. Type or paste text into the text area.
  3. Click/tap Read Aloud to start playback.

Playback Controls

  • Pause — Temporarily stops playback
  • Resume — Continues from where you paused
  • Stop — Ends playback entirely
  • Replay — Plays the same audio again without re-synthesizing

Provider and Voice Selection

  • ElevenLabs: Choose from your available voices or use "Default (Brian v3)". Custom Voice-IDs supported.
  • OpenAI TTS: Nova, Alloy, Echo, Fable, Onyx, Shimmer
  • Deepgram Aura 2: Fast synthesis

Model Selection (ElevenLabs)

Model | Character Limit | Best For
Eleven v3 | 5,000 | Highest quality, short content
Multilingual v2 | 10,000 | Multi-language support
Flash v2.5 | 40,000 | Fast synthesis, long texts
Turbo v2.5 | 40,000 | Speed and quality balance
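
To see which model can take a given text in one go, compare its length with the limits in the table above. The sketch below is only an illustration: the dictionary and function names are made up, and it assumes the limit counts plain characters the way Python's len() does.

```python
# Character limits per ElevenLabs model, as listed in the table above.
CHAR_LIMITS = {
    "Eleven v3": 5_000,
    "Multilingual v2": 10_000,
    "Flash v2.5": 40_000,
    "Turbo v2.5": 40_000,
}


def models_that_fit(text: str) -> list[str]:
    """Return the models whose character limit can hold the whole text."""
    return [name for name, limit in CHAR_LIMITS.items() if len(text) <= limit]


# A 6,000-character text is too long for Eleven v3 but fits the other models.
print(models_that_fit("a" * 6000))
```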

Audio Quality

Quality | Description
MP3 192 kbps | Creator quality: highest fidelity
MP3 128 kbps | Standard: good balance
MP3 64 kbps | Compact: smaller file size
MP3 32 kbps | Minimal: lowest quality

Text Normalization

Setting | Description
Auto | The model decides how to handle numbers
Always On | Numbers converted to words (e.g., "42" → "forty-two")
Off | No normalization applied

Voice Fine-Tuning (ElevenLabs)

Slider | Range | Description
Stability | Variable ↔ Stable | Lower = more expressive; Higher = more consistent
Similarity | Creative ↔ Original | How closely the output matches the original voice
Style | Neutral ↔ Expressive | Amount of emotional expression
Speed | Slow (0.7×) ↔ Fast (1.2×) | Playback speed

Additional Options

  • Code-Filter: Strips code blocks and technical syntax before synthesis.
  • Auto-Record: Automatically saves synthesized audio. Tap the folder icon to choose the directory.
  • Speaker Boost: Enhances voice clarity (ElevenLabs only).

7. The Pipeline

The Pipeline is Talk to me's core processing engine. It visualizes the stages your audio passes through from recording to final output.

Pipeline Stages

Stage | Label | Description
1 | Capture | Audio recording and finalization
2 | STT | Speech-to-Text transcription
3 | Post | Post-processing (cleanup, word corrections)
4 | Polish or Trans | AI-Polish or AI-Translate
5 | Inject | Text copied to clipboard / auto-pasted

TDF (Text Display Field) Indicators

Each pipeline stage shows the active provider (e.g., "Scribe v2", "GPT-5.4") and timing information after completion.

Timing Display

After processing, a timing line shows:

STT 1.2s → LLM 0.8s → Inject 0.1s → Total 2.1s

If Voice Translate is active, an additional S2S (Speech-to-Speech) timing is shown.
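
The timing line can be read as a chain of per-stage durations followed by their sum. The sketch below rebuilds the example line from three stage times. It assumes that Total is simply the sum of the listed stages, which matches the example above but is not stated in this manual; the function name is illustrative.

```python
def format_timing(stages: list[tuple[str, float]]) -> str:
    """Join per-stage durations (in seconds) into one line and append their sum."""
    parts = [f"{name} {seconds:.1f}s" for name, seconds in stages]
    total = sum(seconds for _, seconds in stages)
    parts.append(f"Total {total:.1f}s")
    return " → ".join(parts)


print(format_timing([("STT", 1.2), ("LLM", 0.8), ("Inject", 0.1)]))
# prints: STT 1.2s → LLM 0.8s → Inject 0.1s → Total 2.1s
```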

8. Voice Translate

Voice Translate combines AI-Translation with Text-to-Speech to create a real-time speech-to-speech translation experience.

New since v0.5.150: Text translation is now automatically active whenever your input language (Speech Input) and output language (Text Output) differ. You no longer need a separate switch for text translation. The Voice Translate button now only controls whether the final text is read aloud (text-to-speech output).

How It Works

  1. Enable Voice Translate (purple when active).
  2. Record a dictation in your source language.
  3. The app transcribes → translates → reads the translation aloud.

Examples

  • DE → EN without Voice Translate: You speak German, receive English text — no audio output.
  • DE → EN with Voice Translate: You speak German, receive English text — and it is read aloud.
  • DE → DE with Voice Translate: Same language, no translation — but the text is read aloud.
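
The three examples follow from two independent decisions: translation depends only on whether the languages differ, and audio output depends only on the Voice Translate button. The sketch below restates that rule; the class and function names are illustrative and not part of the app.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutputPlan:
    translate: bool   # run the AI-Translate step
    read_aloud: bool  # speak the final text via text-to-speech


def plan_output(input_lang: str, output_lang: str, voice_translate: bool) -> OutputPlan:
    """Translate when the languages differ; read aloud only if Voice Translate is on."""
    return OutputPlan(translate=input_lang != output_lang, read_aloud=voice_translate)


print(plan_output("de", "en", voice_translate=False))
print(plan_output("de", "en", voice_translate=True))
print(plan_output("de", "de", voice_translate=True))
```

The sketch does not cover Auto-Detect input, where the spoken language is only known after transcription.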

Configuration

  • Target Language: Set in Settings → AI-Translate → Translate To
  • TTS Voice: Uses your configured TTS provider and voice

Use Cases

  • Travel: Speak in your language, have the translation read aloud.
  • Language Learning: Hear how your text sounds in another language.
  • Live Language Immersion: Turn your own thoughts into live fluency — speak in your native language and absorb the output in the language you want to master.

9. AI Polish & Translation

AI-Polish

When enabled, AI-Polish corrects grammar, punctuation, and (with "Strong" setting) removes filler words like "um", "uh", "you know", "basically".

Polish Strength:

  • Light — Grammar and punctuation correction only
  • Strong — Also removes filler words

Status indicators:

  • POLISH (cyan) — Active
  • OFF — Disabled
  • KEY MISSING (yellow) — No LLM key configured

AI-Translate

When enabled, your dictated text is translated into the target language.

Status indicators:

  • TRANSLATE (cyan) — Active, showing target language
  • VOICE OUTPUT (purple) — Voice Translate also active
  • TEXT ONLY — Translation without voice output
  • OFF — Disabled

Note: Since v0.5.150, Talk to me automatically detects when input and output languages differ and activates translation without an explicit toggle. AI Polish remains independently available and is no longer automatically disabled.

10. Quick-Override Controls

The Quick-Override controls allow you to temporarily change the input or output language for a single dictation without modifying your saved settings.

Speech Input Override

Select a different input language for the next recording:

  • Auto-Detect — The STT provider detects the language automatically
  • Individual languages (see Appendix A)

Text Output Override

Select a different output language (equivalent to temporarily enabling translation):

  • Default (same as input) — No translation
  • All 20 translation languages

Reset to Settings

When an override is active, a Reset button (↩ icon) appears. Tap/click it to revert to your saved settings.

11. Key Pool

The Key Pool is where you manage your API keys. Talk to me uses a pool-based architecture — you can add multiple keys per category, and the app automatically rotates between them based on trust scores.

Categories

Category | Purpose | Supported Providers
Speech-to-Text | Transcription | OpenAI Whisper, Deepgram Nova, ElevenLabs Scribe v2, Groq Whisper
AI-Polish / LLM | Grammar, translation | OpenAI, Groq, Anthropic, Google Gemini, xAI Grok
Text-to-Speech | Voice synthesis | ElevenLabs, Deepgram, OpenAI TTS

Adding a Key

  1. Expand the Key Pool section.
  2. Click/tap + Add Key in the desired category.
  3. Select the Provider.
  4. Enter a Label (e.g., "My OpenAI Key").
  5. Enter your API Key.
  6. Click/tap Save Key.

Key Slot Features

Each key slot displays:

  • Label and Provider
  • Masked Key (last 4 characters visible)
  • Trust Score — Color-coded (green/yellow/red)
  • Statistics — Calls, successes, failures, rate limits

Actions per slot:

  • Test — Verify the key works
  • Pause / Activate — Temporarily disable or re-enable
  • Remove — Permanently delete

Trust System

Level | Score | Color | Behavior
Excellent | ≥80% | Green | Preferred
Good | ≥60% | Green | Normal
OK | ≥40% | Yellow | Fallback
Weak | ≥20% | Yellow | Rarely used
Critical | <20% | Red | Last resort

Keys that hit rate limits are placed in automatic cooldown while other keys are used.
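
To make the trust levels and the cooldown behavior concrete, here is a minimal sketch of how such a pool could choose a key. This manual does not describe how the score is calculated or how rotation works internally, so the success-rate formula, the "highest score wins" rule, and all names below are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KeySlot:
    label: str
    successes: int
    failures: int
    paused: bool = False
    cooldown_until: float = 0.0  # timestamp in seconds; 0 means no cooldown

    @property
    def trust(self) -> float:
        """Assumed score: success rate in percent; unused keys start at 100."""
        calls = self.successes + self.failures
        return 100.0 if calls == 0 else 100.0 * self.successes / calls


def trust_level(score: float) -> str:
    """Map a score in percent to the levels in the table above."""
    for threshold, name in ((80, "Excellent"), (60, "Good"), (40, "OK"), (20, "Weak")):
        if score >= threshold:
            return name
    return "Critical"


def pick_key(slots: list[KeySlot], now: float) -> Optional[KeySlot]:
    """Return the highest-trust key that is neither paused nor cooling down."""
    usable = [s for s in slots if not s.paused and s.cooldown_until <= now]
    return max(usable, key=lambda s: s.trust, default=None)


primary = KeySlot("My OpenAI Key", successes=9, failures=1)
backup = KeySlot("Backup Key", successes=5, failures=5)
print(trust_level(primary.trust), trust_level(backup.trust))
print(pick_key([primary, backup], now=0.0).label)
```

In this sketch a rate-limited key would get a cooldown_until value in the future, so pick_key falls back to the next best key until the cooldown has passed.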

12. AI Voice Chat

Talk to me includes two independent AI Voice Chat engines, each with its own strengths. You can switch between them at any time from the AI Chat tab.

Engine | Technology | Key Advantage
12a. Deepgram Voice Agent | Deepgram Agent API (WebSocket) | 32+ presets, 6 LLM providers, 4 TTS providers, latency monitoring, managed & BYO modes
12b. Gemini 3.1 Flash Live | Google Gemini Live API (WebSocket) | 30 expressive voices, persona presets, thinking depth control, native Google multimodal AI

Full hands-free speaker mode (Android)

Both voice chat engines work completely hands-free through your phone speaker. Talk to me uses proprietary acoustic echo cancellation (AEC) via a native Android bridge to separate your voice from the AI's speaker output. You can interrupt at any time: the AI stops immediately and listens to you. No headphones or extra equipment are required. On desktop, any standard audio setup works equally well.

12a. Deepgram Voice Agent

The Deepgram Voice Agent provides real-time, full-duplex AI voice conversations through a single WebSocket connection to the Deepgram Agent API. It orchestrates Speech-to-Text (STT), Large Language Models (LLMs), and Text-to-Speech (TTS) in one unified pipeline — you speak, the AI thinks, and responds with natural voice, all in real time.

Getting Started

  1. Switch to the AI Chat tab, then select the Deepgram sub-tab.
  2. Add a Deepgram API key in the Key Pool (scroll down to the “Deepgram Voice Agent” section).
  3. Choose a Configuration Preset or configure manually.
  4. Tap the green Start Conversation button.

Configuration Presets (32+ Options)

Talk to me ships with over 32 presets across six categories. Each preset pre-configures STT model, LLM provider/model, TTS provider/voice, and turn-detection parameters.

Top Tier — Best Quality

Preset | LLM | TTS | STT
Gemini 3.0 Pro + Sonic-3 | Google Gemini 3.0 Pro | Cartesia Sonic-3 | Nova-3
Claude 4.5 + Sonic-3 | Anthropic Claude Sonnet 4.5 | Cartesia Sonic-3 (Tessa) | Nova-3
Claude 4.6 + Sonic-3 | Anthropic Claude Sonnet 4.6 | Cartesia Sonic-3 (Katie) | Nova-3
GPT-5.4 + Sonic-3 | OpenAI GPT-5.4 | Cartesia Sonic-3 (Katie) | Nova-3
GPT-5.4 + Kiefer | OpenAI GPT-5.4 | Cartesia Sonic-3 (Kiefer, Male) | Nova-3

Ultra-Fast — Lowest Latency (~1.1s)

Preset | LLM | TTS | STT
GPT-4o Mini + Sonic-3 | OpenAI GPT-4o Mini | Cartesia Sonic-3 | Nova-3
GPT-5.4 Nano + Sonic-3 | OpenAI GPT-5.4 Nano | Cartesia Sonic-3 | Nova-3
Haiku 4.5 + Sonic-3 | Anthropic Claude Haiku 4.5 | Cartesia Sonic-3 | Nova-3
Gemini 2.5 Flash + Sonic-3 | Google Gemini 2.5 Flash | Cartesia Sonic-3 | Nova-3
Nemotron 49B + Sonic-3 | NVIDIA Nemotron Super 49B | Cartesia Sonic-3 | Nova-3

Flux — English Only, Ultra-Low Latency

Flux uses Deepgram's Flux STT model with eager end-of-turn detection for the absolute fastest response times. English only.

Preset | LLM | TTS
Flux + GPT-4o Mini + Sonic-3 | OpenAI GPT-4o Mini | Cartesia Sonic-3
Flux + GPT-5.4 Nano + Sonic-3 | OpenAI GPT-5.4 Nano | Cartesia Sonic-3
Flux + GPT-5.4 + Sonic-3 | OpenAI GPT-5.4 | Cartesia Sonic-3
Flux + Claude 4.6 + Sonic-3 | Anthropic Claude 4.6 | Cartesia Sonic-3
Flux + Gemini Flash + Sonic-3 | Google Gemini 2.5 Flash | Cartesia Sonic-3

Balanced — Quality + Speed

Preset | LLM | TTS
GPT-5 Mini + Sonic-3 | OpenAI GPT-5 Mini | Cartesia Sonic-3
GPT-4.1 Mini + Sonic-3 | OpenAI GPT-4.1 Mini | Cartesia Sonic-3
Haiku 4.5 + Tessa | Anthropic Haiku 4.5 | Cartesia Sonic-3 (Tessa)
Gemini 3.0 Flash + Sonic-3 | Google Gemini 3.0 Flash | Cartesia Sonic-3

Experimental — Deepgram Aura-2 TTS (Language-Specific)

Preset | LLM | TTS Voice
GPT-5.4 + Julius (DE) | OpenAI GPT-5.4 | Aura-2 Julius (German, Male)
GPT-5.4 + Zeus (EN) | OpenAI GPT-5.4 | Aura-2 Zeus (English, Male)
Claude 4.6 + Thalia (EN) | Anthropic Claude 4.6 | Aura-2 Thalia (English, Female)
GPT-5.4 + Agathe (FR) | OpenAI GPT-5.4 | Aura-2 Agathe (French, Female)
GPT-5.4 + Celeste (ES) | OpenAI GPT-5.4 | Aura-2 Celeste (Spanish, Female)

Full BYO — Bring Your Own LLM & TTS Keys

In Full BYO mode, Deepgram handles only STT (Nova-3). Your own API keys for LLM and TTS providers are used directly.

Preset | LLM (BYO Key) | TTS (BYO Key)
GPT-5.4 + ElevenLabs | OpenAI GPT-5.4 | ElevenLabs Turbo v2.5
GPT-5.4 + OpenAI TTS | OpenAI GPT-5.4 | OpenAI TTS-1
GPT-5.4 Nano + ElevenLabs | OpenAI GPT-5.4 Nano | ElevenLabs Turbo v2.5
Gemini 3 Pro + ElevenLabs | Google Gemini 3 Pro | ElevenLabs Turbo v2.5
Gemini Flash + OpenAI TTS | Google Gemini 2.5 Flash | OpenAI TTS-1
Claude 4.6 + ElevenLabs | Anthropic Claude 4.6 | ElevenLabs Turbo v2.5
Claude 4.6 + OpenAI TTS | Anthropic Claude 4.6 | OpenAI TTS-1
Grok 3 Mini + ElevenLabs | xAI Grok 3 Mini | ElevenLabs Turbo v2.5

Preset Lock & Unlock

When a preset is active, all configuration fields are locked to the preset values (indicated by a lock icon). This prevents accidental changes. To override individual settings, tap Unlock for manual editing. Changing any setting manually switches the preset to “Manual Configuration”.

Manual Configuration

Tap the gear icon next to the Start button to open the configuration panel. All fields below are available:

LLM Provider

Provider | Key Models
OpenAI | GPT-4o Mini, GPT-4.1 Nano/Mini/Full, GPT-5 Nano/Mini/Full, GPT-5.1–5.4 (incl. Nano, Mini)
Anthropic | Claude Haiku 4.5, Sonnet 4, Sonnet 4.5, Sonnet 4.6
Google | Gemini 2.5 Flash/Flash Lite, Gemini 3.0 Flash/Pro, Gemini 3.1 Flash Lite
NVIDIA | Llama Nemotron Super 49B, Nemotron 3 Nano 30B
xAI | Grok 3, Grok 3 Mini, Grok 3 Fast
Groq | GPT OSS 20B

TTS Provider

Provider | Voices | Languages | Key Required
Cartesia Sonic-3 | 9 voices (Katie, Kiefer, Tessa, Kyle, Leo, Jace, Gavin, Maya, Default) | 42 languages (multilingual auto-detect) | Deepgram key only (managed)
Deepgram Aura-2 | 35+ voices (EN, DE, FR, ES, IT, NL, JA) | Language-specific per voice | Deepgram key only (managed)
ElevenLabs | Your ElevenLabs voices (auto-loaded) | Multilingual | ElevenLabs API key (BYO)
OpenAI TTS | 10 voices (Alloy, Ash, Ballad, Coral, Echo, Fable, Nova, Onyx, Sage, Shimmer) | English | OpenAI API key (BYO)

STT Model

Model | Languages | Use Case
Nova-3 | Multilingual | Standard, best overall accuracy
Nova-3 General | Multilingual | General-purpose variant
Nova-3 Medical | Multilingual | Medical terminology optimized
Flux | English only | Ultra-low-latency turn detection

Other Settings

  • Language — Auto-Detect (Multilingual) or a specific language: English, German, French, Spanish, Italian, Dutch, Japanese, Portuguese, Hindi, Russian
  • Greeting Message — Text the agent speaks when the conversation starts (optional)
  • System Instruction — Define the AI’s personality and behavior. A base instruction is always included that prevents markdown formatting and follow-up questions in speech output.

Advanced Settings

Expand the Advanced section for fine-tuning:

  • Temperature (0.00 – 2.00) — Controls response creativity. Default: 0.7. Lower = more focused, higher = more creative.
  • STT Model — Switch between Nova-3 variants and Flux.

When Flux STT is selected, additional controls appear:

  • Eager EOT Threshold (0.0 – 1.0) — How aggressively the system detects end-of-turn. Higher = faster response but may cut you off mid-sentence.
  • EOT Timeout (0 – 5000ms) — Maximum silence before the agent responds.
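
The numeric ranges above can be summarized as a small settings object with a range check. This is only a sketch for readers who keep their own notes or scripts alongside the app: the class and field names are invented, and the two Flux defaults are placeholders because this manual gives a default only for Temperature.

```python
from dataclasses import dataclass


@dataclass
class AgentTuning:
    temperature: float = 0.7          # default stated in this manual
    eager_eot_threshold: float = 0.5  # placeholder; the manual gives no default
    eot_timeout_ms: int = 3000        # placeholder; the manual gives no default

    def validate(self) -> None:
        """Raise ValueError if any value is outside the ranges listed above."""
        if not 0.0 <= self.temperature <= 2.0:
            raise ValueError("temperature must be between 0.00 and 2.00")
        if not 0.0 <= self.eager_eot_threshold <= 1.0:
            raise ValueError("eager EOT threshold must be between 0.0 and 1.0")
        if not 0 <= self.eot_timeout_ms <= 5000:
            raise ValueError("EOT timeout must be between 0 and 5000 ms")


AgentTuning(temperature=0.4, eager_eot_threshold=0.8, eot_timeout_ms=1500).validate()
```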

For ElevenLabs BYO: A custom Voice ID field lets you enter any ElevenLabs voice ID directly.
For OpenAI TTS BYO: Select from 10 OpenAI voices (Alloy, Ash, Ballad, Coral, Echo, Fable, Nova, Onyx, Sage, Shimmer).

During a Conversation

  • Status indicator — Shows Ready, Connecting, Live (with elapsed time), or Error
  • Audio level meter — Displays microphone input with Listening/Silent state
  • Thinking indicator — A green badge appears while the LLM processes your input
  • Conversation transcript — Real-time display of all dialogue. Your messages appear on the right (green), the agent’s on the left (blue).
  • Barge-in — Interrupt the AI at any time by speaking. The agent stops immediately and listens to you.
  • Resize handle — Drag the handle below the transcript to resize the chat area (120px to 85% of screen)
  • Dual Start/Stop buttons — One at the top, one sticky at the bottom for easy access while scrolling

Latency Monitoring

A compact latency bar appears after the first turn, showing three key metrics:

  • LLM — Time from your speech to the first LLM token
  • TTFB — Total Time to First Byte (end-to-end)
  • TURN — Full turn duration including audio playback

Values are color-coded: green (< 2s), yellow (2–5s), red (> 5s).
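
Expressed as a rule, the color coding looks like the sketch below. The function name is illustrative; the thresholds are the ones stated above, with values of exactly 2 s and 5 s falling into the yellow band.

```python
def latency_color(seconds: float) -> str:
    """Classify a latency value using the thresholds stated above."""
    if seconds < 2.0:
        return "green"
    if seconds <= 5.0:
        return "yellow"
    return "red"


for value in (1.1, 3.4, 6.0):
    print(f"{value:.1f}s -> {latency_color(value)}")
```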

Tap the latency bar to expand a detailed per-turn table with columns: #, Speech duration, LLM time, TTS time, TTFB, Audio length, Total. Average LLM and TTFB are displayed in the header.

Echo Cancellation (AEC)

Talk to me includes proprietary Acoustic Echo Cancellation via a native Android Kotlin bridge. The AI’s speaker output is captured and subtracted from your microphone input in real time, preventing self-triggering feedback loops. This allows full hands-free operation on speaker without headphones. It works on all managed presets and most BYO configurations.

Key Pool — Deepgram Voice Agent

The Deepgram Voice Agent Key Pool is a dedicated, collapsible section below the chat area. It manages:

  • Deepgram API Keys (required) — for STT and managed LLM/TTS routing
  • LLM Keys (optional, Full BYO only) — OpenAI, Anthropic, Gemini, xAI
  • TTS Keys (optional, Full BYO only) — ElevenLabs, OpenAI TTS

Each key card shows a 4-row layout: label, provider badge + masked key, trust score with statistics, and Test/Pause action buttons. You can test individual keys or all keys at once.

Session Limits

Sessions are limited to 15 minutes (API constraint). The elapsed time is shown in the Stop button. The session ends automatically when the limit is reached.

Tips

  • Start with a managed preset (Top Tier or Ultra-Fast) — they require only a Deepgram key and offer the best experience.
  • GPT-5.4 Nano + Cartesia Sonic-3 delivers ~1.1s response times — the fastest option.
  • Flux presets are English-only but extremely fast due to eager end-of-turn detection.
  • Full BYO presets use your own LLM/TTS keys for maximum control but may have reduced barge-in performance with some TTS providers.
  • All settings take effect on the next session start, not during a live session.

12b. Gemini 3.1 Flash Live

Gemini 3.1 Flash Live provides real-time voice conversations powered by Google’s latest audio AI model. It delivers the speed and natural rhythm needed for voice-first interaction, with sub-second latency, 30 expressive voices, and native multimodal understanding.

Requirements

You need a Google Gemini API key (paid tier recommended) added to the LLM Key Pool in Settings. The key is automatically available for AI Voice Chat. The model used is gemini-3.1-flash-live-preview.

Starting a Conversation

Navigate to the AI Chat tab, then select the Gemini sub-tab. Tap Start Conversation. The app connects to Gemini via WebSocket, opens your microphone, and begins listening. Speak naturally — Gemini responds in real-time audio. Tap End to stop.

Voices (30 Options)

Choose from 30 natural AI voices, each with a distinct personality:

Voice | Character | Best For
Sulafat | Warm | Storytelling, bedtime stories, calm conversations
Gacrux | Mature | Authoritative narration, mentoring, deep discussions
Algenib | Gravelly | Cinematic narration, dramatic reading, character voice
Kore | Firm | Professional briefings, news reading, factual Q&A
Puck | Upbeat | Energetic conversations, motivation, brainstorming
Zephyr | Bright | Optimistic chats, friendly assistance, greetings
Charon | Informative | Tutorials, documentary-style explanations
Fenrir | Excitable | Enthusiastic reactions, game commentary, hype
Leda | Youthful | Casual chat, Gen-Z conversations, trendy topics
Aoede | Breezy | Relaxed conversations, travel talk, lifestyle
Achernar | Soft | Meditation guidance, ASMR-style, gentle encouragement
Algieba | Smooth | Podcast hosting, audiobooks, long-form reading
Despina | Smooth | Elegant narration, luxury brand voice
Achird | Friendly | Customer support, everyday assistance, welcoming tone
Vindemiatrix | Gentle | Supportive conversations, therapy-like tone, empathy
Sadaltager | Knowledgeable | Technical explanations, expert Q&A, encyclopedic
Rasalgethi | Informative | Science documentaries, educational content
Schedar | Even | Balanced discussions, neutral reporting, debates
Alnilam | Firm | Commanding presence, leadership, formal settings
Pulcherrima | Forward | Assertive communication, pitches, presentations
Zubenelgenubi | Casual | Laid-back chat, friends catching up, humor
Sadachbia | Lively | Animated storytelling, children’s content, playful
Laomedeia | Upbeat | Morning shows, cheerful updates, positive vibes
Callirrhoe | Easy-going | Casual advice, lifestyle coaching, approachable
Autonoe | Bright | Creative sessions, idea generation, art discussions
Enceladus | Breathy | Intimate narration, poetry reading, atmospheric
Iapetus | Clear | Precise instructions, step-by-step guides, clarity
Erinome | Clear | Clean communication, corporate training, diction
Umbriel | Easy-going | Relaxed Q&A, weekend vibes, mellow conversations

Tip: Preview all voices in the Google AI Studio Voice Library.

Language

Select from 24 supported languages or leave on Auto-detect. Gemini responds in the language you speak — or in the language you select. Supported: English, German, French, Spanish, Italian, Portuguese, Dutch, Polish, Romanian, Russian, Ukrainian, Turkish, Arabic, Hindi, Bengali, Tamil, Telugu, Marathi, Japanese, Korean, Thai, Vietnamese, Indonesian.

Persona Presets

Persona presets define how Gemini behaves — its personality, tone, and communication style. Choose from six presets or create your own:

Preset | Behavior
Friendly Assistant | Warm, conversational, approachable — great for everyday use
Professional | Clear, concise, authoritative — for business and work
Enthusiastic | Energetic, positive, encouraging — for brainstorming and motivation
Calm & Soothing | Slow, gentle, patient — for relaxation and guided sessions
Teacher | Patient, step-by-step, uses analogies — for learning and explanations
Creative | Imaginative, expressive, vivid language — for storytelling and art
Custom | Write your own system instruction from scratch

System Instruction

The System Instruction is a text briefing you give to Gemini before the conversation starts. Think of it as directing an actor: tell the AI who it is, how to behave, and what to focus on.

Examples:

  • “You are a patient Italian language tutor. Speak slowly. Correct my grammar gently.”
  • “You are a senior software architect. Answer concisely and technically.”
  • “You are a creative storyteller. Speak with flair. Use vivid language.”

When using a Persona Preset, your custom text is appended to the preset instruction. In Custom mode, your text is the entire instruction. Write in English for best results. Settings are saved automatically.
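
The append rule above can be sketched in a few lines of Python. The preset wording, the PRESETS table, the function name, and the blank line used as a separator are all hypothetical; only the behavior (preset text first, custom text appended, Custom mode using your text alone) comes from the description above.

```python
# Hypothetical preset wording; the app's real preset texts are not documented here.
PRESETS = {
    "Teacher": "You are a patient teacher. Explain step by step and use analogies.",
    "Professional": "You are clear, concise, and authoritative.",
}


def build_system_instruction(preset: str, custom_text: str) -> str:
    """Combine a persona preset with the user's own text."""
    custom_text = custom_text.strip()
    if preset == "Custom":
        # In Custom mode the user's text is the entire instruction.
        return custom_text
    base = PRESETS[preset]
    # With a preset, the custom text is appended to the preset instruction.
    return f"{base}\n\n{custom_text}" if custom_text else base


print(build_system_instruction("Teacher", "Focus on Italian grammar."))
```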

Thinking Depth

Control how deeply Gemini reasons before responding:

Level | Behavior
Minimal | Fastest responses, minimal internal reasoning (default)
Low | Brief consideration, good balance
Medium | Thoughtful responses, longer pause before answering
High | Deep reasoning, best for complex questions

Temperature & Top-P

Temperature (0.0 – 2.0) controls how creative or predictable the AI’s responses are:

Range | Behavior | Best For
0.0 – 0.5 | Focused, deterministic | Facts, technical answers, precise instructions
0.7 – 1.0 | Balanced, natural (default: 1.0) | Most conversations, everyday use
1.2 – 2.0 | Creative, surprising | Brainstorming, storytelling, creative writing

Top-P (0.0 – 1.0) limits the pool of words the AI considers. At 0.95 (default), the model samples only from the smallest set of most likely words whose combined probability reaches 95%. Lower values make output more conservative.
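
To make the two settings more concrete, here is a minimal Python sketch of how temperature and Top-P are commonly applied when a language model picks its next word. It is a generic textbook illustration, not Gemini's internal implementation; the function names and the sample scores are made up for this example.

```python
import math


def apply_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw scores into probabilities; a lower temperature sharpens them."""
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        # Treat 0.0 as "always pick the single most likely word".
        best = logits.index(max(logits))
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the peak for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def top_p_candidates(probs: list[float], top_p: float) -> list[int]:
    """Indices of the smallest set of words whose probabilities sum to top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept


probs = apply_temperature([2.0, 1.0, 0.0, -1.0], temperature=1.0)
print(top_p_candidates(probs, top_p=0.9))
```

Raising the temperature flattens the probabilities so unlikely words get picked more often, while lowering Top-P shrinks the candidate list.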

Voice Activity Detection (VAD)

VAD settings control how Gemini detects when you start and stop speaking:

  • Speech Start Sensitivity — How easily the system detects speech onset.
  • Speech End Sensitivity — How quickly the system decides you’ve stopped talking.
  • Silence Duration — How many milliseconds of silence before your turn is considered complete (100–2000ms).
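
The Silence Duration setting can be pictured with a small Python sketch: once speech has been heard, the turn ends as soon as enough consecutive silent audio frames have accumulated. This is a simplified illustration under assumed names (find_turn_end, 20 ms frames), not the detection logic Gemini actually uses.

```python
def find_turn_end(frames: list[bool], frame_ms: int = 20, silence_ms: int = 500):
    """Return the index of the frame that completes the turn, or None.

    frames holds one flag per audio frame: True for speech, False for silence.
    """
    if not 100 <= silence_ms <= 2000:
        raise ValueError("silence_ms must be between 100 and 2000")
    heard_speech = False
    silent_run = 0
    for i, is_speech in enumerate(frames):
        if is_speech:
            heard_speech = True
            silent_run = 0  # any speech resets the silence counter
        elif heard_speech:
            silent_run += frame_ms
            if silent_run >= silence_ms:
                return i
    return None


# 100 ms of leading silence, 200 ms of speech, then 600 ms of silence.
frames = [False] * 5 + [True] * 10 + [False] * 30
print(find_turn_end(frames, silence_ms=500))
```

A shorter Silence Duration makes the agent answer sooner but raises the chance that a natural pause is mistaken for the end of your turn.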

Echo Cancellation (AEC)

Like the Deepgram Voice Agent, Gemini 3.1 Flash Live benefits from Talk to me’s proprietary acoustic echo cancellation via the native Android Kotlin bridge. Full hands-free speaker mode works without headphones.

Tips for Best Results

  • Speak naturally — Gemini supports natural barge-in (interrupt anytime)
  • On Android, the built-in AEC eliminates echo — no headphones needed
  • Session length is limited to 15 minutes per connection (API limit)
  • All settings take effect on the next session start (not during a live session)
  • The audio level meter shows a colored gradient (green, yellow, orange, red) indicating your microphone input level
  • Transcription of your speech and Gemini’s responses can be toggled on/off independently

13. Mini-Player Windows

The Mini-Player is a compact Always-on-Top window that provides essential dictation controls without occupying your full screen.

Entering Mini-Player Mode

Click the Collapse button (↗ icon) in the header. The app window shrinks to a compact overlay positioned at the bottom center of your screen.

Mini-Player Layout

The Mini-Player displays a 3×3 grid of essential controls:

  • Row 1: Speech Input selector, Status/Start button, Text Output selector
  • Row 2: Voice Translate toggle, Inline Pill (spectrum analyzer), Save Recordings
  • Row 3: Pipeline timing TDFs, Result preview

DPI-Aware Sizing

The Mini-Player automatically adjusts its size based on your display's DPI scaling, ensuring consistent visual dimensions across monitors with different resolutions (100%, 125%, 150%).
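
For readers curious about the arithmetic, the sketch below shows one common way to do DPI-aware sizing on Windows, where 96 DPI corresponds to 100% scaling. The logical size of 420×180 and the function name are placeholders chosen for this example; the app's real dimensions and code are not documented here.

```python
BASE_DPI = 96  # Windows treats 96 DPI as 100% scaling


def scaled_size(logical_width: int, logical_height: int, dpi: int) -> tuple[int, int]:
    """Convert a logical window size to physical pixels for a given DPI."""
    factor = dpi / BASE_DPI
    return round(logical_width * factor), round(logical_height * factor)


# 96, 120 and 144 DPI correspond to 100%, 125% and 150% scaling.
for dpi in (96, 120, 144):
    print(f"{dpi / BASE_DPI:.0%}", scaled_size(420, 180, dpi))
```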

Exiting Mini-Player Mode

Click the Expand button to return to the full-size window at its previous position and size.

14. Global Hotkeys Windows

Talk to me registers system-wide hotkeys so you can control dictation without switching to the app window.

Primary Hotkeys

Hotkey | Action
Ctrl+Win | Start / Stop Recording (global, works from any app)
Ctrl+Win (while processing) | Cancel current pipeline

TTS Hotkey

When text is selected in any application, the TTS hotkey reads it aloud using your configured TTS provider.

Low-Level Hook

The global hotkey uses a Windows low-level keyboard hook, which means it works even when the app is minimized or another application has focus. The hook operates in "zero-swallow mode" — it intercepts the key combination without blocking other keyboard input.

15. Auto-Read Windows

Auto-Read is a Windows-exclusive feature that extracts text from the currently focused application and reads it aloud via TTS.

How It Works

  1. Enable Auto-Read by clicking the Auto-Read button.
  2. Select text in any application (or use Ctrl+C to copy).
  3. Talk to me detects the clipboard content and automatically reads it aloud using your TTS configuration.

Use Cases

  • Read emails, articles, or documents without staring at the screen.
  • Review your own writing by hearing it spoken back.
  • Accessibility support for vision-impaired users.

16. Notification Listener Windows

The Notification Listener captures Windows toast notifications and reads them aloud via TTS.

Requirements

  • Windows Desktop version
  • Notification access permission granted in Windows Settings

How It Works

  1. Enable Notification Listener by clicking the toggle.
  2. Grant notification access when prompted by Windows.
  3. When a Windows toast notification arrives (email, chat message, calendar reminder), Talk to me extracts the notification title and body, and reads it aloud using your TTS configuration.

Configuration

  • Enable/disable in Settings → Hands-Free
  • TTS voice and provider follow your global TTS settings

17. MP3 Recording & Save Windows

Record TTS Readings

When enabled, every TTS synthesis is automatically saved as an MP3 file with sequential numbering (e.g., recording_001.mp3, recording_002.mp3).
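
Sequential numbering of this kind can be sketched as follows. The helper name and the rule of continuing after the highest existing number are assumptions for illustration; only the recording_NNN.mp3 pattern is taken from the example above.

```python
import re

RECORDING_NAME = re.compile(r"^recording_(\d+)\.mp3$")


def next_recording_name(existing: list[str]) -> str:
    """Return the next free file name after the highest existing number."""
    highest = 0
    for name in existing:
        match = RECORDING_NAME.match(name)
        if match:
            highest = max(highest, int(match.group(1)))
    # Zero-pad to three digits, as in recording_001.mp3.
    return f"recording_{highest + 1:03d}.mp3"


print(next_recording_name(["recording_001.mp3", "recording_002.mp3", "notes.txt"]))
```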

Save Recordings

Click Save Recordings to open the folder containing all recorded MP3 files. You can configure the recording directory in Settings.

A Note on Android Permissions Android

The Android version of Talk to me requires several system permissions (Microphone, Overlay, Accessibility Service, Notification Listener) — each with its own confirmation dialog. We understand that this can feel cumbersome.

We would have preferred a simpler setup experience. However, Google Play Store policies and Android security guidelines require that each sensitive permission is requested individually, with a clear disclosure explaining what the permission is used for and what it is not used for. These multi-step confirmation flows are not our design choice — they are mandated by platform compliance requirements.

Each permission is requested only when you actually need the feature, not all at once during installation. You can revoke any permission at any time through Android Settings. The app will continue to work — the corresponding feature will simply be disabled.

Here is a summary of all Android permissions and why they are needed:

Permission | Feature | Required?
Microphone | Speech-to-Text dictation, AI Voice Chat | Yes — core feature
Draw over other apps | Floating Bubble (hands-free overlay) | Only if you use the overlay
Accessibility Service | Auto-Paste text into chat app input fields | Only if you use Auto-Paste
Notification Listener | Auto-Read incoming messages aloud | Only if you use Auto-Read
Internet | Communication with AI providers | Yes — required for all features

Thank you for your understanding. We take your privacy seriously — none of these permissions are used to collect, store, or transmit personal data. See Privacy and Security for full details.

18. Floating Bubble (Overlay) Android

The Floating Bubble is a small circular icon that floats on top of all other apps, providing hands-free dictation access without switching apps.

Activating the Overlay

  1. Tap the Overlay button in the main app.
  2. If Android's "Draw over other apps" permission is not yet granted, you will be directed to enable it.
  3. A small Talk to me bubble appears on screen.

Using the Bubble

  • Single Tap: Start or stop recording. Red pulsing border during recording, blue pulsing border during TTS readout.
  • Triple Tap: Test readback — reads a predefined text to confirm TTS works.
  • Long Press: Clears the unread message queue.
  • Drag: Move the bubble anywhere on screen.

During Recording via Bubble

  1. Tap the bubble to start recording.
  2. Speak, then tap the bubble again to stop recording.
  3. After transcription, a "✓ Inserted!" toast confirms the text was pasted or placed in the clipboard.

Bubble Translation and Auto-Insert

The Bubble uses the same translation logic as the main window: if your input and output languages differ, your dictation is automatically translated before being inserted. Voice Translate (text-to-speech readout) also works in the Bubble.

Using Android's Accessibility Service, the Bubble inserts the (possibly translated) text directly into the focused input field. In all mainstream apps we tested — including WhatsApp, Gmail, Discord, Microsoft Teams, Viber, Chrome, ChatGPT, Facebook, Instagram, Pinterest, and Skool — auto-insert works reliably.

If auto-insert fails in a less common app, the already translated text is still guaranteed to be in the clipboard: long-press the input field and tap "Paste" to insert it.

Stopping the Overlay

Tap the Overlay button again or tap Stop on the notification.

19. Auto-Paste Android

Auto-Paste uses Android's Accessibility Service to automatically insert dictated text into the currently focused text field.

Enabling Auto-Paste

  1. Tap the Auto-Paste button.
  2. A disclosure dialog explains what the Accessibility Service does and does not do. Tap Enable Auto-Paste.
  3. You are directed to Android's Accessibility Settings. Find Talk to me and enable it.
  4. The button now shows ✓ with a cyan border.

Accessibility Shortcut Button

When enabling the Accessibility Service, Android will ask you to choose an activation shortcut. This determines how you can quickly toggle the service on/off:

  • Accessibility button (recommended): A small button appears in the navigation bar. Tap it to toggle the service.
  • Volume Up + Volume Down (hold 3 seconds): Press and hold both volume keys simultaneously for 3 seconds to toggle.

We recommend the Accessibility button option for the easiest experience. This is a standard Android system feature — the choice does not affect how Auto-Paste works.

Important Notes

  • Requires Android Accessibility permission (a sensitive permission).
  • May need to be re-granted after app updates.
  • Used exclusively for text insertion — no other accessibility data is accessed.

App Compatibility

Auto-Paste works reliably in most Android apps. The following apps were tested with v0.5.159:

App | Auto-Paste | Translation
WhatsApp
Gmail (recipient + body)
Discord
Microsoft Teams
Viber
Chrome
ChatGPT
Facebook
Instagram
Pinterest
Skool (WebView in Chrome)

"App Access Denied" — Restricted Settings (Android 13+)

On some devices, when enabling Auto-Paste or Notification Access, you may see "App access denied" or "For your security, this setting is currently unavailable." This is not a bug — it is an Android 13+ security feature called Restricted Settings.

Affected manufacturers: Lenovo (ZUI), Samsung (One UI), Xiaomi/Redmi (MIUI/HyperOS), OPPO/Realme (ColorOS), Huawei/Honor (EMUI/HarmonyOS), OnePlus (OxygenOS), Stock Android/Pixel.

How to fix:

  1. Open Android Settings → Apps → See all apps → find Talk to me.
  2. Tap Talk to me to open the App Info page (not the Notifications sub-page).
  3. Tap the three-dot menu (⋮) in the top-right corner.
  4. Select Allow restricted settings.
  5. Confirm with your PIN/fingerprint.
  6. Go back to Settings → Accessibility and enable Talk to me.

Tip: If the three-dot menu is not visible, first try to enable the permission (triggering the error), then go to the App Info page — the menu should now appear.

Xiaomi/MIUI/HyperOS: Go to Settings → Apps → Manage apps → Talk to me and scroll to the bottom.

Lenovo (ZUI): When tapping Apps in Settings, you may land on the Notifications sub-page instead of App Info. Navigate back and look for the full App Info page with storage, permissions, and battery sections.

20. Auto-Read Messages Android

Auto-Read automatically reads incoming chat messages aloud using TTS — ideal for driving, cooking, or exercising.

How It Works

  1. Enable Auto-Read (Headphones icon).
  2. Ensure Notification Access is granted.
  3. The Overlay must be active.
  4. When a message arrives from an allowed app, Talk to me announces the sender and reads the message aloud.

Pre-Selected Chat Apps

WhatsApp, WhatsApp Business, Telegram, Signal, Discord, Slack, Microsoft Teams, Viber, Messenger (Meta), Instagram, Google Messages, Samsung Messages.

You can add or remove apps in Auto-Read Apps Configuration.

21. Notification Access Android

Notification Access allows Talk to me to read incoming notifications, which is required for Auto-Read Messages.

Granting Access

  1. Tap the Notif Access button.
  2. Go to Android's Notification Listener Settings.
  3. Find Talk to me and enable it.
  4. The button shows ✓ with a cyan border.

Important Notes

  • System-level permission — processes only notifications from explicitly allowed apps.
  • No notification data is stored, transmitted, or logged.

22. Auto-Read Apps Configuration Android

Control which apps are allowed to have their notifications read aloud.

Known Chat Apps

Pre-selected messaging apps with individual toggles (WhatsApp, Telegram, Signal, Discord, Slack, Teams, Viber, Messenger, Instagram, Google Messages, Samsung Messages).

Search and Add Custom Apps

  1. Tap the search field and type an app name.
  2. Matching installed apps appear, sorted by relevance.
  3. Check the box to add an app.

How Filtering Works

  • Only notifications from allowed apps are read aloud.
  • Changes take effect immediately — no restart required.
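
Conceptually, the filter is an allow-list lookup on the app that posted the notification. The Python sketch below illustrates the idea only: the Notification record, the spoken format, and the package names are assumptions, and the real Android implementation is not shown in this manual.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Notification:
    package: str  # identifier of the app that posted the notification
    sender: str
    text: str


def announcement(notification: Notification, allowed_packages: set[str]):
    """Return the text to speak, or None if the app is not on the allow-list."""
    if notification.package not in allowed_packages:
        return None
    return f"{notification.sender} says: {notification.text}"


allowed = {"com.whatsapp"}
message = Notification("org.example.chat", "Anna", "Running late")
print(announcement(message, allowed))   # not allowed yet
allowed.add("org.example.chat")         # the change applies to the next lookup
print(announcement(message, allowed))
```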

23. Settings

UI Language

English, Deutsch, Français, Español — independent of your system language.

Quality Preset

Preset | STT Provider | LLM Provider | Model | Polish
Top Performer | Scribe v2 | OpenAI | GPT-5.4 | Strong
Standard | Scribe v2 | OpenAI | GPT-4.1 mini | Strong
Budget | Whisper | Groq | Default | Light
Free | Deepgram | Groq | Default | Off
Custom | Manual | Manual | Manual | Manual

Speech-to-Text

  • Provider: OpenAI Whisper, Deepgram Nova-2/3, ElevenLabs Scribe v2, Groq Whisper
  • Custom Keyterms (Scribe only): Proper nouns, brands, technical terms
  • Language: Auto-Detect or specific

Text-to-Speech

  • Provider: ElevenLabs, OpenAI TTS, Deepgram Aura 2
  • Model (ElevenLabs): Eleven v3, Multilingual v2, Flash v2.5, Turbo v2.5

LLM Provider (Polish)

  • Provider: OpenAI, Groq, Anthropic, Google Gemini, xAI Grok
  • Model: Provider default or specific
  • Polish Strength: Light or Strong

Translation Provider

Separate provider for AI-Translation (can differ from Polish provider).

AI-Polish / AI-Translate

Toggle each independently. When AI-Translate is enabled:

  • Translate To: 20 target languages
  • Voice Translate: Auto-read translations via TTS

Android Hands-Free

Quick toggles for Overlay, Auto-Read Messages, Auto-Paste, Notification Access.

Save and Test

  • Save all current settings — Persists changes to device storage
  • Test current configuration — Tests all configured providers with response times

24. Word Corrections

Word Corrections teach Talk to me the correct spelling of names, brands, and terms that speech recognition gets wrong.

Adding Corrections

Single Add

Enter Wrong spelling and Correct spelling, then tap/click Add.

Bulk Import

Enter the correct spelling, then list wrong variants (one per line). Use Generate with AI to auto-create likely misspellings.

Multi-Import

Enter pairs as wrong;correct (one per line). Supports ;, ->, comma, or tab separators.
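
A parser for this input format might look like the following Python sketch. It assumes the first separator on each line splits the pair and that malformed lines are skipped; the function name and these rules are illustrative, not a description of the app's parser.

```python
import re

# Accepts "->", ";", "," or a tab, with optional surrounding whitespace.
SEPARATOR = re.compile(r"\s*(?:->|;|,|\t)\s*")


def parse_pairs(text: str) -> list[tuple[str, str]]:
    """Parse one wrong/correct pair per line; skip blank or malformed lines."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        parts = SEPARATOR.split(line, maxsplit=1)
        if len(parts) == 2 and parts[0] and parts[1]:
            pairs.append((parts[0], parts[1]))
    return pairs


sample = "tok to me;Talk to me\nmorocon -> mrocon\nanthropik,Anthropic\ngrock\tGrok\n"
print(parse_pairs(sample))
```

Because a comma is itself a separator, a wrong spelling that contains a comma would need one of the other separators and a different parsing rule than this sketch uses.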

How Corrections Work

During post-processing (Pipeline stage 3), wrong spellings are automatically replaced before AI-Polish runs.
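
The replacement step can be sketched in Python as a whole-word, case-insensitive substitution applied to the transcript before it is handed to the polish stage. Whole-word and case-insensitive matching, as well as the function name, are assumptions for this example; the manual does not specify the exact matching rules.

```python
import re


def apply_corrections(text: str, corrections: dict[str, str]) -> str:
    """Replace each wrong spelling with its correction (whole words only)."""
    # Longer entries first, so "tok to me" wins over a shorter overlapping entry.
    for wrong in sorted(corrections, key=len, reverse=True):
        pattern = re.compile(r"(?<!\w)" + re.escape(wrong) + r"(?!\w)", re.IGNORECASE)
        correct = corrections[wrong]
        # A function replacement avoids special handling of backslashes.
        text = pattern.sub(lambda _match, c=correct: c, text)
    return text


transcript = "I use tok to me and grock daily."
print(apply_corrections(transcript, {"tok to me": "Talk to me", "grock": "Grok"}))
```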

25. Backup and Restore

Export Settings

  1. Open Backup & Restore in Settings.
  2. Tap/click Export Settings.
  3. Enter and confirm an Encryption Password (min. 6 characters).
  4. Windows: The save dialog suggests talktome-settings.ttm — you choose the folder.
  5. Android: The backup is written to your Downloads area as TalkToMe-backup.ttm. If that name already exists, the system may add (1), (2), etc. — all are valid encrypted backups.

Import Settings

  1. Tap/click Import Settings.
  2. Automatic (Android): The app looks for the newest matching file named TalkToMe-backup with a .ttm extension (including TalkToMe-backup (1).ttm, etc.) in app storage and in Downloads.
  3. If the system file picker opens: On many phones (e.g. Samsung), the first screen is Recently used and may default to Images — your .ttm files are hidden until you switch the top filter to Documents or This week, or open the Download folder directly.
  4. New device: Copy the .ttm from your old device (USB, cloud, email), then use Import and pick that file.
  5. Enter the encryption password.
  6. All settings are restored and the app restarts.
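
The automatic lookup in step 2 amounts to matching file names and picking the most recent one. The Python sketch below shows that idea on a list of (name, modified time) pairs; the function name and the use of the modification time to decide which file is newest are assumptions for illustration.

```python
import re

# Matches TalkToMe-backup.ttm and numbered copies such as TalkToMe-backup (1).ttm
BACKUP_NAME = re.compile(r"^TalkToMe-backup(?: \(\d+\))?\.ttm$")


def newest_backup(files: list[tuple[str, float]]):
    """Return the name of the most recently modified backup, or None."""
    candidates = [(mtime, name) for name, mtime in files if BACKUP_NAME.match(name)]
    if not candidates:
        return None
    return max(candidates)[1]


downloads = [
    ("TalkToMe-backup.ttm", 100.0),
    ("TalkToMe-backup (1).ttm", 250.0),
    ("photo.jpg", 900.0),
]
print(newest_backup(downloads))
```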

Technical Details

  • Encryption: AES-256-GCM with PBKDF2-HMAC-SHA256 (100,000 iterations)
  • Included: All settings, API keys, word corrections, auto-read apps, quality preset, UI language
  • NOT included: License activation (tied to Machine ID)
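
To show what the key derivation named above involves, here is a minimal Python sketch using only the standard library. The 16-byte random salt and the function name are assumptions; the manual states only the algorithm, the iteration count, and the minimum password length. The AES-256-GCM step itself needs a cryptography library and is not shown.

```python
import hashlib
import os

ITERATIONS = 100_000  # iteration count stated above
KEY_BYTES = 32        # 256-bit key, as needed for AES-256
SALT_BYTES = 16       # assumption: the salt length is not documented


def derive_key(password: str, salt: bytes) -> bytes:
    """Derive an encryption key from the backup password with PBKDF2-HMAC-SHA256."""
    if len(password) < 6:
        raise ValueError("password must have at least 6 characters")
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS, dklen=KEY_BYTES
    )


salt = os.urandom(SALT_BYTES)
key = derive_key("example-password", salt)
print(len(key))
```

The same password and salt always produce the same key, which is why the password must be entered again on import.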

26. Usage Dashboard

Metric | Description
STT Calls | Speech-to-text transcriptions performed
LLM Polish | AI-Polish or AI-Translate operations
TTS Synth | Text-to-speech synthesis operations

Counters are cumulative since the last settings reset.

27. Troubleshooting

General

Problem | Solution
"No API key configured" | Add a key in Key Pool for the feature you need
Recording doesn't start | Check microphone permission in system settings
Voice Translate produces no audio | Ensure a TTS API key is configured and working
Export fails | Check write access to Downloads folder
Can't see backup in Import file picker | Switch from Images to Documents / This week, or open the Download folder — see §25 Import

Windows-Specific

Problem | Solution
Ctrl+Win hotkey doesn't work | Ensure the app is running (check system tray)
Text not pasted after dictation | Ensure the target window supports Ctrl+V
Notification Listener unavailable | Available on Windows Desktop — ensure notification access is granted in Windows Settings
Mini-Player looks too large/small | DPI-aware sizing adjusts automatically; restart the app if display settings changed

Android-Specific

Problem | Solution
Auto-Read doesn't work | Ensure Overlay is active, Auto-Read enabled, and Notification Access granted
Auto-Paste doesn't work | Re-enable Accessibility Service in Android Settings
Bubble doesn't appear | Grant "Draw over other apps" permission
"App access denied" when granting permissions | Restricted Settings (Android 13+) — see §19 "Restricted Settings" for the step-by-step solution
Screen doesn't rotate (Tablet) | Check if PC Mode is active (pull down Quick Settings). Auto-Rotate is ignored in PC Mode — switch back to Android Mode. Primarily affects Lenovo tablets (ZUI).

28. Privacy and Security

Data Handling

  • No data collection: Talk to me does not collect, store, or transmit any user data to mrocon GmbH servers.
  • Direct API communication: Audio and text go directly from your device to your chosen AI provider.
  • Local storage only: All settings and API keys are stored exclusively on your device.
  • No analytics: No tracking, analytics, or telemetry of any kind.

Permissions

Windows

Permission | Purpose
Microphone | Record audio for dictation
Notification Access | Read notifications
Internet | Communicate with AI providers

Android

Permission | Purpose
Microphone | Record audio for dictation
Overlay (Draw over apps) | Display the floating bubble
Notification Listener | Read notifications for Auto-Read
Accessibility Service | Auto-Paste text into fields
Internet | Communicate with AI providers
Query Installed Packages | Show app names in Auto-Read settings

Encryption

  • Windows: API keys encrypted with DPAPI (Windows Data Protection API)
  • Android: API keys in app-private internal storage
  • Backup files: AES-256-GCM encryption

Appendix A — Supported Languages

Speech Input Languages

Auto-Detect, German, English, French, Spanish, Italian, Portuguese, Dutch, Japanese, Korean, Chinese, Russian, Arabic, Hindi, Polish, Turkish, Swedish, Ukrainian

Translation Target Languages

German, English, French, Spanish, Italian, Portuguese, Dutch, Japanese, Chinese, Korean, Russian, Arabic, Hindi, Polish, Turkish, Swedish, Ukrainian, Danish, Finnish, Norwegian

TTS Languages

Auto, German, English, French, Italian, Spanish, Portuguese, Dutch, Polish, Swedish, Danish, Finnish, Norwegian, Turkish, Japanese, Korean, Chinese

UI Languages

English, Deutsch, Français, Español

Appendix B — Supported Providers

Speech-to-Text

Provider | Notes
OpenAI Whisper | Most widely used, reliable
Deepgram Nova-2 / Nova-3 | Fast, good accuracy
ElevenLabs Scribe v2 | Supports custom keyterms
Groq Whisper | Free tier available, fast

LLM (Polish / Translation)

Provider | Notes
OpenAI | GPT-4o-mini, GPT-5.4, etc.
Groq | Free tier, Llama models
Anthropic | Claude models
Google Gemini | Gemini models
xAI Grok | Free tier available

Text-to-Speech

Provider | Notes
ElevenLabs | Best quality, voice cloning, 4 models
OpenAI TTS | 6 built-in voices, simple
Deepgram Aura 2 | Fast synthesis

Appendix C — Quality Presets

Preset | STT | LLM | Model | Polish | Cost
Top Performer | Scribe v2 | OpenAI | GPT-5.4 | Strong | $$$
Standard | Scribe v2 | OpenAI | GPT-4.1 mini | Strong | $$
Budget | Whisper | Groq | Default | Light | $
Free | Deepgram | Groq | Default | Off | Free
Custom | Manual | Manual | Manual | Manual | Varies

Appendix D — Keyboard Shortcuts Windows

Shortcut | Action
Ctrl+Win | Start / Stop Recording
Ctrl+Win (during processing) | Cancel Pipeline
TTS Hotkey | Read selected text aloud

Talk to me is a product of mrocon GmbH. All rights reserved.

For support, contact team@talktome.studio or visit talktome.studio.
