Talk to me — User Manual
This manual covers both the Windows Desktop and Android Hands-Free editions of Talk to me. Sections marked with Windows or Android apply only to that platform. All other sections apply to both.
1. Introduction
Talk to me is a professional dictation, translation, and voice interaction studio available for Windows Desktop and Android. It converts your speech into text, polishes it with AI, translates it into 20+ languages, and reads it back to you — all in real time.
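The app follows a strict BYOK (Bring Your Own Key) and Zero-Knowledge / Zero-Trust architecture: your API keys are stored only on your device, and your data travels directly from your device to the AI provider you choose, never through Talk to me servers.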
Key Features
- Real-time Dictation: Record your voice and get polished text in seconds.
- AI-Polish: Automatic grammar correction and filler word removal powered by your choice of AI provider.
- Live Translation: Translate dictated text into 20+ languages on the fly.
- Voice Translate (Speech-to-Speech): Your translated text is automatically read aloud in the target language.
- Text-to-Speech: Convert any text into natural-sounding speech with ElevenLabs, OpenAI TTS, or Deepgram.
- Live Language Immersion: Speak in your native language and instantly see and hear it in the language you want to master.
- Word Corrections: Teach the app your names, brands, and terms that speech recognition gets wrong.
- Encrypted Backup: Export all settings and API keys as a password-protected encrypted file.
- Multi-Provider Support: Choose from OpenAI, Groq, Anthropic, Google Gemini, xAI Grok, ElevenLabs, Deepgram, and more.
Platform Highlights
| Feature | Windows Desktop | Android Hands-Free |
|---|---|---|
| Mini-Player (compact mode) | ✓ | — |
| Global Hotkeys (Ctrl+Win) | ✓ | — |
| Auto-Read (Ctrl+C text extraction) | ✓ | — |
| Notification Listener | ✓ | — |
| MP3 Recording & Save | ✓ | — |
| Floating Pill (Spectrum Analyzer) | ✓ | — |
| Floating Bubble (Overlay) | — | ✓ |
| Auto-Paste (Accessibility) | — | ✓ |
| Auto-Read Messages (from chat apps) | — | ✓ |
| App-level Notification Access | — | ✓ |
Security Principles
- Zero-Knowledge: Talk to me never stores, transmits, or has access to your API keys on any server. All keys are stored locally on your device.
- Zero-Trust: Apart from license verification, the app never phones home. No analytics, no tracking, no telemetry. Your dictation data flows directly from your device to your chosen AI provider and nowhere else.
- BYOK: You bring your own API keys from the providers you trust. Talk to me does not resell API access.
2. Getting Started
Windows Installation — Windows Desktop
Talk to me for Windows is available as an EV-signed installer from talktome.studio or via the Microsoft Store.
System Requirements:
- Windows 10 or later (64-bit)
- An active internet connection
- At least one API key from a supported provider
The installer is digitally signed with an Extended Validation (EV) certificate from Certum (mrocon GmbH), so Windows SmartScreen should not show any warnings.
Android Installation — Android
Talk to me for Android is available as an APK from talktome.studio or via the Google Play Store.
System Requirements:
- Android 8.0 or later
- An active internet connection
- At least one API key from a supported provider
First Launch
When you open Talk to me for the first time, you will see the License Gate. You have two options:
- Enter a License Key to unlock the full app immediately.
- Start a 7-Day Free Trial to explore all features without a license key.
After activation or trial start, the app loads and you can begin using it right away, provided you have at least one API key configured (see §11 Key Pool).
Android Quick Start — Your First 5 Minutes
After activating your license (or starting the free trial), the app opens and you will see the main screen — the Cockpit. Don't worry if most buttons appear orange or inactive. That's completely normal! Here is what to do, step by step:
Step 1 — Enable Microphone Access
The large button in the center of the screen reads "Enable Microphone Access". This is the first and most important step.
- Tap the Enable Microphone Access button.
- A dialog from Talk to me explains why the microphone is needed. Tap OK.
- Android then asks: "Allow Talk to me to record audio?" — tap While using the app (or Allow).
- Done! The button changes to "Ready — Start Dictation" in green. You can now record your first dictation.
Step 2 — Add Your API Keys
At the bottom of the screen you will see the Key Pool bar — probably showing red labels like STT 0/5, LLM 0/5, TTS 0/5. This means no API keys are configured yet. Without keys, the app cannot connect to AI services.
- Tap any of the Key Pool labels (e.g. STT) to open the Key Pool section.
- Tap Add Key and paste an API key from your provider (e.g. OpenAI, Deepgram, ElevenLabs).
- Tap Save. The label turns green when a valid key is stored.
- Repeat for each category you want to use. At minimum, you need an STT key (for dictation). For AI polish, add an LLM key. For text-to-speech, add a TTS key.
See §11 Key Pool for a detailed guide on supported providers and how to obtain API keys.
Step 3 — Optional Features (Cockpit Buttons)
The buttons in the center of the Cockpit control optional features. Each one requires a system permission the first time you enable it. You will see a short explanation dialog from Talk to me, followed by the Android system dialog. Both are normal and safe to confirm.
| Button | What it does | Details |
|---|---|---|
| Auto-Paste | Automatically pastes your dictated text into whichever app you were using (e.g. WhatsApp, email). No manual copy-paste needed. | §19 |
| Notif Access | Lets the app read incoming notifications so it can auto-read messages to you. | §21 |
| Auto-Read | Reads incoming messages aloud using text-to-speech — great for hands-free use while driving or cooking. | §20 |
| Overlay | Shows a small floating bubble on your screen. Tap it to start/stop dictation from any app — without switching back to Talk to me. | §18 |
You don't need all of these right away. Start with dictation (Step 1 + 2), and enable the extras whenever you're ready. Each feature can be turned on or off at any time.
Free & Paid Tier Overview
Talk to me is a BYOK (Bring Your Own Key) app: you use your own API keys from AI providers. Many providers offer generous free tiers, from $200 of Deepgram credit to rate-limited free Gemini usage to free Grok and Groq keys. Depending on how much you use it, you can run Talk to me for months before any API costs arise.
Tier 1 — Completely Free (no money, no credit card)
| What you need | What you get | How to get it |
|---|---|---|
| 1× Deepgram account (free) | Speech-to-Text dictation (STT) | deepgram.com → Sign up → $200 starter credit |
| 1× Gemini API key (free) | AI Voice Chat (Gemini Live) | aistudio.google.com → Create API Key |
What you can do:
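- Dictate with Deepgram Nova-3 (preset “Free”): no LLM polish, but solid transcription
- Dictate with AI Polish via preset “Free Gemini” (Deepgram Nova-3 + Gemini, within Gemini's free daily request limit)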
- AI Voice Chat via the Gemini Live tab — real-time voice conversation with sub-second latency, 30 voices, 24 languages
How long does it last?
| Feature | Credit / Limit | Lasts for |
|---|---|---|
| Deepgram STT | $200 starter credit (never expires) | ~43,000 min (~716 hours) transcription |
| Gemini Live Voice Chat | Free API key (no credit limit) | Unlimited (rate limit: ~10 sessions/min) |
| Gemini LLM (for Polish) | Free API key | 250 requests/day (Flash model) |
Reality: With these two free accounts you can use Talk to me productively for months. After several weeks of intensive daily testing, only $19 of the $200 Deepgram credit had been used.
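How long a starter credit lasts is plain division: credit ÷ per-minute rate. The small calculator below is only a sketch; the rate you pass in is an assumption you should take from the provider's current price list, since providers such as Deepgram charge different rates per model and for streaming vs. pre-recorded audio.

```python
def minutes_covered(credit_usd: float, rate_usd_per_min: float) -> float:
    """Minutes of audio a prepaid credit covers at a flat per-minute rate."""
    if rate_usd_per_min <= 0:
        raise ValueError("rate must be positive")
    return credit_usd / rate_usd_per_min


def share_used(spent_usd: float, credit_usd: float) -> float:
    """Fraction of a credit already spent (0.0 to 1.0)."""
    return spent_usd / credit_usd


# $200 at the $0.0043/min listed in the API Key Cost Overview.
print(round(minutes_covered(200, 0.0043)))  # 46512
# $19 of $200 spent, as in the testing note above.
print(f"{share_used(19, 200):.1%}")  # 9.5%
```

At $0.0043/min the result is about 46,500 minutes; the ~43,000 minutes in the table corresponds to an effective rate of about $0.0047/min, so treat either figure as an estimate.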
Tier 2 — Free with More Power (additional free keys)
| What you need | What it adds | Cost |
|---|---|---|
| + 1× xAI account | Grok-3-Mini as LLM for Polish + Translation | Free ($25 starter credit + up to $150/month with data sharing) |
| + 1× Groq account | Ultra-fast LLM for Polish (Llama models) | Free (1,000 requests/day, no credit card) |
Unlocked presets:
| Preset | STT | LLM / Polish | All keys free? |
|---|---|---|---|
| Free | Deepgram Nova-3 | — | Yes (1 key) |
| Free xAI | Deepgram Nova-3 | xAI Grok | Yes (2 keys) |
| Free Gemini | Deepgram Nova-3 | Google Gemini | Yes (2 keys) |
| Fast Free | OpenAI Whisper | Groq Llama | Yes (2 keys) |
| Economy | Deepgram Nova-3 | Groq Llama | Yes (2 keys) |
| Economy Plus | Deepgram Nova-3 | Groq Llama (Strong Polish) | Yes (2 keys) |
Also unlocked:
- Deepgram Voice Agent with 20+ managed presets (uses your $200 credit, $0.05–0.16/min)
- Full BYO Voice Agent Presets (e.g. GPT-5.4 + ElevenLabs, if you have the keys)
Tier 3 — Premium Quality (paid keys)
For the absolute best quality, you need paid API keys:
| Provider | Used for | Cost | What you get |
|---|---|---|---|
| OpenAI | GPT-5.4 (best LLM for Polish) | Pay-per-use (~$5–15/month) | Perfect grammar, style, translation |
| ElevenLabs | Scribe v2 (best STT) + TTS | From $5/month (Starter) | Best transcription, premium voices |
| Anthropic | Claude Sonnet 4.6 (top LLM) | Pay-per-use | Excellent text quality for longer texts |
API Key Cost Overview
| Provider | Sign up | Starter credit | Ongoing cost | Credit card? |
|---|---|---|---|---|
| Deepgram | Free | $200 (never expires!) | From $0.0043/min STT | No |
| Google Gemini | Free | Unlimited (rate-limited) | $0.005–0.018/min (Live Audio) | No |
| xAI (Grok) | Free | $25 + up to $150/month | From $0.10/1M tokens | No |
| Groq | Free | Unlimited (rate-limited) | 1,000 requests/day free | No |
| OpenAI | Free | $5 (expires after 3 months) | From $0.15/1M tokens | Yes (for GPT-5+) |
| Anthropic | Free | $5 (expires after 30 days) | From $1.00/1M tokens | Yes |
| ElevenLabs | Free | 10,000 chars/month | From $5/month (Starter) | Yes |
Recommended Start (3 minutes, $0 cost)
- Create Deepgram account → deepgram.com → Sign up → Copy API Key
- Create Gemini API key → aistudio.google.com → “Create API Key” → Copy key
- Enter the keys in Talk to me's Key Pool (§11): the Deepgram key under Speech-to-Text, the Gemini key under AI-Polish / LLM
- Go: Speech-to-Text tab → preset “Free Gemini” → dictate with STT + AI Polish. AI Voice Chat tab → Gemini Live → “Start Conversation” → real-time voice chat with AI.
Optional for even more:
- xAI account → x.ai/api → Sign up → API Key → Enter in Key Pool → preset “Free xAI”
- Groq account → console.groq.com → Sign up → API Key → presets “Economy” / “Economy Plus” / “Fast Free”
Feature Availability by Tier
| Feature | Tier 1 (free) | Tier 2 (free+) | Tier 3 (premium) |
|---|---|---|---|
| Speech dictation (STT) | ✓ Deepgram | ✓ Deepgram + Whisper | ✓ + ElevenLabs Scribe v2 |
| AI Polish (grammar) | ✓ Gemini (preset “Free Gemini”) | ✓ Grok/Gemini/Groq | ✓ + GPT-5.4 / Claude 4.6 |
| Real-time translation | ✓ Gemini | ✓ (all LLM providers) | ✓ (best quality) |
| Gemini Live Voice Chat | ✓ (unlimited) | ✓ (unlimited) | ✓ (unlimited) |
| Deepgram Voice Agent | — | ✓ (from $200 credit) | ✓ (all presets) |
| BYO Voice Agent Presets | — | ✓ (with xAI/Groq keys) | ✓ (+ ElevenLabs/OpenAI TTS) |
| Available presets | 2 | 6+ dictation + 20+ Voice Agent | All (30+) |
All prices and free tier conditions are set by the respective providers and may change. Last updated: April 2026.
3. License Activation
The License Gate
On first launch (or after trial expiration), the License Gate is displayed. It shows:
- The Talk to me wordmark
- A text field for your license key (format: TTM-XXXX-XXXX-XXXX-XXXX)
- Your Machine ID (a unique device identifier, needed for activation)
- An Activate button
- A Start 7-Day Free Trial button (if no trial has been used)
- Links to Buy a License and the Customer Portal
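Before a key is sent for activation, a client can cheaply check that a pasted string has the documented TTM-XXXX-XXXX-XXXX-XXXX shape. The sketch below assumes each X is an uppercase letter or a digit; the real character set is not specified here, and the authoritative check is always the app's online verification.

```python
import re

# Hypothetical pattern: assumes every "X" is an uppercase letter or a digit.
LICENSE_KEY_PATTERN = re.compile(r"TTM(?:-[A-Z0-9]{4}){4}")


def normalize_license_key(raw: str) -> str:
    """Trim surrounding whitespace and upper-case a pasted key."""
    return raw.strip().upper()


def looks_like_license_key(raw: str) -> bool:
    """True if the key has the TTM-XXXX-XXXX-XXXX-XXXX shape (format only)."""
    return LICENSE_KEY_PATTERN.fullmatch(normalize_license_key(raw)) is not None


print(looks_like_license_key(" ttm-ab12-cd34-ef56-gh78 "))  # True
print(looks_like_license_key("TTM-AB12-CD34"))              # False
```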
Activating a License
- Enter your license key in the text field.
- Tap/click Activate.
- The app verifies your key online and activates it for this device.
- Once activated, you will not see the License Gate again unless you deactivate or your license expires.
The Free Trial
- Tap/click Start 7-Day Free Trial to unlock all features for 7 days.
- A banner at the top of the app shows how many trial days remain.
- After 7 days, the trial expires and the License Gate reappears.
License Modal
Once inside the app, you can view your license status by clicking the License button (shield icon). The License Modal shows:
- Status: Active, Trial, Grace Period, or Expired
- Product: Your license product name
- Plan: Yearly or Lifetime
- Expires: Expiration date (or "Lifetime")
- Devices: Number of active devices / maximum allowed
- Key: Your license key (partially masked)
- Machine ID: Your device's unique identifier
From this modal you can:
- Deactivate Device — releases the license from this device so you can use it on another
- Close — return to the app
4. App Overview
The app is organized into three main tabs and several supporting sections:
Navigation
At the top of the screen, three tabs let you switch between the app's primary modes:
- Speech-to-Text — Record your voice and get polished, translated text
- Text-to-Speech — Convert written text into spoken audio
- AI Voice Chat — Have real-time voice conversations with AI (see §12)
Interface Layout
Below the tabs, the main interface is arranged vertically:
- Quick-Override Controls — Language selectors for input and output
- Action Buttons — Quick access to platform features
- Status Indicator — Shows the current state (Ready, Recording, Transcribing, etc.)
- Pipeline Display — Visual progress of your dictation through the processing stages
- Result Area — Your transcribed/translated text
- TTS Panel (Text-to-Speech tab only) — Text input and playback controls
- AI Voice Chat Panel (AI Voice Chat tab only) — Voice/persona selection, conversation controls, live transcript (see §12)
- Key Pool — Manage your API keys
- Settings — All configuration options
Action Buttons
Windows Desktop action buttons:
- Voice Translate — Toggle speech-to-speech translation
- Notification Listener — Toggle notification readout
- Auto-Read — Toggle Ctrl+C text-to-speech
- Record TTS Readings — Toggle MP3 recording of TTS output
- Save Recordings — Open recordings folder
Android action buttons:
- License — Open license modal
- Voice Translate — Toggle speech-to-speech translation
- Overlay — Start/stop the Floating Bubble
- Auto-Paste — Open Accessibility settings
- Auto-Read — Toggle auto-read messages
- Notif Access — Open notification listener settings
The Info Button
In the header, the Info button opens the App Info modal, which displays:
- A link to talktome.studio
- The support email (tap/click to copy)
- The current app version
- Number of detected microphones
5. Speech-to-Text
The Speech-to-Text tab is the primary mode of Talk to me. Here, you record your voice and receive polished, optionally translated text.
Recording a Dictation
- Ensure the status shows Ready — Start Dictation (green).
- Click/tap the large Start Dictation button.
- The button turns red and shows Stop Recording. Speak clearly.
- While recording, the app shows the recording duration in seconds, an audio level meter for input volume, and the currently active STT provider and language.
- Click/tap the button again to Stop Recording.
Windows You can also start/stop recording using the global hotkey Ctrl+Win (no need to focus the app window).
What Happens After Recording
After you stop recording, the app processes your audio through the Pipeline (see §7 The Pipeline):
- Capture — Audio recording is finalized
- STT — Your audio is transcribed by the selected provider
- Post-Processing — The raw text is cleaned up (word corrections applied)
- Polish / Translation — If enabled, AI corrects grammar or translates the text
- Inject — The final text is placed in your clipboard
Windows The text is automatically pasted into the previously focused window via simulated Ctrl+V (Smart Clipboard Injection).
Android If Auto-Paste is enabled, the text is automatically inserted into the active text field via the Accessibility Service.
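The Post-Processing stage applies your Word Corrections before any AI step. The app's exact matching rules are not documented in this section; the sketch below assumes case-insensitive, whole-word (or whole-phrase) replacement, with longer entries applied first so a multi-word entry wins over a shorter one it contains.

```python
import re


def apply_word_corrections(text: str, corrections: dict[str, str]) -> str:
    """Replace misrecognized words or phrases with your preferred spelling.

    Matching is case-insensitive and limited to whole words, so an entry for
    "deep gram" fixes "Deep Gram" but leaves "deep grammar" untouched.
    """
    # Longest entries first, so "talk to me" is handled before "talk".
    for wrong in sorted(corrections, key=len, reverse=True):
        right = corrections[wrong]
        pattern = re.compile(r"\b" + re.escape(wrong) + r"\b", re.IGNORECASE)
        # A function replacement keeps backslashes in the correction literal.
        text = pattern.sub(lambda _match, r=right: r, text)
    return text


fixes = {"deep gram": "Deepgram", "talk to me": "Talk to me"}
print(apply_word_corrections("I use Deep Gram with talk to me.", fixes))
# I use Deepgram with Talk to me.
```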
The Result Area
After processing, your text appears in the result area. A hint confirms the text has been copied to your clipboard and is ready to paste.
Recording Signals (Audio Cues)
Talk to me gives you audible and visual signals when the microphone is actually recording, so you know when to speak and no words are lost.
Acoustic Signals
- Start beep (short high blip): "Microphone is live, you can speak now."
- Stop beep (short low blip): "Recording ended."
Both beeps can be toggled on/off in the settings and their volume can be adjusted (default: 100%).
Visual Signals
- Idle/Standby: Microphone icon is orange — recording inactive.
- Recording active: Microphone icon is green — every spoken word is being captured.
Note: Start Beep on Speakerphones
Some audio devices suppress the start beep. This is not a bug but a hardware characteristic:
| Device Type | Beep Audible? | Recommendation |
|---|---|---|
| Speakers + separate microphone | ✅ Yes | — |
| Headset with separate mic + speaker | ✅ Yes | — |
| USB speakerphone (Jabra Speak2, Logitech P710e etc.) | ⚠️ Possibly not | Use headset or external speakers |
| Bluetooth headset in Hands-Free profile | ⚠️ Possibly not | Wired headset as alternative |
Important: If you change the default audio device, restart Talk to me so the beep plays on the new device.
6. Text-to-Speech
The Text-to-Speech tab lets you convert any written text into natural-sounding speech.
Basic Usage
- Switch to the Text-to-Speech tab.
- Type or paste text into the text area.
- Click/tap Read Aloud to start playback.
Playback Controls
- Pause — Temporarily stops playback
- Resume — Continues from where you paused
- Stop — Ends playback entirely
- Replay — Plays the same audio again without re-synthesizing
Provider and Voice Selection
- ElevenLabs: Choose from your available voices or use "Default (Brian v3)". Custom Voice-IDs supported.
- OpenAI TTS: Nova, Alloy, Echo, Fable, Onyx, Shimmer
- Deepgram Aura-2: Fast synthesis
Model Selection (ElevenLabs)
| Model | Character Limit | Best For |
|---|---|---|
| Eleven v3 | 5,000 | Highest quality, short content |
| Multilingual v2 | 10,000 | Multi-language support |
| Flash v2.5 | 40,000 | Fast synthesis, long texts |
| Turbo v2.5 | 40,000 | Speed and quality balance |
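The character limits in the table decide whether a text can be synthesized by a given ElevenLabs model in one request. A minimal sketch of that check, assuming Python's `len()` is a close enough approximation of how ElevenLabs counts characters:

```python
# Per-request character limits, copied from the model table above.
MODEL_CHAR_LIMITS = {
    "Eleven v3": 5_000,
    "Multilingual v2": 10_000,
    "Flash v2.5": 40_000,
    "Turbo v2.5": 40_000,
}


def models_that_fit(text: str) -> list[str]:
    """Models whose per-request limit can take the whole text at once."""
    length = len(text)
    return [model for model, limit in MODEL_CHAR_LIMITS.items() if length <= limit]


print(models_that_fit("x" * 7_500))  # ['Multilingual v2', 'Flash v2.5', 'Turbo v2.5']
```

For a 7,500-character text, Eleven v3 would need the text split first, while the other three models can take it in one go.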
Audio Quality
| Quality | Description |
|---|---|
| MP3 192 kbps | Creator quality — highest fidelity |
| MP3 128 kbps | Standard — good balance |
| MP3 64 kbps | Compact — smaller file size |
| MP3 32 kbps | Minimal — lowest quality |
Text Normalization
| Setting | Description |
|---|---|
| Auto | The model decides how to handle numbers |
| Always On | Numbers converted to words (e.g., "42" → "forty-two") |
| Off | No normalization applied |
Voice Fine-Tuning (ElevenLabs)
| Slider | Range | Description |
|---|---|---|
| Stability | Variable ↔ Stable | Lower = more expressive; Higher = more consistent |
| Similarity | Creative ↔ Original | How closely the output matches the original voice |
| Style | Neutral ↔ Expressive | Amount of emotional expression |
| Speed | Slow (0.7×) ↔ Fast (1.2×) | Playback speed |
Additional Options
- Code-Filter: Strips code blocks and technical syntax before synthesis.
- Auto-Record: Automatically saves synthesized audio. Tap the folder icon to choose the directory.
- Speaker Boost: Enhances voice clarity (ElevenLabs only).
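The Code-Filter's exact rules are not documented here. As a rough illustration only, the sketch below removes Markdown-style fenced code blocks and inline code spans and then tidies the whitespace; it does not attempt the broader "technical syntax" filtering the app may also perform.

```python
import re

FENCE = "`" * 3  # a Markdown code fence, built from single characters on purpose
FENCED_BLOCK = re.compile(re.escape(FENCE) + r".*?" + re.escape(FENCE), re.DOTALL)
INLINE_CODE = re.compile(r"`[^`\n]+`")


def strip_code(text: str) -> str:
    """Drop fenced code blocks and inline code spans before text-to-speech."""
    text = FENCED_BLOCK.sub(" ", text)
    text = INLINE_CODE.sub(" ", text)
    # Collapse leftover whitespace so the voice does not pause oddly.
    return re.sub(r"\s+", " ", text).strip()


sample = "Run this:\n" + FENCE + "python\nprint(1)\n" + FENCE + "\nthen call `main()` now."
print(strip_code(sample))  # Run this: then call now.
```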
7. The Pipeline
The Pipeline is Talk to me's core processing engine. It visualizes the stages your audio passes through from recording to final output.
Pipeline Stages
| Stage | Label | Description |
|---|---|---|
| 1 | Capture | Audio recording and finalization |
| 2 | STT | Speech-to-Text transcription |
| 3 | Post | Post-processing (cleanup, word corrections) |
| 4 | Polish or Trans | AI-Polish or AI-Translate |
| 5 | Inject | Text copied to clipboard / auto-pasted |
TDF (Text Display Field) Indicators
Each pipeline stage shows the active provider (e.g., "Scribe v2", "GPT-5.4") and timing information after completion.
Timing Display
After processing, a timing line shows:
STT 1.2s → LLM 0.8s → Inject 0.1s → Total 2.1s
If Voice Translate is active, an additional S2S (Speech-to-Speech) timing is shown.
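The timing line is a per-stage breakdown followed by a total. A minimal sketch that formats it, assuming the total is simply the sum of the listed stages (the app may measure wall-clock time instead, which can differ slightly):

```python
def format_timing(stages: list[tuple[str, float]]) -> str:
    """Build a line such as 'STT 1.2s → LLM 0.8s → Inject 0.1s → Total 2.1s'."""
    parts = [f"{name} {seconds:.1f}s" for name, seconds in stages]
    total = sum(seconds for _, seconds in stages)  # assumption: total = sum of stages
    parts.append(f"Total {total:.1f}s")
    return " → ".join(parts)


print(format_timing([("STT", 1.2), ("LLM", 0.8), ("Inject", 0.1)]))
# STT 1.2s → LLM 0.8s → Inject 0.1s → Total 2.1s
```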
8. Voice Translate
Voice Translate combines AI-Translation with Text-to-Speech to create a real-time speech-to-speech translation experience.
New since v0.5.150: Text translation is now automatically active whenever your input language (Speech Input) and output language (Text Output) differ. You no longer need a separate switch for text translation. The Voice Translate button now only controls whether the final text is read aloud (text-to-speech output).
How It Works
- Enable Voice Translate (purple when active).
- Record a dictation in your source language.
- The app transcribes → translates → reads the translation aloud.
Examples
- DE → EN without Voice Translate: You speak German, receive English text — no audio output.
- DE → EN with Voice Translate: You speak German, receive English text — and it is read aloud.
- DE → DE with Voice Translate: Same language, no translation — but the text is read aloud.
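Since v0.5.150 the two decisions are independent: translation depends only on whether the output language differs from the input language, and speech output depends only on the Voice Translate button. The sketch below encodes the three examples above; the function and field names are hypothetical, and `input_lang` stands for the selected or auto-detected input language.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OutputPlan:
    translate: bool  # text is translated into the output language
    speak: bool      # final text is read aloud with TTS


def plan_output(input_lang: str, output_lang: Optional[str], voice_translate: bool) -> OutputPlan:
    """Decide translation and speech output (v0.5.150+ behavior as described above).

    output_lang=None stands for "Default (same as input)".
    """
    effective_output = output_lang or input_lang
    return OutputPlan(translate=effective_output != input_lang, speak=voice_translate)


print(plan_output("de", "en", voice_translate=False))  # translate only
print(plan_output("de", "en", voice_translate=True))   # translate and read aloud
print(plan_output("de", "de", voice_translate=True))   # read aloud, no translation
```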
Configuration
- Target Language: Set in Settings → AI-Translate → Translate To
- TTS Voice: Uses your configured TTS provider and voice
Use Cases
- Travel: Speak in your language, have the translation read aloud.
- Language Learning: Hear how your text sounds in another language.
- Live Language Immersion: Turn your own thoughts into live fluency — speak in your native language and absorb the output in the language you want to master.
9. AI Polish & Translation
AI-Polish
When enabled, AI-Polish corrects grammar, punctuation, and (with "Strong" setting) removes filler words like "um", "uh", "you know", "basically".
Polish Strength:
- Light — Grammar and punctuation correction only
- Strong — Also removes filler words
Status indicators:
- POLISH (cyan) — Active
- OFF — Disabled
- KEY MISSING (yellow) — No LLM key configured
AI-Translate
When enabled, your dictated text is translated into the target language.
Status indicators:
- TRANSLATE (cyan) — Active, showing target language
- VOICE OUTPUT (purple) — Voice Translate also active
- TEXT ONLY — Translation without voice output
- OFF — Disabled
Note: Since v0.5.150, Talk to me automatically detects when input and output languages differ and activates translation — without an explicit toggle. AI Polish remains independently available and is no longer automatically disabled.
10. Quick-Override Controls
The Quick-Override controls allow you to temporarily change the input or output language for a single dictation without modifying your saved settings.
Speech Input Override
Select a different input language for the next recording:
- Auto-Detect — The STT provider detects the language automatically
- Individual languages (see Appendix A)
Text Output Override
Select a different output language (equivalent to temporarily enabling translation):
- Default (same as input) — No translation
- All 20 translation languages
Reset to Settings
When an override is active, a Reset button (↩ icon) appears. Tap/click it to revert to your saved settings.
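An override only changes what the next dictation uses; your saved settings stay untouched, and Reset simply clears the override. A minimal sketch of that resolution, with hypothetical class and field names and "same" standing for "Default (same as input)":

```python
from dataclasses import dataclass
from typing import Optional

SAME_AS_INPUT = "same"  # hypothetical marker for "Default (same as input)"


@dataclass(frozen=True)
class SavedLanguages:
    speech_input: str  # e.g. "auto" or "de"
    text_output: str   # e.g. SAME_AS_INPUT or "en"


@dataclass
class QuickOverride:
    speech_input: Optional[str] = None  # None means "no override"
    text_output: Optional[str] = None

    def is_active(self) -> bool:
        """True while the Reset (↩) button would be shown."""
        return self.speech_input is not None or self.text_output is not None

    def reset(self) -> None:
        """Revert to the saved settings by clearing both overrides."""
        self.speech_input = None
        self.text_output = None


def effective_languages(saved: SavedLanguages, override: QuickOverride) -> tuple[str, str]:
    """Languages for the next dictation: the override if set, else the saved value."""
    speech_input = override.speech_input if override.speech_input is not None else saved.speech_input
    text_output = override.text_output if override.text_output is not None else saved.text_output
    return speech_input, text_output


saved = SavedLanguages("de", SAME_AS_INPUT)
override = QuickOverride(text_output="en")
print(effective_languages(saved, override))  # ('de', 'en')
override.reset()
print(effective_languages(saved, override))  # ('de', 'same')
```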
11. Key Pool
The Key Pool is where you manage your API keys. Talk to me uses a pool-based architecture — you can add multiple keys per category, and the app automatically rotates between them based on trust scores.
Categories
| Category | Purpose | Supported Providers |
|---|---|---|
| Speech-to-Text | Transcription | OpenAI Whisper, Deepgram Nova, ElevenLabs Scribe v2, Groq Whisper |
| AI-Polish / LLM | Grammar, translation | OpenAI, Groq, Anthropic, Google Gemini, xAI Grok |
| Text-to-Speech | Voice synthesis | ElevenLabs, Deepgram, OpenAI TTS |
Adding a Key
- Expand the Key Pool section.
- Click/tap + Add Key in the desired category.
- Select the Provider.
- Enter a Label (e.g., "My OpenAI Key").
- Enter your API Key.
- Click/tap Save Key.
Key Slot Features
Each key slot displays:
- Label and Provider
- Masked Key (last 4 characters visible)
- Trust Score — Color-coded (green/yellow/red)
- Statistics — Calls, successes, failures, rate limits
Actions per slot:
- Test — Verify the key works
- Pause / Activate — Temporarily disable or re-enable
- Remove — Permanently delete
Trust System
| Level | Score | Color | Behavior |
|---|---|---|---|
| Excellent | ≥80% | Green | Preferred |
| Good | ≥60% | Green | Normal |
| OK | ≥40% | Yellow | Fallback |
| Weak | ≥20% | Yellow | Rarely used |
| Critical | <20% | Red | Last resort |
Keys that hit rate limits are placed in automatic cooldown while other keys are used.
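The trust levels map a key's score to a color and a preference order, and rate-limited keys sit out a cooldown. The sketch below illustrates that behavior. How the app actually computes a score and how long a cooldown lasts are not documented here, so the success-ratio formula and the cooldown length passed in are assumptions.

```python
import time
from dataclasses import dataclass
from typing import Optional

# (minimum score, level, color) from the Trust System table; scores are 0.0-1.0.
TRUST_LEVELS = [
    (0.80, "Excellent", "Green"),
    (0.60, "Good", "Green"),
    (0.40, "OK", "Yellow"),
    (0.20, "Weak", "Yellow"),
]


def trust_level(score: float) -> tuple[str, str]:
    for minimum, level, color in TRUST_LEVELS:
        if score >= minimum:
            return level, color
    return "Critical", "Red"  # below 20 %


@dataclass
class KeySlot:
    label: str
    successes: int = 0
    failures: int = 0
    paused: bool = False
    cooldown_until: float = 0.0  # monotonic timestamp; 0 = no cooldown

    @property
    def score(self) -> float:
        # Hypothetical score: share of successful calls; untested keys start neutral.
        calls = self.successes + self.failures
        return self.successes / calls if calls else 0.5


def mark_rate_limited(slot: KeySlot, cooldown_s: float, now: Optional[float] = None) -> None:
    """Put a key in cooldown after a rate-limit response."""
    now = time.monotonic() if now is None else now
    slot.cooldown_until = now + cooldown_s


def pick_key(slots: list[KeySlot], now: Optional[float] = None) -> Optional[KeySlot]:
    """Highest-scoring key that is neither paused nor cooling down, else None."""
    now = time.monotonic() if now is None else now
    usable = [s for s in slots if not s.paused and s.cooldown_until <= now]
    return max(usable, key=lambda s: s.score, default=None)


main = KeySlot("Main", successes=9, failures=1)
backup = KeySlot("Backup", successes=5, failures=5)
print(pick_key([main, backup], now=100.0).label)  # Main
mark_rate_limited(main, cooldown_s=30, now=100.0)
print(pick_key([main, backup], now=110.0).label)  # Backup
print(trust_level(backup.score))                  # ('OK', 'Yellow')
```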
12. AI Voice Chat
Talk to me includes two independent AI Voice Chat engines, each with its own strengths. You can switch between them at any time from the AI Chat tab.
| Engine | Technology | Key Advantage |
|---|---|---|
| 12a. Deepgram Voice Agent | Deepgram Agent API (WebSocket) | 32+ presets, 6 LLM providers, 4 TTS providers, latency monitoring, managed & BYO modes |
| 12b. Gemini 3.1 Flash Live | Google Gemini Live API (WebSocket) | 30 expressive voices, persona presets, thinking depth control, native Google multimodal AI |
Full hands-free speaker mode (Android)
Both voice chat engines work completely hands-free through your phone's speaker. Talk to me uses proprietary acoustic echo cancellation (AEC) via a native Android bridge to separate your voice from the AI's speaker output. You can interrupt at any time: the AI stops immediately and listens to you. No headphones or extra equipment are required. On desktop, any standard microphone and speaker setup works equally well.
12a. Deepgram Voice Agent
The Deepgram Voice Agent provides real-time, full-duplex AI voice conversations through a single WebSocket connection to the Deepgram Agent API. It orchestrates Speech-to-Text (STT), Large Language Models (LLMs), and Text-to-Speech (TTS) in one unified pipeline — you speak, the AI thinks, and responds with natural voice, all in real time.
Getting Started
- Switch to the AI Chat tab, then select the Deepgram sub-tab.
- Add a Deepgram API key in the Key Pool (scroll down to the “Deepgram Voice Agent” section).
- Choose a Configuration Preset or configure manually.
- Tap the green Start Conversation button.
Configuration Presets (32+ Options)
Talk to me ships with 32+ presets across six categories. Each preset pre-configures the STT model, LLM provider/model, TTS provider/voice, and turn-detection parameters.
Top Tier — Best Quality
| Preset | LLM | TTS | STT |
|---|---|---|---|
| Gemini 3.0 Pro + Sonic-3 | Google Gemini 3.0 Pro | Cartesia Sonic-3 | Nova-3 |
| Claude 4.5 + Sonic-3 | Anthropic Claude Sonnet 4.5 | Cartesia Sonic-3 (Tessa) | Nova-3 |
| Claude 4.6 + Sonic-3 | Anthropic Claude Sonnet 4.6 | Cartesia Sonic-3 (Katie) | Nova-3 |
| GPT-5.4 + Sonic-3 | OpenAI GPT-5.4 | Cartesia Sonic-3 (Katie) | Nova-3 |
| GPT-5.4 + Kiefer | OpenAI GPT-5.4 | Cartesia Sonic-3 (Kiefer, Male) | Nova-3 |
Ultra-Fast — Lowest Latency (~1.1s)
| Preset | LLM | TTS | STT |
|---|---|---|---|
| GPT-4o Mini + Sonic-3 | OpenAI GPT-4o Mini | Cartesia Sonic-3 | Nova-3 |
| GPT-5.4 Nano + Sonic-3 | OpenAI GPT-5.4 Nano | Cartesia Sonic-3 | Nova-3 |
| Haiku 4.5 + Sonic-3 | Anthropic Claude Haiku 4.5 | Cartesia Sonic-3 | Nova-3 |
| Gemini 2.5 Flash + Sonic-3 | Google Gemini 2.5 Flash | Cartesia Sonic-3 | Nova-3 |
| Nemotron 49B + Sonic-3 | NVIDIA Nemotron Super 49B | Cartesia Sonic-3 | Nova-3 |
Flux — English Only, Ultra-Low Latency
Flux uses Deepgram's Flux STT model with eager end-of-turn detection for the absolute fastest response times. English only.
| Preset | LLM | TTS |
|---|---|---|
| Flux + GPT-4o Mini + Sonic-3 | OpenAI GPT-4o Mini | Cartesia Sonic-3 |
| Flux + GPT-5.4 Nano + Sonic-3 | OpenAI GPT-5.4 Nano | Cartesia Sonic-3 |
| Flux + GPT-5.4 + Sonic-3 | OpenAI GPT-5.4 | Cartesia Sonic-3 |
| Flux + Claude 4.6 + Sonic-3 | Anthropic Claude 4.6 | Cartesia Sonic-3 |
| Flux + Gemini Flash + Sonic-3 | Google Gemini 2.5 Flash | Cartesia Sonic-3 |
Balanced — Quality + Speed
| Preset | LLM | TTS |
|---|---|---|
| GPT-5 Mini + Sonic-3 | OpenAI GPT-5 Mini | Cartesia Sonic-3 |
| GPT-4.1 Mini + Sonic-3 | OpenAI GPT-4.1 Mini | Cartesia Sonic-3 |
| Haiku 4.5 + Tessa | Anthropic Haiku 4.5 | Cartesia Sonic-3 (Tessa) |
| Gemini 3.0 Flash + Sonic-3 | Google Gemini 3.0 Flash | Cartesia Sonic-3 |
Experimental — Deepgram Aura-2 TTS (Language-Specific)
| Preset | LLM | TTS Voice |
|---|---|---|
| GPT-5.4 + Julius (DE) | OpenAI GPT-5.4 | Aura-2 Julius (German, Male) |
| GPT-5.4 + Zeus (EN) | OpenAI GPT-5.4 | Aura-2 Zeus (English, Male) |
| Claude 4.6 + Thalia (EN) | Anthropic Claude 4.6 | Aura-2 Thalia (English, Female) |
| GPT-5.4 + Agathe (FR) | OpenAI GPT-5.4 | Aura-2 Agathe (French, Female) |
| GPT-5.4 + Celeste (ES) | OpenAI GPT-5.4 | Aura-2 Celeste (Spanish, Female) |
Full BYO — Bring Your Own LLM & TTS Keys
In Full BYO mode, Deepgram handles only STT (Nova-3). Your own API keys for LLM and TTS providers are used directly.
| Preset | LLM (BYO Key) | TTS (BYO Key) |
|---|---|---|
| GPT-5.4 + ElevenLabs | OpenAI GPT-5.4 | ElevenLabs Turbo v2.5 |
| GPT-5.4 + OpenAI TTS | OpenAI GPT-5.4 | OpenAI TTS-1 |
| GPT-5.4 Nano + ElevenLabs | OpenAI GPT-5.4 Nano | ElevenLabs Turbo v2.5 |
| Gemini 3 Pro + ElevenLabs | Google Gemini 3 Pro | ElevenLabs Turbo v2.5 |
| Gemini Flash + OpenAI TTS | Google Gemini 2.5 Flash | OpenAI TTS-1 |
| Claude 4.6 + ElevenLabs | Anthropic Claude 4.6 | ElevenLabs Turbo v2.5 |
| Claude 4.6 + OpenAI TTS | Anthropic Claude 4.6 | OpenAI TTS-1 |
| Grok 3 Mini + ElevenLabs | xAI Grok 3 Mini | ElevenLabs Turbo v2.5 |
Preset Lock & Unlock
When a preset is active, all configuration fields are locked to the preset values (indicated by a lock icon). This prevents accidental changes. To override individual settings, tap Unlock for manual editing. Changing any setting manually switches the preset to “Manual Configuration”.
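The lock behaves like a guard in front of the configuration: while it is locked, edits are refused; after Unlock, any edit is applied and the preset name becomes "Manual Configuration". A minimal sketch of that state machine (class and field names are hypothetical):

```python
from dataclasses import dataclass, replace

MANUAL = "Manual Configuration"


class PresetLockedError(RuntimeError):
    """Raised when a locked preset is edited without unlocking first."""


@dataclass(frozen=True)
class AgentConfig:
    stt_model: str
    llm: str
    tts: str
    temperature: float = 0.7


@dataclass
class AgentSettings:
    preset_name: str
    config: AgentConfig
    locked: bool = True  # presets start locked (lock icon shown)

    def unlock(self) -> None:
        self.locked = False

    def set_field(self, name: str, value: object) -> None:
        """Change one setting; any manual change switches to Manual Configuration."""
        if self.locked:
            raise PresetLockedError(f"'{self.preset_name}' is locked; tap Unlock first")
        self.config = replace(self.config, **{name: value})
        self.preset_name = MANUAL


settings = AgentSettings(
    "GPT-5.4 + Sonic-3",
    AgentConfig("Nova-3", "OpenAI GPT-5.4", "Cartesia Sonic-3 (Katie)"),
)
settings.unlock()
settings.set_field("temperature", 0.3)
print(settings.preset_name)  # Manual Configuration
```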
Manual Configuration
Tap the gear icon next to the Start button to open the configuration panel. All fields below are available:
LLM Provider
| Provider | Key Models |
|---|---|
| OpenAI | GPT-4o Mini, GPT-4.1 Nano/Mini/Full, GPT-5 Nano/Mini/Full, GPT-5.1–5.4 (incl. Nano, Mini) |
| Anthropic | Claude Haiku 4.5, Sonnet 4, Sonnet 4.5, Sonnet 4.6 |
| Google | Gemini 2.5 Flash/Flash Lite, Gemini 3.0 Flash/Pro, Gemini 3.1 Flash Lite |
| NVIDIA | Llama Nemotron Super 49B, Nemotron 3 Nano 30B |
| xAI | Grok 3, Grok 3 Mini, Grok 3 Fast |
| Groq | GPT OSS 20B |
TTS Provider
| Provider | Voices | Languages | Key Required |
|---|---|---|---|
| Cartesia Sonic-3 | 9 voices (Katie, Kiefer, Tessa, Kyle, Leo, Jace, Gavin, Maya, Default) | 42 languages (multilingual auto-detect) | Deepgram key only (managed) |
| Deepgram Aura-2 | 35+ voices (EN, DE, FR, ES, IT, NL, JA) | Language-specific per voice | Deepgram key only (managed) |
| ElevenLabs | Your ElevenLabs voices (auto-loaded) | Multilingual | ElevenLabs API key (BYO) |
| OpenAI TTS | 10 voices (Alloy, Ash, Ballad, Coral, Echo, Fable, Nova, Onyx, Sage, Shimmer) | English | OpenAI API key (BYO) |
STT Model
| Model | Languages | Use Case |
|---|---|---|
| Nova-3 | Multilingual | Standard, best overall accuracy |
| Nova-3 General | Multilingual | General-purpose variant |
| Nova-3 Medical | Multilingual | Medical terminology optimized |
| Flux | English only | Ultra-low-latency turn detection |
Other Settings
- Language — Auto-Detect (Multilingual) or a specific language: English, German, French, Spanish, Italian, Dutch, Japanese, Portuguese, Hindi, Russian
- Greeting Message — Text the agent speaks when the conversation starts (optional)
- System Instruction — Define the AI’s personality and behavior. A base instruction is always included that prevents markdown formatting and follow-up questions in speech output.
Advanced Settings
Expand the Advanced section for fine-tuning:
- Temperature (0.00 – 2.00) — Controls response creativity. Default: 0.7. Lower = more focused, higher = more creative.
- STT Model — Switch between Nova-3 variants and Flux.
When Flux STT is selected, additional controls appear:
- Eager EOT Threshold (0.0 – 1.0) — How aggressively the system detects end-of-turn. Higher = faster response but may cut you off mid-sentence.
- EOT Timeout (0 – 5000ms) — Maximum silence before the agent responds.
For ElevenLabs BYO: A custom Voice ID field lets you enter any ElevenLabs voice ID directly.
For OpenAI TTS BYO: Select from 10 OpenAI voices (Alloy, Ash, Ballad, Coral, Echo, Fable, Nova, Onyx, Sage, Shimmer).
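The ranges above, and the rule that the end-of-turn controls only apply to Flux, can be checked before a conversation starts. A minimal validation sketch; the field names and the "Nova-3" / "Flux" labels are illustrative UI names, not Deepgram API identifiers:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AdvancedSettings:
    temperature: float = 0.7                     # 0.00-2.00
    stt_model: str = "Nova-3"                    # a Nova-3 variant or "Flux"
    eager_eot_threshold: Optional[float] = None  # Flux only, 0.0-1.0
    eot_timeout_ms: Optional[int] = None         # Flux only, 0-5000

    def problems(self) -> list[str]:
        """Return human-readable problems; an empty list means the settings are valid."""
        issues = []
        if not 0.0 <= self.temperature <= 2.0:
            issues.append("Temperature must be between 0.00 and 2.00.")
        flux_only_set = self.eager_eot_threshold is not None or self.eot_timeout_ms is not None
        if flux_only_set and self.stt_model != "Flux":
            issues.append("Eager EOT Threshold and EOT Timeout only apply to Flux.")
        if self.eager_eot_threshold is not None and not 0.0 <= self.eager_eot_threshold <= 1.0:
            issues.append("Eager EOT Threshold must be between 0.0 and 1.0.")
        if self.eot_timeout_ms is not None and not 0 <= self.eot_timeout_ms <= 5000:
            issues.append("EOT Timeout must be between 0 and 5000 ms.")
        return issues


print(AdvancedSettings(stt_model="Flux", eager_eot_threshold=0.7, eot_timeout_ms=3000).problems())  # []
print(AdvancedSettings(temperature=2.5).problems())
```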
During a Conversation
- Status indicator — Shows Ready, Connecting, Live (with elapsed time), or Error
- Audio level meter — Displays microphone input with Listening/Silent state
- Thinking indicator — A green badge appears while the LLM processes your input
- Conversation transcript — Real-time display of all dialogue. Your messages appear on the right (green), the agent’s on the left (blue).
- Barge-in — Interrupt the AI at any time by speaking. The agent stops immediately and listens to you.
- Resize handle — Drag the handle below the transcript to resize the chat area (120px to 85% of screen)
- Dual Start/Stop buttons — One at the top, one sticky at the bottom for easy access while scrolling
Latency Monitoring
A compact latency bar appears after the first turn, showing three key metrics:
- LLM — Time from your speech to the first LLM token
- TTFB — Total Time to First Byte (end-to-end)
- TURN — Full turn duration including audio playback
Values are color-coded: green (< 2s), yellow (2–5s), red (> 5s).
Tap the latency bar to expand a detailed per-turn table with columns: #, Speech duration, LLM time, TTS time, TTFB, Audio length, Total. The header shows the average LLM time and the average TTFB.
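The color thresholds can be expressed as a small classification function. This is a minimal sketch: the function name `latency_color` is hypothetical, and treating exactly 2 s and exactly 5 s as yellow is an assumption based on the "2–5s" wording above.

```python
def latency_color(seconds: float) -> str:
    """Map a latency value (in seconds) to the latency bar's color band.

    Hypothetical helper: boundary values of exactly 2 s and 5 s are treated
    as yellow, following the "2-5s" range described in the manual.
    """
    if seconds < 2.0:
        return "green"
    if seconds <= 5.0:
        return "yellow"
    return "red"


# Color-code the three metrics of one turn (values are illustrative).
turn = {"LLM": 0.8, "TTFB": 2.4, "TURN": 6.1}
for metric, value in turn.items():
    print(f"{metric}: {value:.1f}s -> {latency_color(value)}")
```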
Echo Cancellation (AEC)
Talk to me includes proprietary Acoustic Echo Cancellation via a native Android Kotlin bridge. The AI’s speaker output is captured and subtracted from your microphone input in real time, preventing self-triggering feedback loops. This allows full hands-free operation on speaker without headphones. Works on all managed presets and most BYO configurations.
Key Pool — Deepgram Voice Agent
The Deepgram Voice Agent Key Pool is a dedicated, collapsible section below the chat area. It manages:
- Deepgram API Keys (required) — for STT and managed LLM/TTS routing
- LLM Keys (optional, Full BYO only) — OpenAI, Anthropic, Gemini, xAI
- TTS Keys (optional, Full BYO only) — ElevenLabs, OpenAI TTS
Each key card shows a 4-row layout: label, provider badge + masked key, trust score with statistics, and Test/Pause action buttons. You can test individual keys or all keys at once.
Session Limits
Sessions are limited to 15 minutes (API constraint). The elapsed time is shown on the Stop button, and the session ends automatically when the limit is reached.
Tips
- Start with a managed preset (Top Tier or Ultra-Fast) — they require only a Deepgram key and offer the best experience.
- GPT-5.4 Nano + Cartesia Sonic-3 delivers ~1.1s response times — the fastest option.
- Flux presets are English-only but extremely fast due to eager end-of-turn detection.
- Full BYO presets use your own LLM/TTS keys for maximum control but may have reduced barge-in performance with some TTS providers.
- All settings take effect on the next session start, not during a live session.
12b. Gemini 3.1 Flash Live
Gemini 3.1 Flash Live provides real-time voice conversations powered by Google’s latest audio AI model. It delivers the speed and natural rhythm needed for voice-first interaction, with sub-second latency, 30 expressive voices, and native multimodal understanding.
Requirements
You need a Google Gemini API key (paid tier recommended) added to the LLM Key Pool in Settings. The key is automatically available for AI Voice Chat. The model used is gemini-3.1-flash-live-preview.
Starting a Conversation
Navigate to the AI Chat tab, then select the Gemini sub-tab. Tap Start Conversation. The app connects to Gemini via WebSocket, opens your microphone, and begins listening. Speak naturally — Gemini responds in real-time audio. Tap End to stop.
Voices (30 Options)
Choose from 30 natural AI voices, each with a distinct personality:
| Voice | Character | Best For |
|---|---|---|
| Sulafat | Warm | Storytelling, bedtime stories, calm conversations |
| Gacrux | Mature | Authoritative narration, mentoring, deep discussions |
| Algenib | Gravelly | Cinematic narration, dramatic reading, character voice |
| Kore | Firm | Professional briefings, news reading, factual Q&A |
| Puck | Upbeat | Energetic conversations, motivation, brainstorming |
| Zephyr | Bright | Optimistic chats, friendly assistance, greetings |
| Charon | Informative | Tutorials, documentary-style explanations |
| Fenrir | Excitable | Enthusiastic reactions, game commentary, hype |
| Leda | Youthful | Casual chat, Gen-Z conversations, trendy topics |
| Aoede | Breezy | Relaxed conversations, travel talk, lifestyle |
| Achernar | Soft | Meditation guidance, ASMR-style, gentle encouragement |
| Algieba | Smooth | Podcast hosting, audiobooks, long-form reading |
| Despina | Smooth | Elegant narration, luxury brand voice |
| Achird | Friendly | Customer support, everyday assistance, welcoming tone |
| Vindemiatrix | Gentle | Supportive conversations, therapy-like tone, empathy |
| Sadaltager | Knowledgeable | Technical explanations, expert Q&A, encyclopedic |
| Rasalgethi | Informative | Science documentaries, educational content |
| Schedar | Even | Balanced discussions, neutral reporting, debates |
| Alnilam | Firm | Commanding presence, leadership, formal settings |
| Pulcherrima | Forward | Assertive communication, pitches, presentations |
| Zubenelgenubi | Casual | Laid-back chat, friends catching up, humor |
| Sadachbia | Lively | Animated storytelling, children’s content, playful |
| Laomedeia | Upbeat | Morning shows, cheerful updates, positive vibes |
| Callirrhoe | Easy-going | Casual advice, lifestyle coaching, approachable |
| Autonoe | Bright | Creative sessions, idea generation, art discussions |
| Enceladus | Breathy | Intimate narration, poetry reading, atmospheric |
| Iapetus | Clear | Precise instructions, step-by-step guides, clarity |
| Erinome | Clear | Clean communication, corporate training, diction |
| Umbriel | Easy-going | Relaxed Q&A, weekend vibes, mellow conversations |
Tip: Preview all voices in the Google AI Studio Voice Library.
Language
Select from 24 supported languages or leave on Auto-detect. Gemini responds in the language you speak — or in the language you select. Supported: English, German, French, Spanish, Italian, Portuguese, Dutch, Polish, Romanian, Russian, Ukrainian, Turkish, Arabic, Hindi, Bengali, Tamil, Telugu, Marathi, Japanese, Korean, Thai, Vietnamese, Indonesian.
Persona Presets
Persona presets define how Gemini behaves — its personality, tone, and communication style. Choose from six presets or create your own:
| Preset | Behavior |
|---|---|
| Friendly Assistant | Warm, conversational, approachable — great for everyday use |
| Professional | Clear, concise, authoritative — for business and work |
| Enthusiastic | Energetic, positive, encouraging — for brainstorming and motivation |
| Calm & Soothing | Slow, gentle, patient — for relaxation and guided sessions |
| Teacher | Patient, step-by-step, uses analogies — for learning and explanations |
| Creative | Imaginative, expressive, vivid language — for storytelling and art |
| Custom | Write your own system instruction from scratch |
System Instruction
The System Instruction is a text briefing you give to Gemini before the conversation starts. Think of it as directing an actor: tell the AI who it is, how to behave, and what to focus on.
Examples:
- “You are a patient Italian language tutor. Speak slowly. Correct my grammar gently.”
- “You are a senior software architect. Answer concisely and technically.”
- “You are a creative storyteller. Speak with flair. Use vivid language.”
When using a Persona Preset, your custom text is appended to the preset instruction. In Custom mode, your text is the entire instruction. Write in English for best results. Settings are saved automatically.
Thinking Depth
Control how deeply Gemini reasons before responding:
| Level | Behavior |
|---|---|
| Minimal | Fastest responses, minimal internal reasoning (default) |
| Low | Brief consideration, good balance |
| Medium | Thoughtful responses, longer pause before answering |
| High | Deep reasoning, best for complex questions |
Temperature & Top-P
Temperature (0.0 – 2.0) controls how creative or predictable the AI’s responses are:
| Range | Behavior | Best For |
|---|---|---|
| 0.0 – 0.5 | Focused, deterministic | Facts, technical answers, precise instructions |
| 0.7 – 1.0 | Balanced, natural (default: 1.0) | Most conversations, everyday use |
| 1.2 – 2.0 | Creative, surprising | Brainstorming, storytelling, creative writing |
Top-P (0.0 – 1.0) limits the pool of words the AI considers. At 0.95 (default), the model samples only from the smallest set of most likely words whose combined probability reaches 95%. Lower values make output more conservative.
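To make Temperature and Top-P concrete, here is a minimal sketch of how these two parameters are commonly applied to a model’s next-word scores: temperature divides the scores before a softmax, and top-p (nucleus sampling) keeps the smallest set of most likely words whose cumulative probability reaches p. It illustrates the general technique only, not Gemini’s internal implementation; the toy vocabulary and scores are made up.

```python
import math


def softmax_with_temperature(logits: dict[str, float], temperature: float) -> dict[str, float]:
    """Turn raw next-word scores into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        # Temperature 0 is commonly treated as greedy decoding: all probability on the top word.
        best = max(logits, key=lambda w: logits[w])
        return {w: (1.0 if w == best else 0.0) for w in logits}
    scaled = {w: s / temperature for w, s in logits.items()}
    m = max(scaled.values())  # subtract the max before exp() for numerical stability
    exps = {w: math.exp(s - m) for w, s in scaled.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}


def top_p_filter(probs: dict[str, float], top_p: float) -> dict[str, float]:
    """Keep the smallest set of most likely words whose cumulative probability reaches top_p."""
    kept: dict[str, float] = {}
    cumulative = 0.0
    for word, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[word] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {w: p / total for w, p in kept.items()}  # renormalize the surviving words


# Toy next-word scores (made up for illustration).
logits = {"sunny": 2.0, "cloudy": 1.0, "rainy": 0.5, "purple": -2.0}
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, {w: round(p, 3) for w, p in probs.items()})
print(top_p_filter(softmax_with_temperature(logits, 1.0), 0.95))
```

In this toy example, lowering the temperature concentrates probability on "sunny", while a Top-P of 0.95 drops the unlikely "purple" from the candidate pool entirely.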
Voice Activity Detection (VAD)
VAD settings control how Gemini detects when you start and stop speaking:
- Speech Start Sensitivity — How easily the system detects speech onset.
- Speech End Sensitivity — How quickly the system decides you’ve stopped talking.
- Silence Duration — How many milliseconds of silence before your turn is considered complete (100–2000ms).
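The Silence Duration setting can be pictured as a counter over short audio frames: once the input stays below a speech threshold for the configured number of milliseconds, the turn is treated as finished. The sketch below is a simplified, energy-based illustration under assumed values (20 ms frames, a fixed RMS threshold of 0.02) with a hypothetical function name; the VAD that Gemini actually uses is not described in this manual, and the two sensitivity settings are not modeled.

```python
from typing import Optional


def end_of_turn_frame(frame_rms: list[float], frame_ms: int = 20,
                      silence_duration_ms: int = 500, threshold: float = 0.02) -> Optional[int]:
    """Return the index of the frame at which the turn is considered complete, or None.

    Hypothetical simplified VAD: a frame counts as speech if its RMS level is at
    or above `threshold`. After speech has started, the turn ends once
    consecutive silent frames add up to `silence_duration_ms`.
    """
    frames_needed = max(1, silence_duration_ms // frame_ms)
    speech_started = False
    silent_run = 0
    for i, level in enumerate(frame_rms):
        if level >= threshold:
            speech_started = True
            silent_run = 0  # any speech resets the silence counter
        elif speech_started:
            silent_run += 1
            if silent_run >= frames_needed:
                return i
    return None


# 20 ms frames: 100 ms of quiet, 200 ms of speech, then 600 ms of quiet (illustrative levels).
levels = [0.0] * 5 + [0.1] * 10 + [0.0] * 30
print(end_of_turn_frame(levels, silence_duration_ms=500))  # prints 39 (25 silent frames = 500 ms)
```

A longer Silence Duration tolerates longer pauses mid-sentence at the cost of a slower response; a shorter one answers sooner but may cut you off.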
Echo Cancellation (AEC)
As with the Deepgram Voice Agent, Gemini 3.1 Flash Live benefits from Talk to me’s proprietary acoustic echo cancellation via the native Android Kotlin bridge, so full hands-free speaker mode works without headphones.
Tips for Best Results
- Speak naturally — Gemini supports natural barge-in (interrupt anytime)
- On Android, the built-in AEC eliminates echo — no headphones needed
- Session length is limited to 15 minutes per connection (API limit)
- All settings take effect on the next session start (not during a live session)
- The audio level meter shows a colored gradient (green, yellow, orange, red) indicating your microphone input level
- Transcription of your speech and Gemini’s responses can be toggled on/off independently
13. Mini-Player Windows
The Mini-Player is a compact Always-on-Top window that provides essential dictation controls without occupying your full screen.
Entering Mini-Player Mode
Click the Collapse button (↗ icon) in the header. The app window shrinks to a compact overlay positioned at the bottom center of your screen.
Mini-Player Layout
The Mini-Player displays a 3×3 grid of essential controls:
- Row 1: Speech Input selector, Status/Start button, Text Output selector
- Row 2: Voice Translate toggle, Inline Pill (spectrum analyzer), Save Recordings
- Row 3: Pipeline timing TDFs, Result preview
DPI-Aware Sizing
The Mini-Player automatically adjusts its size to your display's DPI scaling, so it keeps consistent visual dimensions across monitors with different scaling factors (e.g., 100%, 125%, 150%).
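DPI-aware sizing generally means defining the window in logical units and multiplying by the monitor's scale factor, where Windows treats 96 DPI as 100%. A minimal sketch, assuming a hypothetical base size of 360 × 240 logical pixels for the Mini-Player (its real dimensions are not given in this manual):

```python
# Hypothetical Mini-Player size in logical pixels at 100% scaling (96 DPI).
BASE_WIDTH, BASE_HEIGHT = 360, 240


def physical_size(dpi: int) -> tuple[int, int]:
    """Scale the logical window size to physical pixels for a monitor's DPI (96 DPI = 100%)."""
    scale = dpi / 96
    return round(BASE_WIDTH * scale), round(BASE_HEIGHT * scale)


for dpi in (96, 120, 144):  # 100%, 125%, 150% scaling
    print(dpi, physical_size(dpi))
```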
Exiting Mini-Player Mode
Click the Expand button to return to the full-size window at its previous position and size.
14. Global Hotkeys Windows
Talk to me registers system-wide hotkeys so you can control dictation without switching to the app window.
Primary Hotkeys
| Hotkey | Action |
|---|---|
| Ctrl+Win | Start / Stop Recording (global, works from any app) |
| Ctrl+Win (while processing) | Cancel current pipeline |
TTS Hotkey
When text is selected in any application, the TTS hotkey reads it aloud using your configured TTS provider.
Low-Level Hook
The global hotkey uses a Windows low-level keyboard hook, which means it works even when the app is minimized or another application has focus. The hook operates in "zero-swallow mode" — it intercepts the key combination without blocking other keyboard input.
15. Auto-Read Windows
Auto-Read is a Windows-exclusive feature that extracts text from the currently focused application and reads it aloud via TTS.
How It Works
- Enable Auto-Read by clicking the Auto-Read button.
- Select text in any application (or use Ctrl+C to copy).
- Talk to me detects the clipboard content and automatically reads it aloud using your TTS configuration.
Use Cases
- Read emails, articles, or documents without staring at the screen.
- Review your own writing by hearing it spoken back.
- Accessibility support for vision-impaired users.
16. Notification Listener Windows
The Notification Listener captures Windows toast notifications and reads them aloud via TTS.
Requirements
- Windows Desktop version
- Notification access permission granted in Windows Settings
How It Works
- Enable Notification Listener by clicking the toggle.
- Grant notification access when prompted by Windows.
- When a Windows toast notification arrives (email, chat message, calendar reminder), Talk to me extracts the notification title and body, and reads it aloud using your TTS configuration.
Configuration
- Enable/disable in Settings → Hands-Free
- TTS voice and provider follow your global TTS settings
17. MP3 Recording & Save Windows
Record TTS Readings
When enabled, every TTS synthesis is automatically saved as an MP3 file with sequential numbering (e.g., recording_001.mp3, recording_002.mp3).
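Sequential numbering like this is typically done by scanning the recording folder for the highest existing number and adding one. A minimal sketch; the three-digit zero padding follows the example file names above, but the helper names are hypothetical and how Talk to me actually picks the next number is an assumption.

```python
import re
from pathlib import Path

# File names follow the manual's example: recording_001.mp3, recording_002.mp3, ...
PATTERN = re.compile(r"^recording_(\d+)\.mp3$")


def next_recording_name(existing_names: list[str]) -> str:
    """Return the next sequential file name given the names already in the folder."""
    numbers = [int(m.group(1)) for name in existing_names if (m := PATTERN.match(name))]
    return f"recording_{max(numbers, default=0) + 1:03d}.mp3"


def next_recording_path(folder: Path) -> Path:
    """Hypothetical helper: pick the next free recording path inside `folder`."""
    return folder / next_recording_name([p.name for p in folder.iterdir() if p.is_file()])
```

Files that do not match the naming pattern (for example, notes you saved in the same folder) are ignored when computing the next number.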
Save Recordings
Click Save Recordings to open the folder containing all recorded MP3 files. You can configure the recording directory in Settings.
A Note on Android Permissions Android
The Android version of Talk to me requires several system permissions (Microphone, Overlay, Accessibility Service, Notification Listener) — each with its own confirmation dialog. We understand that this can feel cumbersome.
We would have preferred a simpler setup experience. However, Google Play Store policies and Android security guidelines require that each sensitive permission is requested individually, with a clear disclosure explaining what the permission is used for and what it is not used for. These multi-step confirmation flows are not our design choice — they are mandated by platform compliance requirements.
Each permission is requested only when you actually need the feature, not all at once during installation. You can revoke any permission at any time through Android Settings. The app will continue to work — the corresponding feature will simply be disabled.
Here is a summary of all Android permissions and why they are needed:
| Permission | Feature | Required? |
|---|---|---|
| Microphone | Speech-to-Text dictation, AI Voice Chat | Yes — core feature |
| Draw over other apps | Floating Bubble (hands-free overlay) | Only if you use the overlay |
| Accessibility Service | Auto-Paste text into chat app input fields | Only if you use Auto-Paste |
| Notification Listener | Auto-Read incoming messages aloud | Only if you use Auto-Read |
| Internet | Communication with AI providers | Yes — required for all features |
Thank you for your understanding. We take your privacy seriously — none of these permissions are used to collect, store, or transmit personal data. See Privacy and Security for full details.
18. Floating Bubble (Overlay) Android
The Floating Bubble is a small circular icon that floats on top of all other apps, providing hands-free dictation access without switching apps.
Activating the Overlay
- Tap the Overlay button in the main app.
- If Android's "Draw over other apps" permission is not yet granted, you will be directed to enable it.
- A small Talk to me bubble appears on screen.
Using the Bubble
- Single Tap: Start or stop recording. Red pulsing border during recording, blue pulsing border during TTS readout.
- Triple Tap: Test readback — reads a predefined text to confirm TTS works.
- Long Press: Clears the unread message queue.
- Drag: Move the bubble anywhere on screen.
During Recording via Bubble
- Tap the bubble to start recording.
- After transcription, a "✓ Inserted!" toast confirms that the text was pasted or placed in the clipboard.
Bubble Translation and Auto-Insert
The Bubble uses the same translation logic as the main window: if your input and output languages differ, your dictation is automatically translated before being inserted. Voice Translate (text-to-speech readout) also works in the Bubble.
Using Android's Accessibility Service, the Bubble inserts the (possibly translated) text directly into the focused input field. In all mainstream apps we tested — including WhatsApp, Gmail, Discord, Microsoft Teams, Viber, Chrome, ChatGPT, Facebook, Instagram, Pinterest, and Skool — auto-insert works reliably.
If auto-insert fails in a less common app, the text (already translated, if translation is active) is always placed in the clipboard: long-press the input field and tap Paste to insert it.
Stopping the Overlay
Tap the Overlay button again or tap Stop on the notification.
19. Auto-Paste Android
Auto-Paste uses Android's Accessibility Service to automatically insert dictated text into the currently focused text field.
Enabling Auto-Paste
- Tap the Auto-Paste button.
- A disclosure dialog explains what the Accessibility Service does and does not do. Tap Enable Auto-Paste.
- You are directed to Android's Accessibility Settings. Find Talk to me and enable it.
- The button now shows ✓ with a cyan border.
Accessibility Shortcut Button
When enabling the Accessibility Service, Android will ask you to choose an activation shortcut. This determines how you can quickly toggle the service on/off:
- Accessibility button (recommended): A small button appears in the navigation bar. Tap it to toggle the service.
- Volume Up + Volume Down (hold 3 seconds): Press and hold both volume keys simultaneously for 3 seconds to toggle.
We recommend the Accessibility button option for the easiest experience. This is a standard Android system feature — the choice does not affect how Auto-Paste works.
Important Notes
- Requires Android Accessibility permission (a sensitive permission).
- May need to be re-granted after app updates.
- Used exclusively for text insertion — no other accessibility data is accessed.
App Compatibility
Auto-Paste works reliably in most Android apps. The following apps were tested with v0.5.159:
| App | Auto-Paste | Translation |
|---|---|---|
| WhatsApp | ✅ | ✅ |
| Gmail (recipient + body) | ✅ | ✅ |
| Discord | ✅ | ✅ |
| Microsoft Teams | ✅ | ✅ |
| Viber | ✅ | ✅ |
| Chrome | ✅ | ✅ |
| ChatGPT | ✅ | ✅ |
| Facebook | ✅ | ✅ |
| Instagram | ✅ | ✅ |
| Pinterest | ✅ | ✅ |
| Skool (WebView in Chrome) | ✅ | ✅ |
"App Access Denied" — Restricted Settings (Android 13+)
On some devices, when enabling Auto-Paste or Notification Access, you may see "App access denied" or "For your security, this setting is currently unavailable." This is not a bug — it is an Android 13+ security feature called Restricted Settings.
Affected manufacturers: Lenovo (ZUI), Samsung (One UI), Xiaomi/Redmi (MIUI/HyperOS), OPPO/Realme (ColorOS), Huawei/Honor (EMUI/HarmonyOS), OnePlus (OxygenOS), Stock Android/Pixel.
How to fix:
- Open Android Settings → Apps → See all apps → find Talk to me.
- Tap Talk to me to open the App Info page (not the Notifications sub-page).
- Tap the three-dot menu (⋮) in the top-right corner.
- Select Allow restricted settings.
- Confirm with your PIN/fingerprint.
- Go back to Settings → Accessibility and enable Talk to me.
Tip: If the three-dot menu is not visible, first try to enable the permission (triggering the error), then go to the App Info page — the menu should now appear.
Xiaomi/MIUI/HyperOS: Go to Settings → Apps → Manage apps → Talk to me and scroll to the bottom.
Lenovo (ZUI): When tapping Apps in Settings, you may land on the Notifications sub-page instead of App Info. Navigate back and look for the full App Info page with storage, permissions, and battery sections.
20. Auto-Read Messages Android
Auto-Read automatically reads incoming chat messages aloud using TTS — ideal for driving, cooking, or exercising.
How It Works
- Enable Auto-Read (Headphones icon).
- Ensure Notification Access is granted.
- The Overlay must be active.
- When a message arrives from an allowed app, Talk to me announces the sender and reads the message aloud.
Pre-Selected Chat Apps
WhatsApp, WhatsApp Business, Telegram, Signal, Discord, Slack, Microsoft Teams, Viber, Messenger (Meta), Instagram, Google Messages, Samsung Messages.
You can add or remove apps in Auto-Read Apps Configuration.
21. Notification Access Android
Notification Access allows Talk to me to read incoming notifications, required for Auto-Read Messages.
Granting Access
- Tap the Notif Access button.
- Go to Android's Notification Listener Settings.
- Find Talk to me and enable it.
- The button shows ✓ with a cyan border.
Important Notes
- System-level permission — processes only notifications from explicitly allowed apps.
- No notification data is stored, transmitted, or logged.
22. Auto-Read Apps Configuration Android
Control which apps are allowed to have their notifications read aloud.
Known Chat Apps
Pre-selected messaging apps with individual toggles (WhatsApp, Telegram, Signal, Discord, Slack, Teams, Viber, Messenger, Instagram, Google Messages, Samsung Messages).
Search and Add Custom Apps
- Tap the search field and type an app name.
- Matching installed apps appear, sorted by relevance.
- Check the box to add an app.
How Filtering Works
- Only notifications from allowed apps are read aloud.
- Changes take effect immediately — no restart required.
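The filtering rule amounts to an allowlist check on each incoming notification's source app. A minimal sketch: the package names are the public Android package IDs of WhatsApp, Telegram, and Signal, while the `Notification` type, the helper functions, and the announcement wording are hypothetical, not Talk to me's code.

```python
from dataclasses import dataclass


@dataclass
class Notification:
    package: str  # Android package name of the app that posted the notification
    sender: str
    text: str


# Hypothetical allowlist, editable at runtime from the Auto-Read Apps screen.
allowed_packages = {"com.whatsapp", "org.telegram.messenger", "org.thoughtcrime.securesms"}


def should_read_aloud(n: Notification, allowed: set[str]) -> bool:
    """Only notifications from allowed apps are read aloud."""
    return n.package in allowed


def announcement(n: Notification) -> str:
    """Announce the sender, then the message text (wording is illustrative)."""
    return f"Message from {n.sender}: {n.text}"


incoming = [
    Notification("com.whatsapp", "Anna", "Running 5 minutes late"),
    Notification("com.example.shop", "Shop", "Your order has shipped"),
]
# The allowlist is consulted on every notification, so edits apply immediately.
for n in incoming:
    if should_read_aloud(n, allowed_packages):
        print(announcement(n))
```

Because the check reads the current allowlist each time, adding or removing an app changes behavior for the very next notification without a restart.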
23. Settings
UI Language
English, Deutsch, Français, Español — independent of your system language.
Quality Preset
| Preset | STT Provider | LLM Provider | Model | Polish |
|---|---|---|---|---|
| Top Performer | Scribe v2 | OpenAI | GPT-5.4 | Strong |
| Standard | Scribe v2 | OpenAI | GPT-4.1 mini | Strong |
| Budget | Whisper | Groq | Default | Light |
| Free | Deepgram | Groq | Default | Off |
| Custom | Manual | Manual | Manual | Manual |
Speech-to-Text
- Provider: OpenAI Whisper, Deepgram Nova-2/3, ElevenLabs Scribe v2, Groq Whisper
- Custom Keyterms (Scribe only): Proper nouns, brands, technical terms
- Language: Auto-Detect or specific
Text-to-Speech
- Provider: ElevenLabs, OpenAI TTS, Deepgram Aura 2
- Model (ElevenLabs): Eleven v3, Multilingual v2, Flash v2.5, Turbo v2.5
LLM Provider (Polish)
- Provider: OpenAI, Groq, Anthropic, Google Gemini, xAI Grok
- Model: Provider default or specific
- Polish Strength: Light or Strong
Translation Provider
Separate provider for AI-Translation (can differ from Polish provider).
AI-Polish / AI-Translate
Toggle each independently. When AI-Translate is enabled:
- Translate To: 20 target languages
- Voice Translate: Auto-read translations via TTS
Android Hands-Free
Quick toggles for Overlay, Auto-Read Messages, Auto-Paste, Notification Access.
Save and Test
- Save all current settings — Persists changes to device storage
- Test current configuration — Tests all configured providers with response times
24. Word Corrections
Word Corrections teach Talk to me the correct spelling of names, brands, and terms that speech recognition gets wrong.
Adding Corrections
Single Add
Enter Wrong spelling and Correct spelling, then tap/click Add.
Bulk Import
Enter the correct spelling, then list wrong variants (one per line). Use Generate with AI to auto-create likely misspellings.
Multi-Import
Enter pairs as wrong;correct (one per line). Supports ;, ->, comma, or tab separators.
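A minimal sketch of how Multi-Import lines could be parsed. It assumes each line uses one separator, checks "->" before the single-character separators, and silently skips lines without a separator; the function name and these rules are assumptions, not documented behavior.

```python
def parse_multi_import(text: str) -> list[tuple[str, str]]:
    """Parse 'wrong<sep>correct' lines; supported separators: ';', '->', ',', tab."""
    pairs: list[tuple[str, str]] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines
        for sep in ("->", ";", "\t", ","):  # check the two-character separator first
            if sep in line:
                wrong, correct = (part.strip() for part in line.split(sep, 1))
                if wrong and correct:
                    pairs.append((wrong, correct))
                break  # use only the first separator found
    return pairs


sample = "talk to me;Talk to me\nm rocon -> mrocon\neleven labs,ElevenLabs\nchat gpt\tChatGPT\n\nno separator here"
print(parse_multi_import(sample))
```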
How Corrections Work
During post-processing (Pipeline stage 3), wrong spellings are automatically replaced before AI-Polish runs.
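Replacing wrong spellings is commonly done with whole-word, case-insensitive matching so that a correction does not fire inside a longer word. A minimal sketch under that assumption; the manual does not say whether Talk to me matches case-insensitively or on word boundaries, and `apply_corrections` is a hypothetical name.

```python
import re


def apply_corrections(text: str, corrections: list[tuple[str, str]]) -> str:
    """Replace each wrong spelling with its correct form (whole words, case-insensitive)."""
    # Apply longer wrong spellings first so multi-word entries win over their parts.
    for wrong, correct in sorted(corrections, key=lambda pair: len(pair[0]), reverse=True):
        pattern = re.compile(rf"\b{re.escape(wrong)}\b", re.IGNORECASE)
        text = pattern.sub(lambda _m: correct, text)  # a function avoids backslash escapes in `correct`
    return text


corrections = [("m rocon", "mrocon"), ("eleven labs", "ElevenLabs"), ("talk to me", "Talk to me")]
transcript = "I sent the eleven labs invoice to M Rocon via talk to me."
print(apply_corrections(transcript, corrections))
```

Running the corrections before AI-Polish means the LLM already sees the correct names and has no reason to "fix" them into something else.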
25. Backup and Restore
Export Settings
- Open Backup & Restore in Settings.
- Tap/click Export Settings.
- Enter and confirm an Encryption Password (min. 6 characters).
- Windows: The save dialog suggests talktome-settings.ttm; you choose the folder.
- Android: The backup is written to your Downloads folder as TalkToMe-backup.ttm. If that name already exists, the system may append (1), (2), etc.; all of these are valid encrypted backups.
Import Settings
- Tap/click Import Settings.
- Automatic (Android): The app looks for the newest matching file named TalkToMe-backup with a .ttm extension (including TalkToMe-backup (1).ttm, etc.) in app storage and in Downloads.
- If the system file picker opens: On many phones (e.g. Samsung), the first screen is Recently used and may default to Images. Your .ttm files stay hidden until you switch the top filter to Documents or This week, or open the Download folder directly.
- New device: Copy the .ttm file from your old device (USB, cloud, email), then use Import and pick that file.
- Enter the encryption password.
- All settings are restored and the app restarts.
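The automatic lookup on Android can be thought of as: collect every file named TalkToMe-backup.ttm or TalkToMe-backup (N).ttm, then take the one modified most recently. A minimal sketch of that selection rule; the regular expression and `newest_backup` are hypothetical, and the real app may search additional locations or break ties differently.

```python
import re
from typing import Optional

# Matches "TalkToMe-backup.ttm", "TalkToMe-backup (1).ttm", "TalkToMe-backup (2).ttm", ...
BACKUP_NAME = re.compile(r"^TalkToMe-backup(?: \(\d+\))?\.ttm$")


def newest_backup(files: list[tuple[str, float]]) -> Optional[str]:
    """Pick the most recently modified matching backup from (name, mtime) pairs."""
    candidates = [(mtime, name) for name, mtime in files if BACKUP_NAME.match(name)]
    if not candidates:
        return None
    return max(candidates)[1]  # highest modification time wins


files = [
    ("TalkToMe-backup.ttm", 1700000000.0),
    ("TalkToMe-backup (1).ttm", 1700500000.0),
    ("holiday.jpg", 1700900000.0),
    ("talktome-settings.ttm", 1700800000.0),
]
print(newest_backup(files))
```

With real files, the (name, mtime) pairs could be gathered from `Path.iterdir()` and `Path.stat().st_mtime` for each searched folder.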
Technical Details
- Encryption: AES-256-GCM with PBKDF2-HMAC-SHA256 (100,000 iterations)
- Included: All settings, API keys, word corrections, auto-read apps, quality preset, UI language
- NOT included: License activation (tied to Machine ID)
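The key-derivation half of this scheme can be reproduced with Python's standard library: PBKDF2-HMAC-SHA256 with 100,000 iterations stretches the backup password into a 256-bit key of the size AES-256-GCM requires. This is a sketch of the general technique only. The salt length, salt storage, and .ttm file layout are not documented here, so the 16-byte random salt is an assumption, and the AES-GCM encryption step itself (which needs a third-party library such as `cryptography`) is omitted.

```python
import hashlib
import os

ITERATIONS = 100_000  # iteration count stated in the manual
KEY_LENGTH = 32       # 32 bytes = 256 bits, the key size AES-256 requires
MIN_PASSWORD_LENGTH = 6


def derive_backup_key(password: str, salt: bytes) -> bytes:
    """Derive a 256-bit key from the backup password with PBKDF2-HMAC-SHA256."""
    if len(password) < MIN_PASSWORD_LENGTH:
        raise ValueError("Encryption password must be at least 6 characters")
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS, dklen=KEY_LENGTH)


salt = os.urandom(16)  # assumed salt size; a real file format must store the salt with the ciphertext
key = derive_backup_key("correct horse battery", salt)
print(len(key))
```

The deliberately slow key derivation is what makes guessing a backup password by brute force expensive; it is also why the same password and salt must be used again on import.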
26. Usage Dashboard
| Metric | Description |
|---|---|
| STT Calls | Speech-to-text transcriptions performed |
| LLM Polish | AI-Polish or AI-Translate operations |
| TTS Synth | Text-to-speech synthesis operations |
Counters are cumulative since the last settings reset.
27. Troubleshooting
General
| Problem | Solution |
|---|---|
| "No API key configured" | Add a key in Key Pool for the feature you need |
| Recording doesn't start | Check microphone permission in system settings |
| Voice Translate produces no audio | Ensure a TTS API key is configured and working |
| Export fails | Check write access to Downloads folder |
| Can't see backup in Import file picker | Switch from Images to Documents / This week, or open the Download folder — see §25 Import |
Windows-Specific
| Problem | Solution |
|---|---|
| Ctrl+Win hotkey doesn't work | Ensure the app is running (check system tray) |
| Text not pasted after dictation | Ensure the target window supports Ctrl+V |
| Notification Listener unavailable | Available on Windows Desktop — ensure notification access is granted in Windows Settings |
| Mini-Player looks too large/small | DPI-aware sizing adjusts automatically; restart the app if display settings changed |
Android-Specific
| Problem | Solution |
|---|---|
| Auto-Read doesn't work | Ensure Overlay is active, Auto-Read enabled, and Notification Access granted |
| Auto-Paste doesn't work | Re-enable Accessibility Service in Android Settings |
| Bubble doesn't appear | Grant "Draw over other apps" permission |
| "App access denied" when granting permissions | Restricted Settings (Android 13+) — see §19 "Restricted Settings" for the step-by-step solution |
| Screen doesn't rotate (Tablet) | Check if PC Mode is active (pull down Quick Settings). Auto-Rotate is ignored in PC Mode — switch back to Android Mode. Primarily affects Lenovo tablets (ZUI). |
28. Privacy and Security
Data Handling
- No data collection: Talk to me does not collect, store, or transmit any user data to mrocon GmbH servers.
- Direct API communication: Audio and text go directly from your device to your chosen AI provider.
- Local storage only: All settings and API keys are stored exclusively on your device.
- No analytics: No tracking, analytics, or telemetry of any kind.
Permissions
Windows
| Permission | Purpose |
|---|---|
| Microphone | Record audio for dictation |
| Notification Access | Read notifications |
| Internet | Communicate with AI providers |
Android
| Permission | Purpose |
|---|---|
| Microphone | Record audio for dictation |
| Overlay (Draw over apps) | Display the floating bubble |
| Notification Listener | Read notifications for Auto-Read |
| Accessibility Service | Auto-Paste text into fields |
| Internet | Communicate with AI providers |
| Query Installed Packages | Show app names in Auto-Read settings |
Encryption
- Windows: API keys encrypted with DPAPI (Windows Data Protection API)
- Android: API keys stored in app-private internal storage
- Backup files: AES-256-GCM encryption
Appendix A — Supported Languages
Speech Input Languages
Auto-Detect, German, English, French, Spanish, Italian, Portuguese, Dutch, Japanese, Korean, Chinese, Russian, Arabic, Hindi, Polish, Turkish, Swedish, Ukrainian
Translation Target Languages
German, English, French, Spanish, Italian, Portuguese, Dutch, Japanese, Chinese, Korean, Russian, Arabic, Hindi, Polish, Turkish, Swedish, Ukrainian, Danish, Finnish, Norwegian
TTS Languages
Auto, German, English, French, Italian, Spanish, Portuguese, Dutch, Polish, Swedish, Danish, Finnish, Norwegian, Turkish, Japanese, Korean, Chinese
UI Languages
English, Deutsch, Français, Español
Appendix B — Supported Providers
Speech-to-Text
| Provider | Notes |
|---|---|
| OpenAI Whisper | Most widely used, reliable |
| Deepgram Nova-2 / Nova-3 | Fast, good accuracy |
| ElevenLabs Scribe v2 | Supports custom keyterms |
| Groq Whisper | Free tier available, fast |
LLM (Polish / Translation)
| Provider | Notes |
|---|---|
| OpenAI | GPT-4o-mini, GPT-5.4, etc. |
| Groq | Free tier, Llama models |
| Anthropic | Claude models |
| Google Gemini | Gemini models |
| xAI Grok | Free tier available |
Text-to-Speech
| Provider | Notes |
|---|---|
| ElevenLabs | Best quality, voice cloning, 4 models |
| OpenAI TTS | 10 built-in voices, simple |
| Deepgram Aura 2 | Fast synthesis |
Appendix C — Quality Presets
| Preset | STT | LLM | Model | Polish | Cost |
|---|---|---|---|---|---|
| Top Performer | Scribe v2 | OpenAI | GPT-5.4 | Strong | $$$ |
| Standard | Scribe v2 | OpenAI | GPT-4.1 mini | Strong | $$ |
| Budget | Whisper | Groq | Default | Light | $ |
| Free | Deepgram | Groq | Default | Off | Free |
| Custom | Manual | Manual | Manual | Manual | Varies |
Appendix D — Keyboard Shortcuts Windows
| Shortcut | Action |
|---|---|
| Ctrl+Win | Start / Stop Recording |
| Ctrl+Win (during processing) | Cancel Pipeline |
| TTS Hotkey | Read selected text aloud |
Talk to me is a product of mrocon GmbH. All rights reserved.
For support, contact team@talktome.studio or visit talktome.studio.