User Manual — Talk to me

Version: 0.5.149 (Windows Desktop) / 0.5.157 (Android Hands-Free)
Last Updated: 2026-04-20

This manual covers both the Windows Desktop and Android Hands-Free editions of Talk to me. Sections marked with Windows or Android apply only to that platform. All other sections apply to both.

1. Introduction

Talk to me is a professional dictation, translation, and voice interaction studio available for Windows Desktop and Android. It converts your speech into text, polishes it with AI, translates it into 20+ languages, and reads it back to you — all in real time.

The app follows a strict BYOK (Bring Your Own Key) and Zero-Knowledge / Zero-Trust architecture: your API keys are stored only on your device, and your data goes only to the AI providers you choose.

Key Features

Real-time Dictation: Record your voice and get polished text in seconds.
AI-Polish: Automatic grammar correction and filler word removal powered by your choice of AI provider.
Live Translation: Translate dictated text into 20+ languages on the fly.
Voice Translate (Speech-to-Speech): Your translated text is automatically read aloud in the target language.
Text-to-Speech: Convert any text into natural-sounding speech with ElevenLabs, OpenAI TTS, or Deepgram.
Live Language Immersion: Speak in your native language, instantly see and hear it in the language you want to master.
Word Corrections: Teach the app your names, brands, and terms that speech recognition gets wrong.
Encrypted Backup: Export all settings and API keys as a password-protected encrypted file.
Multi-Provider Support: Choose from OpenAI, Groq, Anthropic, Google Gemini, xAI Grok, ElevenLabs, Deepgram, and more.

Platform Highlights

Feature	Windows Desktop	Android Hands-Free
Mini-Player (compact mode)	✓	—
Global Hotkeys (Ctrl+Win)	✓	—
Auto-Read (Ctrl+C text extraction)	✓	—
Notification Listener	✓	—
MP3 Recording & Save	✓	—
Floating Pill (Spectrum Analyzer)	✓	—
Floating Bubble (Overlay)	—	✓
Auto-Paste (Accessibility)	—	✓
Auto-Read Messages (from chat apps)	—	✓
App-level Notification Access	—	✓

Security Principles

Zero-Knowledge: Your API keys are never stored on, or transmitted to, any Talk to me server, and the vendor has no access to them. All keys are stored locally on your device and are sent only to the provider they belong to.
Zero-Trust: Apart from the online license check (see §3), the app never phones home. No analytics, no tracking, no telemetry. Your dictation data flows directly from your device to your chosen AI provider and nowhere else.
BYOK: You bring your own API keys from the providers you trust. Talk to me does not resell API access.

2. Getting Started

Windows Installation — Windows Desktop

Talk to me for Windows is available as an EV-signed installer from talktome.studio or via the Microsoft Store.

System Requirements:

Windows 10 or later (64-bit)
An active internet connection
At least one API key from a supported provider

The installer is digitally signed with an Extended Validation (EV) certificate from Certum (mrocon GmbH). Windows SmartScreen will not show any warnings.

Android Installation — Android

Talk to me for Android is available as an APK from talktome.studio or via the Google Play Store.

System Requirements:

Android 8.0 or later
An active internet connection
At least one API key from a supported provider

First Launch

When you open Talk to me for the first time, you will see the License Gate. You have two options:

Enter a License Key to unlock the full app immediately.
Start a 7-Day Free Trial to explore all features without a license key.

After activation or trial start, the app loads and you can begin using it right away — provided you have at least one API key configured (see Key Pool).

Android Quick Start — Your First 5 Minutes

After activating your license (or starting the free trial), the app opens and you will see the main screen — the Cockpit. Don't worry if most buttons appear orange or inactive. That's completely normal! Here is what to do, step by step:

Step 1 — Enable Microphone Access

The large button in the center of the screen reads "Enable Microphone Access". This is the first and most important step.

Tap the Enable Microphone Access button.
A dialog from Talk to me explains why the microphone is needed. Tap OK.
Android then asks: "Allow Talk to me to record audio?" — tap While using the app (or Allow).
Done! The button changes to "Ready — Start Dictation" in green. You can now record your first dictation.

Step 2 — Add Your API Keys

At the bottom of the screen you will see the Key Pool bar, initially showing red labels such as STT 0/5, LLM 0/5, TTS 0/5. This means no API keys are configured yet. Without keys, the app cannot connect to AI services.

Tap any of the Key Pool labels (e.g. STT) to open the Key Pool section.
Tap Add Key and paste an API key from your provider (e.g. OpenAI, Deepgram, ElevenLabs).
Tap Save. The label turns green when a valid key is stored.
Repeat for each category you want to use. At minimum, you need an STT key (for dictation). For AI polish, add an LLM key. For text-to-speech, add a TTS key.

See §11 Key Pool for a detailed guide on supported providers and how to obtain API keys.

Step 3 — Optional Features (Cockpit Buttons)

The buttons in the center of the Cockpit control optional features. Each one requires a system permission the first time you enable it. You will see a short explanation dialog from Talk to me, followed by the Android system dialog. Both are normal and safe to confirm.

Button	What it does	Details
Auto-Paste	Automatically pastes your dictated text into whichever app you were using (e.g. WhatsApp, email). No manual copy-paste needed.	§19
Notif Access	Lets the app read incoming notifications so it can auto-read messages to you.	§21
Auto-Read	Reads incoming messages aloud using text-to-speech — great for hands-free use while driving or cooking.	§20
Overlay	Shows a small floating bubble on your screen. Tap it to start/stop dictation from any app — without switching back to Talk to me.	§18

You don't need all of these right away. Start with dictation (Step 1 + 2), and enable the extras whenever you're ready. Each feature can be turned on or off at any time.

Free & Paid Tier Overview

Talk to me is a BYOK app (Bring Your Own Key). You use your own API keys from AI providers. Many providers offer generous free tiers — from $200 Deepgram credit to unlimited Gemini usage to free Grok and Groq keys. This means you can use Talk to me for months before any API costs arise.

Tier 1 — Completely Free (no money, no credit card)

What you need	What you get	How to get it
1× Deepgram account (free)	Speech-to-Text dictation (STT)	deepgram.com → Sign up → $200 starter credit
1× Gemini API key (free)	AI Voice Chat (Gemini Live)	aistudio.google.com → Create API Key

What you can do:

Dictate with Deepgram Nova-3 (preset “Free”) — no LLM polish, but solid transcription
AI Voice Chat via the Gemini Live tab — real-time voice conversation with sub-second latency, 30 voices, 24 languages

How long does it last?

Feature	Credit / Limit	Lasts for
Deepgram STT	$200 starter credit (never expires)	~43,000 min (~716 hours) transcription
Gemini Live Voice Chat	Free API key (no credit limit)	Unlimited (rate limit: ~10 sessions/min)
Gemini LLM (for Polish)	Free API key	250 requests/day (Flash model)

Reality: With these two free accounts you can use Talk to me productively for months. During intensive daily testing, only $19 of $200 Deepgram credit was used after weeks.
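To check how far a credit goes at a given price, divide the credit by the per-minute rate. The sketch below is illustrative only: the function name is hypothetical, $0.0043/min is the "from" price quoted in the API Key Cost Overview, and the second rate is an assumed, slightly higher price included only to show how sensitive the result is. The ~43,000 min figure in the table corresponds to an effective rate of roughly $0.0047/min, which is our inference and not a published provider price.

```python
def minutes_from_credit(credit_usd: float, rate_usd_per_min: float) -> float:
    """Return how many minutes of transcription a credit covers."""
    if rate_usd_per_min <= 0:
        raise ValueError("rate must be positive")
    return credit_usd / rate_usd_per_min


if __name__ == "__main__":
    # 0.0043 is the manual's "from" price; 0.0046 is an assumed rate for comparison.
    for rate in (0.0043, 0.0046):
        minutes = minutes_from_credit(200.0, rate)
        print(f"${rate}/min -> {minutes:,.0f} min (~{minutes / 60:,.0f} hours)")
```

Actual consumption depends on the model and the options you use, so treat the result as an estimate.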

Tier 2 — Free with More Power (additional free keys)

What you need	What it adds	Cost
+ 1× xAI account	Grok 3 Mini as LLM for Polish + Translation	Free ($25 starter credit + up to $150/month with data sharing)
+ 1× Groq account	Ultra-fast LLM for Polish (Llama models)	Free (1,000 requests/day, no credit card)

Unlocked presets:

Preset	STT	LLM / Polish	All keys free?
Free	Deepgram Nova-3	—	Yes (1 key)
Free xAI	Deepgram Nova-3	xAI Grok	Yes (2 keys)
Free Gemini	Deepgram Nova-3	Google Gemini	Yes (2 keys)
Fast Free	OpenAI Whisper	Groq Llama	Yes (2 keys)
Economy	Deepgram Nova-3	Groq Llama	Yes (2 keys)
Economy Plus	Deepgram Nova-3	Groq Llama (Strong Polish)	Yes (2 keys)

Also unlocked:

Deepgram Voice Agent with 20+ managed presets (uses your $200 credit, $0.05–0.16/min)
Full BYO Voice Agent Presets (e.g. GPT-5.4 + ElevenLabs, if you have the keys)

Tier 3 — Premium Quality (paid keys)

For the absolute best quality, you need paid API keys:

Provider	Used for	Cost	What you get
OpenAI	GPT-5.4 (best LLM for Polish)	Pay-per-use (~$5–15/month)	Perfect grammar, style, translation
ElevenLabs	Scribe v2 (best STT) + TTS	From $5/month (Starter)	Best transcription, premium voices
Anthropic	Claude Sonnet 4.6 (top LLM)	Pay-per-use	Excellent text quality for longer texts

API Key Cost Overview

Provider	Sign up	Starter credit	Ongoing cost	Credit card?
Deepgram	Free	$200 (never expires!)	From $0.0043/min STT	No
Google Gemini	Free	Unlimited (rate-limited)	$0.005–0.018/min (Live Audio)	No
xAI (Grok)	Free	$25 + up to $150/month	From $0.10/1M tokens	No
Groq	Free	Unlimited (rate-limited)	1,000 requests/day free	No
OpenAI	Free	$5 (expires after 3 months)	From $0.15/1M tokens	Yes (for GPT-5+)
Anthropic	Free	$5 (expires after 30 days)	From $1.00/1M tokens	Yes
ElevenLabs	Free	10,000 chars/month	From $5/month (Starter)	Yes

Recommended Start (3 minutes, $0 cost)

Create Deepgram account → deepgram.com → Sign up → Copy API Key
Create Gemini API key → aistudio.google.com → “Create API Key” → Copy key
Enter both keys in Talk to me → Settings → Key Pool (Deepgram under Speech-to-Text, Gemini under AI-Polish / LLM)
Go: Dictation tab → preset “Free Gemini” → Dictate with STT + AI Polish. Gemini Live tab → “Start Conversation” → Real-time voice chat with AI.

Optional for even more:

xAI account → x.ai/api → Sign up → API Key → Enter in Key Pool → preset “Free xAI”
Groq account → console.groq.com → Sign up → API Key → presets “Economy” / “Economy Plus” / “Fast Free”

Feature Availability by Tier

Feature	Tier 1 (free)	Tier 2 (free+)	Tier 3 (premium)
Speech dictation (STT)	✓ Deepgram	✓ Deepgram + Whisper	✓ + ElevenLabs Scribe v2
AI Polish (grammar)	—	✓ Grok/Gemini/Groq	✓ + GPT-5.4 / Claude 4.6
Real-time translation	—	✓ (all LLM providers)	✓ (best quality)
Gemini Live Voice Chat	✓ (unlimited)	✓ (unlimited)	✓ (unlimited)
Deepgram Voice Agent	—	✓ (from $200 credit)	✓ (all presets)
BYO Voice Agent Presets	—	✓ (with xAI/Groq keys)	✓ (+ ElevenLabs/OpenAI TTS)
Available presets	2	6+ dictation + 20+ Voice Agent	All (30+)

All prices and free tier conditions are set by the respective providers and may change. Last updated: April 2026.

3. License Activation

The License Gate

On first launch (or after trial expiration), the License Gate is displayed. It shows:

The Talk to me wordmark
A text field for your license key (format: TTM-XXXX-XXXX-XXXX-XXXX)
Your Machine ID (a unique device identifier, needed for activation)
An Activate button
A Start 7-Day Free Trial button (if no trial has been used)
Links to Buy a License and the Customer Portal
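If you paste a key from an email or a document, a quick shape check can catch truncated or mistyped keys before you tap Activate. The following is a minimal sketch, assuming each X in TTM-XXXX-XXXX-XXXX-XXXX is an uppercase letter or a digit; the manual does not specify the allowed characters, and the function name is hypothetical. It only checks the format; the real validation is the online check performed by the app.

```python
import re

# Assumption: each "X" in the documented format is an uppercase letter or a digit.
LICENSE_KEY_PATTERN = re.compile(r"TTM(-[A-Z0-9]{4}){4}")


def looks_like_license_key(text: str) -> bool:
    """Return True if the text has the shape TTM-XXXX-XXXX-XXXX-XXXX."""
    # Tolerate surrounding whitespace and lowercase input from copy and paste.
    return LICENSE_KEY_PATTERN.fullmatch(text.strip().upper()) is not None
```

For example, `looks_like_license_key("TTM-AB12-CD34-EF56-GH78")` is true, while a key with only three groups is rejected.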

Activating a License

Enter your license key in the text field.
Tap/click Activate.
The app verifies your key online and activates it for this device.
Once activated, you will not see the License Gate again unless you deactivate or your license expires.

The Free Trial

Tap/click Start 7-Day Free Trial to unlock all features for 7 days.
A banner at the top of the app shows how many trial days remain.
After 7 days, the trial expires and the License Gate reappears.

License Modal

Once inside the app, you can view your license status by clicking the License button (shield icon). The License Modal shows:

Status: Active, Trial, Grace Period, or Expired
Product: Your license product name
Plan: Yearly or Lifetime
Expires: Expiration date (or "Lifetime")
Devices: Number of active devices / maximum allowed
Key: Your license key (partially masked)
Machine ID: Your device's unique identifier

From this modal you can:

Deactivate Device — releases the license from this device so you can use it on another
Close — return to the app

4. App Overview

The app is organized into three main tabs and several supporting sections.

Navigation

At the top of the screen, three tabs let you switch between the app's primary modes:

Speech-to-Text — Record your voice and get polished, translated text
Text-to-Speech — Convert written text into spoken audio
AI Voice Chat — Have real-time voice conversations with AI (see §12)

Interface Layout

Below the tabs, the main interface is arranged vertically:

Quick-Override Controls — Language selectors for input and output
Action Buttons — Quick access to platform features
Status Indicator — Shows the current state (Ready, Recording, Transcribing, etc.)
Pipeline Display — Visual progress of your dictation through the processing stages
Result Area — Your transcribed/translated text
TTS Panel (Text-to-Speech tab only) — Text input and playback controls
AI Voice Chat Panel (AI Voice Chat tab only) — Voice/persona selection, conversation controls, live transcript (see §12)
Key Pool — Manage your API keys
Settings — All configuration options

Action Buttons

Windows Desktop action buttons:

Voice Translate — Toggle speech-to-speech translation
Notification Listener — Toggle notification readout
Auto-Read — Toggle Ctrl+C text-to-speech
Record TTS Readings — Toggle MP3 recording of TTS output
Save Recordings — Open recordings folder

Android action buttons:

License — Open license modal
Voice Translate — Toggle speech-to-speech translation
Overlay — Start/stop the Floating Bubble
Auto-Paste — Open Accessibility settings
Auto-Read — Toggle auto-read messages
Notif Access — Open notification listener settings

The Info Button

In the header, the Info button opens the App Info modal, which displays:

A link to talktome.studio
The support email (tap/click to copy)
The current app version
Number of detected microphones

5. Speech-to-Text

The Speech-to-Text tab is the primary mode of Talk to me. Here, you record your voice and receive polished, optionally translated text.

Recording a Dictation

Ensure the status shows Ready — Start Dictation (green).
Click/tap the large Start Dictation button.
The button turns red and shows Stop Recording. Speak clearly.
While recording, you can see the recording duration in seconds, an audio level meter showing input volume, and the currently active STT provider and language.
Click/tap the button again to Stop Recording.

Windows: You can also start/stop recording using the global hotkey Ctrl+Win (no need to focus the app window).

What Happens After Recording

After you stop recording, the app processes your audio through the Pipeline (see The Pipeline):

Capture — Audio recording is finalized
STT — Your audio is transcribed by the selected provider
Post-Processing — The raw text is cleaned up (word corrections applied)
Polish / Translation — If enabled, AI corrects grammar or translates the text
Inject — The final text is placed in your clipboard

Windows: The text is automatically pasted into the previously focused window via simulated Ctrl+V (Smart Clipboard Injection).

Android: If Auto-Paste is enabled, the text is automatically inserted into the active text field via the Accessibility Service.

The Result Area

After processing, your text appears in the result area. A hint confirms the text has been copied to your clipboard and is ready to paste.

Recording Signals (Audio Cues)

Talk to me gives you acoustic and visual signals when the microphone is actually recording, so no words are lost.

Acoustic Signals

Start beep (short high blip): "Microphone is live, you can speak now."
Stop beep (short low blip): "Recording ended."

Both beeps can be toggled on/off in the settings and their volume can be adjusted (default: 100%).

Visual Signals

Idle/Standby: Microphone icon is orange — recording inactive.
Recording active: Microphone icon is green — every spoken word is being captured.

Note: Start Beep on Speakerphones

Some audio devices suppress the start beep. This is not a bug but a hardware characteristic:

Device Type	Beep Audible?	Recommendation
Speakers + separate microphone	✅ Yes	—
Headset with separate mic + speaker	✅ Yes	—
USB speakerphone (Jabra Speak2, Logitech P710e etc.)	⚠️ Possibly not	Use headset or external speakers
Bluetooth headset in Hands-Free profile	⚠️ Possibly not	Wired headset as alternative

Important: If you change the default audio device, restart Talk to me so the beep plays on the new device.

6. Text-to-Speech

The Text-to-Speech tab lets you convert any written text into natural-sounding speech.

Basic Usage

Switch to the Text-to-Speech tab.
Type or paste text into the text area.
Click/tap Read Aloud to start playback.

Playback Controls

Pause — Temporarily stops playback
Resume — Continues from where you paused
Stop — Ends playback entirely
Replay — Plays the same audio again without re-synthesizing

Provider and Voice Selection

ElevenLabs: Choose from your available voices or use "Default (Brian v3)". Custom Voice-IDs supported.
OpenAI TTS: Nova, Alloy, Echo, Fable, Onyx, Shimmer
Deepgram Aura 2: Fast synthesis

Model Selection (ElevenLabs)

Model	Character Limit	Best For
Eleven v3	5,000	Highest quality, short content
Multilingual v2	10,000	Multi-language support
Flash v2.5	40,000	Fast synthesis, long texts
Turbo v2.5	40,000	Speed and quality balance
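The character limits above decide which model can take a given text in one request. As a minimal sketch, the helper below lists the models whose limit is large enough. The limits are copied from the table; the function name is hypothetical, and counting characters with Python's `len` is an assumption about how the limit is measured.

```python
# Character limits as listed in the model table above.
ELEVENLABS_CHAR_LIMITS = {
    "Eleven v3": 5_000,
    "Multilingual v2": 10_000,
    "Flash v2.5": 40_000,
    "Turbo v2.5": 40_000,
}


def models_that_fit(text: str) -> list[str]:
    """Return the models whose character limit covers the whole text."""
    # Assumption: the limit is compared against the plain character count.
    return [name for name, limit in ELEVENLABS_CHAR_LIMITS.items() if len(text) <= limit]
```

A 6,000-character text, for instance, rules out Eleven v3 but fits the other three models.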

Audio Quality

Quality	Description
MP3 192 kbps	Creator quality — highest fidelity
MP3 128 kbps	Standard — good balance
MP3 64 kbps	Compact — smaller file size
MP3 32 kbps	Minimal — lowest quality

Text Normalization

Setting	Description
Auto	The model decides how to handle numbers
Always On	Numbers converted to words (e.g., "42" → "forty-two")
Off	No normalization applied

Voice Fine-Tuning (ElevenLabs)

Slider	Range	Description
Stability	Variable ↔ Stable	Lower = more expressive; Higher = more consistent
Similarity	Creative ↔ Original	How closely the output matches the original voice
Style	Neutral ↔ Expressive	Amount of emotional expression
Speed	Slow (0.7×) ↔ Fast (1.2×)	Playback speed

Additional Options

Code-Filter: Strips code blocks and technical syntax before synthesis.
Auto-Record: Automatically saves synthesized audio. Tap the folder icon to choose the directory.
Speaker Boost: Enhances voice clarity (ElevenLabs only).

7. The Pipeline

The Pipeline is Talk to me's core processing engine. It visualizes the stages your audio passes through from recording to final output.

Pipeline Stages

Stage	Label	Description
1	Capture	Audio recording and finalization
2	STT	Speech-to-Text transcription
3	Post	Post-processing (cleanup, word corrections)
4	Polish or Trans	AI-Polish or AI-Translate
5	Inject	Text copied to clipboard / auto-pasted
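The Post stage is where your word corrections are applied to the raw transcript. How the app implements this is not described in this manual; the following is a minimal sketch, assuming a simple case-insensitive whole-word replacement. The function name and the sample corrections are hypothetical.

```python
import re


def apply_word_corrections(text: str, corrections: dict[str, str]) -> str:
    """Replace each misrecognized term with its correction (whole words, any case)."""
    # Longer terms first, so multi-word entries are handled before shorter ones.
    for wrong in sorted(corrections, key=len, reverse=True):
        pattern = re.compile(rf"\b{re.escape(wrong)}\b", re.IGNORECASE)
        # A function is used as the replacement so the correction is inserted literally.
        text = pattern.sub(lambda _match, right=corrections[wrong]: right, text)
    return text


if __name__ == "__main__":
    # Hypothetical corrections a user might teach the app.
    sample = {"deep gram": "Deepgram", "eleven labs": "ElevenLabs"}
    print(apply_word_corrections("I use deep gram and Eleven Labs.", sample))
```

Running corrections before Polish or Translation means the LLM already sees the right names and brands.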

TDF (Text Display Field) Indicators

Each pipeline stage shows the active provider (e.g., "Scribe v2", "GPT-5.4") and timing information after completion.

Timing Display

After processing, a timing line shows:

STT 1.2s → LLM 0.8s → Inject 0.1s → Total 2.1s

If Voice Translate is active, an additional S2S (Speech-to-Speech) timing is shown.
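If you copy timing lines into a log or spreadsheet to compare providers, they are easy to parse. This is a minimal sketch, assuming every entry has the form "label, space, seconds, s" as in the example line above; the function name is hypothetical, and whether Total always equals the sum of the listed stages is an assumption that happens to hold for the sample.

```python
import re


def parse_timing_line(line: str) -> dict[str, float]:
    """Parse 'STT 1.2s → LLM 0.8s → ...' into a mapping of label to seconds."""
    # Assumption: each entry looks like '<label> <number>s'; separators are ignored.
    return {label: float(seconds) for label, seconds in re.findall(r"(\w+)\s+([\d.]+)s", line)}


if __name__ == "__main__":
    timings = parse_timing_line("STT 1.2s → LLM 0.8s → Inject 0.1s → Total 2.1s")
    stages = sum(value for label, value in timings.items() if label != "Total")
    print(timings, round(stages, 3))
```

An extra S2S entry, when present, is picked up by the same pattern.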

8. Voice Translate

Voice Translate combines AI-Translation with Text-to-Speech to create a real-time speech-to-speech translation experience.

New since v0.5.150: Text translation is now automatically active whenever your input language (Speech Input) and output language (Text Output) differ. You no longer need a separate switch for text translation. The Voice Translate button now only controls whether the final text is read aloud (text-to-speech output).

How It Works

Enable Voice Translate (purple when active).
Record a dictation in your source language.
The app transcribes → translates → reads the translation aloud.

Examples

DE → EN without Voice Translate: You speak German, receive English text — no audio output.
DE → EN with Voice Translate: You speak German, receive English text — and it is read aloud.
DE → DE with Voice Translate: Same language, no translation — but the text is read aloud.
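The three examples follow from two independent rules: translation happens whenever the two languages differ, and Voice Translate only decides whether the result is read aloud. A minimal sketch of that decision, with hypothetical names and with two-letter language codes as an assumption (Auto-Detect input is not covered here):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutputPlan:
    translate: bool   # text is translated before output
    read_aloud: bool  # final text is spoken via text-to-speech


def plan_output(input_lang: str, output_lang: str, voice_translate: bool) -> OutputPlan:
    """Decide what happens to a dictation, following the rules described above."""
    # Assumption: languages are compared as plain codes such as "de" and "en".
    return OutputPlan(translate=input_lang != output_lang, read_aloud=voice_translate)
```

For example, `plan_output("de", "de", True)` yields no translation but spoken output, matching the third example.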

Configuration

Target Language: Set in Settings → AI-Translate → Translate To
TTS Voice: Uses your configured TTS provider and voice

Use Cases

Travel: Speak in your language, have the translation read aloud.
Language Learning: Hear how your text sounds in another language.
Live Language Immersion: Turn your own thoughts into live fluency — speak in your native language and absorb the output in the language you want to master.

9. AI Polish & Translation

AI-Polish

When enabled, AI-Polish corrects grammar, punctuation, and (with "Strong" setting) removes filler words like "um", "uh", "you know", "basically".

Polish Strength:

Light — Grammar and punctuation correction only
Strong — Also removes filler words

Status indicators:

POLISH (cyan) — Active
OFF — Disabled
KEY MISSING (yellow) — No LLM key configured

AI-Translate

When enabled, your dictated text is translated into the target language.

Status indicators:

TRANSLATE (cyan) — Active, showing target language
VOICE OUTPUT (purple) — Voice Translate also active
TEXT ONLY — Translation without voice output
OFF — Disabled

Note: Since v0.5.150, Talk to me automatically detects when input and output languages differ and activates translation, without an explicit toggle. AI-Polish remains independently available and is no longer automatically disabled when translation is active.

10. Quick-Override Controls

The Quick-Override controls allow you to temporarily change the input or output language for a single dictation without modifying your saved settings.

Speech Input Override

Select a different input language for the next recording:

Auto-Detect — The STT provider detects the language automatically
Individual languages (see Appendix A)

Text Output Override

Select a different output language (equivalent to temporarily enabling translation):

Default (same as input) — No translation
All 20 translation languages

Reset to Settings

When an override is active, a Reset button (↩ icon) appears. Tap/click it to revert to your saved settings.

11. Key Pool

The Key Pool is where you manage your API keys. Talk to me uses a pool-based architecture — you can add multiple keys per category, and the app automatically rotates between them based on trust scores.

Category	Purpose	Supported Providers
Speech-to-Text	Transcription	OpenAI Whisper, Deepgram Nova, ElevenLabs Scribe v2, Groq Whisper
AI-Polish / LLM	Grammar, translation	OpenAI, Groq, Anthropic, Google Gemini, xAI Grok
Text-to-Speech	Voice synthesis	ElevenLabs, Deepgram, OpenAI TTS

Adding a Key

Expand the Key Pool section.
Click/tap + Add Key in the desired category.
Select the Provider.
Enter a Label (e.g., "My OpenAI Key").
Enter your API Key.
Click/tap Save Key.

Key Slot Features

Each key slot displays:

Label and Provider
Masked Key (last 4 characters visible)
Trust Score — Color-coded (green/yellow/red)
Statistics — Calls, successes, failures, rate limits
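Masking means that only the last four characters of a key remain readable. A minimal sketch of that display rule follows; the function name, the mask character, and the handling of very short keys are assumptions, since the manual only states that the last 4 characters are visible.

```python
def mask_key(api_key: str, visible: int = 4, mask_char: str = "*") -> str:
    """Hide all but the last `visible` characters of a key."""
    # Assumption: keys no longer than `visible` are hidden completely.
    if len(api_key) <= visible:
        return mask_char * len(api_key)
    return mask_char * (len(api_key) - visible) + api_key[-visible:]
```

This lets you tell several keys from the same provider apart without exposing them on screen.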

Actions per slot:

Test — Verify the key works
Pause / Activate — Temporarily disable or re-enable
Remove — Permanently delete

Trust System

Level	Score	Color	Behavior
Excellent	≥80%	Green	Preferred
Good	≥60%	Green	Normal
OK	≥40%	Yellow	Fallback
Weak	≥20%	Yellow	Rarely used
Critical	<20%	Red	Last resort

Keys that hit rate limits are placed in automatic cooldown while other keys are used.
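The trust table and the cooldown rule can be read as a small selection procedure. The sketch below is an illustration under stated assumptions, not the app's actual algorithm: the thresholds come from the table, but picking the usable key with the highest score is our simplification of "rotation based on trust scores", and all names and the cooldown representation are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# (minimum score in percent, level, color), highest first; values from the table above.
TRUST_LEVELS = [
    (80, "Excellent", "green"),
    (60, "Good", "green"),
    (40, "OK", "yellow"),
    (20, "Weak", "yellow"),
    (0, "Critical", "red"),
]


def trust_level(score: float) -> tuple[str, str]:
    """Map a trust score in percent to its level and color."""
    for minimum, level, color in TRUST_LEVELS:
        if score >= minimum:
            return level, color
    raise ValueError("score must not be negative")


@dataclass
class KeySlot:
    label: str
    score: float                 # trust score in percent
    paused: bool = False         # set via the Pause action
    cooldown_until: float = 0.0  # hypothetical timestamp in seconds; 0 means no cooldown


def pick_key(slots: list[KeySlot], now: float) -> Optional[KeySlot]:
    """Pick the usable key with the highest trust score, or None if none is usable."""
    # Assumption: paused keys and keys still in cooldown are skipped entirely.
    usable = [slot for slot in slots if not slot.paused and slot.cooldown_until <= now]
    return max(usable, key=lambda slot: slot.score, default=None)
```

With one key in cooldown, the next-best key is chosen; once the cooldown has passed, the higher-scored key is preferred again.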

12. AI Voice Chat

Talk to me includes two independent AI Voice Chat engines, each with its own strengths. You can switch between them at any time from the AI Chat tab.

Engine	Technology	Key Advantage
12a. Deepgram Voice Agent	Deepgram Agent API (WebSocket)	32+ presets, 6 LLM providers, 4 TTS providers, latency monitoring, managed & BYO modes
12b. Gemini 3.1 Flash Live	Google Gemini Live API (WebSocket)	30 expressive voices, persona presets, thinking depth control, native Google multimodal AI

Full hands-free speaker mode (Android)

Both voice chat engines work completely hands-free through your phone speaker. Talk to me uses proprietary acoustic echo cancellation (AEC) via a native Android bridge to separate your voice from the AI's speaker output. You can interrupt at any time: the AI stops immediately and listens to you. No headphones or extra equipment are required. On desktop, any standard audio setup works equally well.

12a. Deepgram Voice Agent

The Deepgram Voice Agent provides real-time, full-duplex AI voice conversations through a single WebSocket connection to the Deepgram Agent API. It orchestrates Speech-to-Text (STT), Large Language Models (LLMs), and Text-to-Speech (TTS) in one unified pipeline — you speak, the AI thinks, and responds with natural voice, all in real time.

Getting Started

Switch to the AI Chat tab, then select the Deepgram sub-tab.
Add a Deepgram API key in the Key Pool (scroll down to the “Deepgram Voice Agent” section).
Choose a Configuration Preset or configure manually.
Tap the green Start Conversation button.

Configuration Presets (32+ Options)

Talk to me ships with over 32 presets across six categories. Each preset pre-configures STT model, LLM provider/model, TTS provider/voice, and turn-detection parameters.

Top Tier — Best Quality

Preset	LLM	TTS	STT
Gemini 3.0 Pro + Sonic-3	Google Gemini 3.0 Pro	Cartesia Sonic-3	Nova-3
Claude 4.5 + Sonic-3	Anthropic Claude Sonnet 4.5	Cartesia Sonic-3 (Tessa)	Nova-3
Claude 4.6 + Sonic-3	Anthropic Claude Sonnet 4.6	Cartesia Sonic-3 (Katie)	Nova-3
GPT-5.4 + Sonic-3	OpenAI GPT-5.4	Cartesia Sonic-3 (Katie)	Nova-3
GPT-5.4 + Kiefer	OpenAI GPT-5.4	Cartesia Sonic-3 (Kiefer, Male)	Nova-3

Ultra-Fast — Lowest Latency (~1.1s)

Preset	LLM	TTS	STT
GPT-4o Mini + Sonic-3	OpenAI GPT-4o Mini	Cartesia Sonic-3	Nova-3
GPT-5.4 Nano + Sonic-3	OpenAI GPT-5.4 Nano	Cartesia Sonic-3	Nova-3
Haiku 4.5 + Sonic-3	Anthropic Claude Haiku 4.5	Cartesia Sonic-3	Nova-3
Gemini 2.5 Flash + Sonic-3	Google Gemini 2.5 Flash	Cartesia Sonic-3	Nova-3
Nemotron 49B + Sonic-3	NVIDIA Nemotron Super 49B	Cartesia Sonic-3	Nova-3

Flux — English Only, Ultra-Low Latency

Flux uses Deepgram's Flux STT model with eager end-of-turn detection for the absolute fastest response times. English only.

Preset	LLM	TTS
Flux + GPT-4o Mini + Sonic-3	OpenAI GPT-4o Mini	Cartesia Sonic-3
Flux + GPT-5.4 Nano + Sonic-3	OpenAI GPT-5.4 Nano	Cartesia Sonic-3
Flux + GPT-5.4 + Sonic-3	OpenAI GPT-5.4	Cartesia Sonic-3
Flux + Claude 4.6 + Sonic-3	Anthropic Claude 4.6	Cartesia Sonic-3
Flux + Gemini Flash + Sonic-3	Google Gemini 2.5 Flash	Cartesia Sonic-3

Balanced — Quality + Speed

Preset	LLM	TTS
GPT-5 Mini + Sonic-3	OpenAI GPT-5 Mini	Cartesia Sonic-3
GPT-4.1 Mini + Sonic-3	OpenAI GPT-4.1 Mini	Cartesia Sonic-3
Haiku 4.5 + Tessa	Anthropic Haiku 4.5	Cartesia Sonic-3 (Tessa)
Gemini 3.0 Flash + Sonic-3	Google Gemini 3.0 Flash	Cartesia Sonic-3

Experimental — Deepgram Aura-2 TTS (Language-Specific)

Preset	LLM	TTS Voice
GPT-5.4 + Julius (DE)	OpenAI GPT-5.4	Aura-2 Julius (German, Male)
GPT-5.4 + Zeus (EN)	OpenAI GPT-5.4	Aura-2 Zeus (English, Male)
Claude 4.6 + Thalia (EN)	Anthropic Claude 4.6	Aura-2 Thalia (English, Female)
GPT-5.4 + Agathe (FR)	OpenAI GPT-5.4	Aura-2 Agathe (French, Female)
GPT-5.4 + Celeste (ES)	OpenAI GPT-5.4	Aura-2 Celeste (Spanish, Female)

Full BYO — Bring Your Own LLM & TTS Keys

In Full BYO mode, Deepgram handles only STT (Nova-3). Your own API keys for LLM and TTS providers are used directly.

Preset	LLM (BYO Key)	TTS (BYO Key)
GPT-5.4 + ElevenLabs	OpenAI GPT-5.4	ElevenLabs Turbo v2.5
GPT-5.4 + OpenAI TTS	OpenAI GPT-5.4	OpenAI TTS-1
GPT-5.4 Nano + ElevenLabs	OpenAI GPT-5.4 Nano	ElevenLabs Turbo v2.5
Gemini 3 Pro + ElevenLabs	Google Gemini 3 Pro	ElevenLabs Turbo v2.5
Gemini Flash + OpenAI TTS	Google Gemini 2.5 Flash	OpenAI TTS-1
Claude 4.6 + ElevenLabs	Anthropic Claude 4.6	ElevenLabs Turbo v2.5
Claude 4.6 + OpenAI TTS	Anthropic Claude 4.6	OpenAI TTS-1
Grok 3 Mini + ElevenLabs	xAI Grok 3 Mini	ElevenLabs Turbo v2.5

Preset Lock & Unlock

When a preset is active, all configuration fields are locked to the preset values (indicated by a lock icon). This prevents accidental changes. To override individual settings, tap Unlock for manual editing. Changing any setting manually switches the preset to “Manual Configuration”.

Manual Configuration

Tap the gear icon next to the Start button to open the configuration panel. All fields below are available:

LLM Provider

Provider	Key Models
OpenAI	GPT-4o Mini, GPT-4.1 Nano/Mini/Full, GPT-5 Nano/Mini/Full, GPT-5.1–5.4 (incl. Nano, Mini)
Anthropic	Claude Haiku 4.5, Sonnet 4, Sonnet 4.5, Sonnet 4.6
Google	Gemini 2.5 Flash/Flash Lite, Gemini 3.0 Flash/Pro, Gemini 3.1 Flash Lite
NVIDIA	Llama Nemotron Super 49B, Nemotron 3 Nano 30B
xAI	Grok 3, Grok 3 Mini, Grok 3 Fast
Groq	GPT OSS 20B

TTS Provider

Provider	Voices	Languages	Key Required
Cartesia Sonic-3	9 voices (Katie, Kiefer, Tessa, Kyle, Leo, Jace, Gavin, Maya, Default)	42 languages (multilingual auto-detect)	Deepgram key only (managed)
Deepgram Aura-2	35+ voices (EN, DE, FR, ES, IT, NL, JA)	Language-specific per voice	Deepgram key only (managed)
ElevenLabs	Your ElevenLabs voices (auto-loaded)	Multilingual	ElevenLabs API key (BYO)
OpenAI TTS	10 voices (Alloy, Ash, Ballad, Coral, Echo, Fable, Nova, Onyx, Sage, Shimmer)	English	OpenAI API key (BYO)

STT Model

Model	Languages	Use Case
Nova-3	Multilingual	Standard, best overall accuracy
Nova-3 General	Multilingual	General-purpose variant
Nova-3 Medical	Multilingual	Medical terminology optimized
Flux	English only	Ultra-low-latency turn detection

Other Settings

Language — Auto-Detect (Multilingual) or a specific language: English, German, French, Spanish, Italian, Dutch, Japanese, Portuguese, Hindi, Russian
Greeting Message — Text the agent speaks when the conversation starts (optional)
System Instruction — Define the AI’s personality and behavior. A base instruction is always included that prevents markdown formatting and follow-up questions in speech output.

Advanced Settings

Expand the Advanced section for fine-tuning:

Temperature (0.00 – 2.00) — Controls response creativity. Default: 0.7. Lower = more focused, higher = more creative.
STT Model — Switch between Nova-3 variants and Flux.

When Flux STT is selected, additional controls appear:

Eager EOT Threshold (0.0 – 1.0) — How aggressively the system detects end-of-turn. Higher = faster response but may cut you off mid-sentence.
EOT Timeout (0 – 5000ms) — Maximum silence before the agent responds.

For ElevenLabs BYO: A custom Voice ID field lets you enter any ElevenLabs voice ID directly.
For OpenAI TTS BYO: Select from 10 OpenAI voices (Alloy, Ash, Ballad, Coral, Echo, Fable, Nova, Onyx, Sage, Shimmer).
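The advanced values have fixed ranges, which can be summarized as a small validation routine. This is a minimal sketch with hypothetical names; the ranges and the temperature default are taken from the list above, while treating the Flux-only values as optional (no default is documented) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AdvancedSettings:
    temperature: float = 0.7                     # documented default
    eager_eot_threshold: Optional[float] = None  # Flux only; default not documented
    eot_timeout_ms: Optional[int] = None         # Flux only; default not documented

    def validate(self) -> None:
        """Raise ValueError if a value is outside its documented range."""
        if not 0.0 <= self.temperature <= 2.0:
            raise ValueError("temperature must be between 0.00 and 2.00")
        if self.eager_eot_threshold is not None and not 0.0 <= self.eager_eot_threshold <= 1.0:
            raise ValueError("eager EOT threshold must be between 0.0 and 1.0")
        if self.eot_timeout_ms is not None and not 0 <= self.eot_timeout_ms <= 5000:
            raise ValueError("EOT timeout must be between 0 and 5000 ms")
```

For example, `AdvancedSettings(eager_eot_threshold=0.5, eot_timeout_ms=1500).validate()` passes, while a temperature of 2.5 is rejected.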

During a Conversation

Status indicator — Shows Ready, Connecting, Live (with elapsed time), or Error
Audio level meter — Displays microphone input with Listening/Silent state
Thinking indicator — A green badge appears while the LLM processes your input
Conversation transcript — Real-time display of all dialogue. Your messages appear on the right (green), the agent’s on the left (blue).
Barge-in — Interrupt the AI at any time by speaking. The agent stops immediately and listens to you.
Resize handle — Drag the handle below the transcript to resize the chat area (120px to 85% of screen)
Dual Start/Stop buttons — One at the top, one sticky at the bottom for easy access while scrolling

Latency Monitoring

A compact latency bar appears after the first turn, showing three key metrics:

LLM — Time from your speech to the first LLM token
TTFB — Total Time to First Byte (end-to-end)
TURN — Full turn duration including audio playback

Values are color-coded: green (< 2s), yellow (2–5s), red (> 5s).
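The color rule can be written as a simple threshold function. A minimal sketch, with a hypothetical function name; treating exactly 2 s and exactly 5 s as yellow is an assumption based on the "2–5s" range.

```python
def latency_color(seconds: float) -> str:
    """Return the display color for a latency value in seconds."""
    if seconds < 2.0:
        return "green"
    # Assumption: both boundaries (2 s and 5 s) belong to the yellow band.
    if seconds <= 5.0:
        return "yellow"
    return "red"
```

A turn with a 1.1 s TTFB, for example, is shown in green.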

Tap the latency bar to expand a detailed per-turn table with columns: #, Speech duration, LLM time, TTS time, TTFB, Audio length, Total. Average LLM and TTFB are displayed in the header.

Echo Cancellation (AEC)

Talk to me includes proprietary Acoustic Echo Cancellation via a native Android Kotlin bridge. The AI’s speaker output is captured and subtracted from your microphone input in real time, preventing self-triggering feedback loops. This allows full hands-free operation on speaker without headphones. Works on all managed presets and most BYO configurations.

Key Pool — Deepgram Voice Agent

The Deepgram Voice Agent Key Pool is a dedicated, collapsible section below the chat area. It manages:

Deepgram API Keys (required) — for STT and managed LLM/TTS routing
LLM Keys (optional, Full BYO only) — OpenAI, Anthropic, Gemini, xAI
TTS Keys (optional, Full BYO only) — ElevenLabs, OpenAI TTS

Each key card shows a 4-row layout: label, provider badge + masked key, trust score with statistics, and Test/Pause action buttons. You can test individual keys or all keys at once.

Session Limits

Sessions are limited to a maximum of 15 minutes (API constraint). The elapsed time is shown in the Stop button. The session ends automatically when the limit is reached.

Tips

Start with a managed preset (Top Tier or Ultra-Fast) — they require only a Deepgram key and offer the best experience.
GPT-5.4 Nano + Cartesia Sonic-3 delivers ~1.1s response times — the fastest option.
Flux presets are English-only but extremely fast due to eager end-of-turn detection.
Full BYO presets use your own LLM/TTS keys for maximum control but may have reduced barge-in performance with some TTS providers.
All settings take effect on the next session start, not during a live session.

12b. Gemini 3.1 Flash Live

Gemini 3.1 Flash Live provides real-time voice conversations powered by Google’s latest audio AI model. It delivers the speed and natural rhythm needed for voice-first interaction, with sub-second latency, 30 expressive voices, and native multimodal understanding.

Requirements

You need a Google Gemini API key (paid tier recommended) added to the LLM Key Pool in Settings. The key is automatically available for AI Voice Chat. The model used is gemini-3.1-flash-live-preview.

Starting a Conversation

Navigate to the AI Chat tab, then select the Gemini sub-tab. Tap Start Conversation. The app connects to Gemini via WebSocket, opens your microphone, and begins listening. Speak naturally — Gemini responds in real-time audio. Tap End to stop.

Voices (30 Options)

Choose from 30 natural AI voices, each with a distinct personality:

Voice	Character	Best For
Sulafat	Warm	Storytelling, bedtime stories, calm conversations
Gacrux	Mature	Authoritative narration, mentoring, deep discussions
Algenib	Gravelly	Cinematic narration, dramatic reading, character voice
Kore	Firm	Professional briefings, news reading, factual Q&A
Puck	Upbeat	Energetic conversations, motivation, brainstorming
Zephyr	Bright	Optimistic chats, friendly assistance, greetings
Charon	Informative	Tutorials, documentary-style explanations
Fenrir	Excitable	Enthusiastic reactions, game commentary, hype
Leda	Youthful	Casual chat, Gen-Z conversations, trendy topics
Aoede	Breezy	Relaxed conversations, travel talk, lifestyle
Achernar	Soft	Meditation guidance, ASMR-style, gentle encouragement
Algieba	Smooth	Podcast hosting, audiobooks, long-form reading
Despina	Smooth	Elegant narration, luxury brand voice
Achird	Friendly	Customer support, everyday assistance, welcoming tone
Vindemiatrix	Gentle	Supportive conversations, therapy-like tone, empathy
Sadaltager	Knowledgeable	Technical explanations, expert Q&A, encyclopedic
Rasalgethi	Informative	Science documentaries, educational content
Schedar	Even	Balanced discussions, neutral reporting, debates
Alnilam	Firm	Commanding presence, leadership, formal settings
Pulcherrima	Forward	Assertive communication, pitches, presentations
Zubenelgenubi	Casual	Laid-back chat, friends catching up, humor
Sadachbia	Lively	Animated storytelling, children’s content, playful
Laomedeia	Upbeat	Morning shows, cheerful updates, positive vibes
Callirrhoe	Easy-going	Casual advice, lifestyle coaching, approachable
Autonoe	Bright	Creative sessions, idea generation, art discussions
Enceladus	Breathy	Intimate narration, poetry reading, atmospheric
Iapetus	Clear	Precise instructions, step-by-step guides, clarity
Erinome	Clear	Clean communication, corporate training, diction
Umbriel	Easy-going	Relaxed Q&A, weekend vibes, mellow conversations

Tip: Preview all voices in the Google AI Studio Voice Library.

Language

Select from 24 supported languages or leave on Auto-detect. Gemini responds in the language you speak — or in the language you select. Supported: English, German, French, Spanish, Italian, Portuguese, Dutch, Polish, Romanian, Russian, Ukrainian, Turkish, Arabic, Hindi, Bengali, Tamil, Telugu, Marathi, Japanese, Korean, Thai, Vietnamese, Indonesian.

Persona Presets

Persona presets define how Gemini behaves — its personality, tone, and communication style. Choose from six presets or create your own:

Preset	Behavior
Friendly Assistant	Warm, conversational, approachable — great for everyday use
Professional	Clear, concise, authoritative — for business and work
Enthusiastic	Energetic, positive, encouraging — for brainstorming and motivation
Calm & Soothing	Slow, gentle, patient — for relaxation and guided sessions
Teacher	Patient, step-by-step, uses analogies — for learning and explanations
Creative	Imaginative, expressive, vivid language — for storytelling and art
Custom	Write your own system instruction from scratch

System Instruction

The System Instruction is a text briefing you give to Gemini before the conversation starts. Think of it as directing an actor: tell the AI who it is, how to behave, and what to focus on.

Examples:

“You are a patient Italian language tutor. Speak slowly. Correct my grammar gently.”
“You are a senior software architect. Answer concisely and technically.”
“You are a creative storyteller. Speak with flair. Use vivid language.”

When using a Persona Preset, your custom text is appended to the preset instruction. In Custom mode, your text is the entire instruction. Write in English for best results. Settings are saved automatically.
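The rule for combining a preset with your own text can be sketched as follows. The preset texts, the function name, and the blank line used as a separator are all assumptions made for illustration; the app's real preset instructions are not reproduced in this manual.

```python
from typing import Optional

# Hypothetical preset texts, for illustration only.
PRESETS = {
    "Teacher": "You are a patient teacher. Explain step by step and use analogies.",
    "Professional": "You are clear, concise, and authoritative.",
}


def build_system_instruction(preset: Optional[str], custom_text: str) -> str:
    """Return the instruction sent to the model before the conversation starts."""
    custom_text = custom_text.strip()
    if preset is None or preset == "Custom":
        # Custom mode: your text is the entire instruction.
        return custom_text
    base = PRESETS[preset]
    # Preset mode: your text is appended to the preset instruction.
    return f"{base}\n\n{custom_text}" if custom_text else base


print(build_system_instruction("Teacher", "Focus on Italian grammar."))
```

With the Teacher preset, the custom sentence ends up after the preset text; in Custom mode only the custom sentence is sent.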

Thinking Depth

Control how deeply Gemini reasons before responding:

Level	Behavior
Minimal	Fastest responses, minimal internal reasoning (default)
Low	Brief consideration, good balance
Medium	Thoughtful responses, longer pause before answering
High	Deep reasoning, best for complex questions

Temperature & Top-P

Temperature (0.0 – 2.0) controls how creative or predictable the AI's responses are:

Range	Behavior	Best For
0.0 – 0.5	Focused, deterministic	Facts, technical answers, precise instructions
0.7 – 1.0	Balanced, natural (default: 1.0)	Most conversations, everyday use
1.2 – 2.0	Creative, surprising	Brainstorming, storytelling, creative writing

Top-P (0.0 – 1.0) limits the pool of words the AI considers. At 0.95 (default), the model samples only from the smallest set of most likely words whose combined probability reaches 95%. Lower values make output more conservative.
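Top-P is commonly implemented as nucleus sampling: candidate words are ranked by probability, and only the most likely ones are kept until their probabilities add up to the Top-P value. The sketch below shows that selection step with made-up probabilities. It illustrates the general technique and is not a description of Gemini's internal implementation.

```python
def nucleus(probabilities: dict[str, float], top_p: float) -> list[str]:
    """Return the smallest set of most likely words whose probabilities sum to at least top_p."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    kept: list[str] = []
    cumulative = 0.0
    for word, probability in ranked:
        kept.append(word)
        cumulative += probability
        if cumulative >= top_p:
            break
    return kept


# Made-up next-word probabilities for illustration.
probs = {"the": 0.50, "a": 0.30, "this": 0.17, "zebra": 0.03}
print(nucleus(probs, 0.95))  # the unlikely word "zebra" is excluded
print(nucleus(probs, 0.50))  # a low Top-P keeps only the most likely word
```

Lowering Top-P shrinks the pool, which is why responses become more conservative.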

Voice Activity Detection (VAD)

VAD settings control how Gemini detects when you start and stop speaking:

Speech Start Sensitivity — How easily the system detects speech onset.
Speech End Sensitivity — How quickly the system decides you’ve stopped talking.
Silence Duration — How many milliseconds of silence before your turn is considered complete (100–2000ms).
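To make the Silence Duration setting concrete, the following sketch decides when a turn is complete from a sequence of audio frames that are already labeled as speech or silence. The frame length and the function name are assumptions; the actual detection runs on the Gemini side and is more sophisticated.

```python
from typing import Optional


def turn_end_ms(frames: list[bool], frame_ms: int, silence_ms: int) -> Optional[int]:
    """Return the time in ms at which the turn is considered complete, or None.

    frames[i] is True when frame i contains speech. The turn ends once speech
    has been heard and is followed by at least silence_ms of unbroken silence.
    """
    heard_speech = False
    silent_run = 0
    for index, is_speech in enumerate(frames):
        if is_speech:
            heard_speech = True
            silent_run = 0  # any speech resets the silence counter
        elif heard_speech:
            silent_run += frame_ms
            if silent_run >= silence_ms:
                return (index + 1) * frame_ms
    return None


# 100 ms frames: a short pause after the first words, then a longer silence.
frames = [False, True, True, False, True, False, False, False]
print(turn_end_ms(frames, frame_ms=100, silence_ms=300))
print(turn_end_ms(frames, frame_ms=100, silence_ms=500))
```

With 300 ms the short pause is ignored and the turn ends after the longer silence; with 500 ms the same audio does not yet count as a finished turn, which is why higher values feel more patient but slower.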

Echo Cancellation (AEC)

Like the Deepgram Voice Agent, Gemini 3.1 Flash Live benefits from Talk to me’s proprietary acoustic echo cancellation via the native Android Kotlin bridge. Full hands-free speaker mode works without headphones.

Tips for Best Results

Speak naturally — Gemini supports natural barge-in (interrupt anytime)
On Android, the built-in AEC eliminates echo — no headphones needed
Session length is limited to 15 minutes per connection (API limit)
All settings take effect on the next session start (not during a live session)
The audio level meter shows a colored gradient (green, yellow, orange, red) indicating your microphone input level
Transcription of your speech and Gemini’s responses can be toggled on/off independently

13. Mini-Player Windows

The Mini-Player is a compact Always-on-Top window that provides essential dictation controls without occupying your full screen.

Entering Mini-Player Mode

Click the Collapse button (↗ icon) in the header. The app window shrinks to a compact overlay positioned at the bottom center of your screen.

Mini-Player Layout

The Mini-Player displays a 3×3 grid of essential controls:

Row 1: Speech Input selector, Status/Start button, Text Output selector
Row 2: Voice Translate toggle, Inline Pill (spectrum analyzer), Save Recordings
Row 3: Pipeline timing TDFs, Result preview

DPI-Aware Sizing

The Mini-Player automatically adjusts its size based on your display's DPI scaling, ensuring consistent visual dimensions across monitors with different scaling factors (for example 100%, 125%, 150%).
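As a rough illustration of DPI-aware sizing: Windows treats 96 DPI as 100% scaling, so a window's pixel size can be derived from a fixed logical size and the monitor's DPI. The base size of 400×160 and the function name below are hypothetical and are not the Mini-Player's actual dimensions.

```python
BASE_DPI = 96  # Windows treats 96 DPI as 100% scaling


def scaled_size(width: int, height: int, dpi: int) -> tuple[int, int]:
    """Convert a logical window size to physical pixels for the given DPI."""
    factor = dpi / BASE_DPI
    return round(width * factor), round(height * factor)


# Hypothetical logical size of 400x160 at common scaling factors.
for dpi in (96, 120, 144):
    print(f"{dpi / BASE_DPI:.0%}: {scaled_size(400, 160, dpi)}")
```

The window occupies more pixels on a 150% display, but appears the same physical size to the eye.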

Exiting Mini-Player Mode

Click the Expand button to return to the full-size window at its previous position and size.

14. Global Hotkeys Windows

Talk to me registers system-wide hotkeys so you can control dictation without switching to the app window.

Primary Hotkeys

Hotkey	Action
Ctrl+Win	Start / Stop Recording (global, works from any app)
Ctrl+Win (while processing)	Cancel current pipeline
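The behavior of the single Ctrl+Win hotkey depends on what the app is doing at that moment. A minimal sketch of this as a state machine follows; the state names and function names are assumptions chosen for illustration.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    RECORDING = "recording"
    PROCESSING = "processing"


def on_hotkey(state: State) -> State:
    """Return the next state after Ctrl+Win is pressed."""
    if state is State.IDLE:
        return State.RECORDING   # start recording
    if state is State.RECORDING:
        return State.PROCESSING  # stop recording, the pipeline starts
    return State.IDLE            # pressed while processing: cancel the pipeline


def on_pipeline_done(state: State) -> State:
    """Return to idle once the pipeline has finished on its own."""
    return State.IDLE if state is State.PROCESSING else state


state = State.IDLE
for _ in range(3):
    state = on_hotkey(state)
    print(state.value)
```

Three presses in a row start a recording, stop it, and then cancel the running pipeline.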

TTS Hotkey

When text is selected in any application, the TTS hotkey reads it aloud using your configured TTS provider.

Low-Level Hook

The global hotkey uses a Windows low-level keyboard hook, which means it works even when the app is minimized or another application has focus. The hook operates in "zero-swallow mode" — it intercepts the key combination without blocking other keyboard input.

15. Auto-Read Windows

Auto-Read is a Windows-exclusive feature that extracts text from the currently focused application and reads it aloud via TTS.

How It Works

Enable Auto-Read by clicking the Auto-Read button.
Select text in any application (or use Ctrl+C to copy).
Talk to me detects the clipboard content and automatically reads it aloud using your TTS configuration.

Use Cases

Read emails, articles, or documents without staring at the screen.
Review your own writing by hearing it spoken back.
Accessibility support for vision-impaired users.

16. Notification Listener Windows

The Notification Listener captures Windows toast notifications and reads them aloud via TTS.

Requirements

Windows Desktop version
Notification access permission granted in Windows Settings

How It Works

Enable Notification Listener by clicking the toggle.
Grant notification access when prompted by Windows.
When a Windows toast notification arrives (email, chat message, calendar reminder), Talk to me extracts the notification title and body, and reads it aloud using your TTS configuration.

Configuration

Enable/disable in Settings → Hands-Free
TTS voice and provider follow your global TTS settings

17. MP3 Recording & Save Windows

Record TTS Readings

When enabled, every TTS synthesis is automatically saved as an MP3 file with sequential numbering (e.g., recording_001.mp3, recording_002.mp3).
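Sequential numbering of this kind can be sketched as follows. The function name is hypothetical, and continuing from the highest existing number (rather than filling gaps) is an assumption about the app's behavior.

```python
import re

RECORDING_NAME = re.compile(r"^recording_(\d+)\.mp3$")


def next_recording_name(existing: list[str]) -> str:
    """Return the next free file name, based on the highest number in use."""
    numbers = [
        int(match.group(1))
        for name in existing
        if (match := RECORDING_NAME.match(name))
    ]
    return f"recording_{max(numbers, default=0) + 1:03d}.mp3"


print(next_recording_name([]))
print(next_recording_name(["recording_001.mp3", "recording_002.mp3", "notes.txt"]))
```

Files that do not follow the naming pattern, such as notes.txt, are ignored when the next number is chosen.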

Save Recordings

Click Save Recordings to open the folder containing all recorded MP3 files. You can configure the recording directory in Settings.

A Note on Android Permissions Android

The Android version of Talk to me requires several system permissions (Microphone, Overlay, Accessibility Service, Notification Listener) — each with its own confirmation dialog. We understand that this can feel cumbersome.

We would have preferred a simpler setup experience. However, Google Play Store policies and Android security guidelines require that each sensitive permission is requested individually, with a clear disclosure explaining what the permission is used for and what it is not used for. These multi-step confirmation flows are not our design choice — they are mandated by platform compliance requirements.

Each permission is requested only when you actually need the feature, not all at once during installation. You can revoke any permission at any time through Android Settings. The app will continue to work — the corresponding feature will simply be disabled.

Here is a summary of all Android permissions and why they are needed:

Permission	Feature	Required?
Microphone	Speech-to-Text dictation, AI Voice Chat	Yes — core feature
Draw over other apps	Floating Bubble (hands-free overlay)	Only if you use the overlay
Accessibility Service	Auto-Paste text into chat app input fields	Only if you use Auto-Paste
Notification Listener	Auto-Read incoming messages aloud	Only if you use Auto-Read
Internet	Communication with AI providers	Yes — required for all features

Thank you for your understanding. We take your privacy seriously — none of these permissions are used to collect, store, or transmit personal data. See Privacy and Security for full details.

18. Floating Bubble (Overlay) Android

The Floating Bubble is a small circular icon that floats on top of all other apps, providing hands-free dictation access without switching apps.

Activating the Overlay

Tap the Overlay button in the main app.
If Android's "Draw over other apps" permission is not yet granted, you will be directed to enable it.
A small Talk to me bubble appears on screen.

Using the Bubble

Single Tap: Start or stop recording. Red pulsing border during recording, blue pulsing border during TTS readout.
Triple Tap: Test readback — reads a predefined text to confirm TTS works.
Long Press: Clears the unread message queue.
Drag: Move the bubble anywhere on screen.

During Recording via Bubble

Tap the bubble to start recording.
After transcription, a "✓ Inserted!" toast confirms the text was pasted or placed in clipboard.

Bubble Translation and Auto-Insert

The Bubble uses the same translation logic as the main window: if your input and output languages differ, your dictation is automatically translated before being inserted. Voice Translate (text-to-speech readout) also works in the Bubble.

Using Android's Accessibility Service, the Bubble inserts the (possibly translated) text directly into the focused input field. In all mainstream apps we tested — including WhatsApp, Gmail, Discord, Microsoft Teams, Viber, Chrome, ChatGPT, Facebook, Instagram, Pinterest, and Skool — auto-insert works reliably.

If auto-insert fails in a less common app, the translated text is still placed on the clipboard: long-press the input field and choose "Paste" to insert it.

Stopping the Overlay

Tap the Overlay button again or tap Stop on the notification.

19. Auto-Paste Android

Auto-Paste uses Android's Accessibility Service to automatically insert dictated text into the currently focused text field.

Enabling Auto-Paste

Tap the Auto-Paste button.
A disclosure dialog explains what the Accessibility Service does and does not do. Tap Enable Auto-Paste.
You are directed to Android's Accessibility Settings. Find Talk to me and enable it.
The button now shows ✓ with a cyan border.

Accessibility Shortcut Button

When enabling the Accessibility Service, Android will ask you to choose an activation shortcut. This determines how you can quickly toggle the service on/off:

Accessibility button (recommended): A small button appears in the navigation bar. Tap it to toggle the service.
Volume Up + Volume Down (hold 3 seconds): Press and hold both volume keys simultaneously for 3 seconds to toggle.

We recommend the Accessibility button option for the easiest experience. This is a standard Android system feature — the choice does not affect how Auto-Paste works.

Important Notes

Requires Android Accessibility permission (a sensitive permission).
May need to be re-granted after app updates.
Used exclusively for text insertion — no other accessibility data is accessed.

App Compatibility

Auto-Paste works reliably in most Android apps. The following apps were tested with v0.5.159:

App	Auto-Paste	Translation
WhatsApp	✅	✅
Gmail (recipient + body)	✅	✅
Discord	✅	✅
Microsoft Teams	✅	✅
Viber	✅	✅
Chrome	✅	✅
ChatGPT	✅	✅
Facebook	✅	✅
Instagram	✅	✅
Pinterest	✅	✅
Skool (WebView in Chrome)	✅	✅

"App Access Denied" — Restricted Settings (Android 13+)

On some devices, when enabling Auto-Paste or Notification Access, you may see "App access denied" or "For your security, this setting is currently unavailable." This is not a bug — it is an Android 13+ security feature called Restricted Settings.

Affected manufacturers: Lenovo (ZUI), Samsung (One UI), Xiaomi/Redmi (MIUI/HyperOS), OPPO/Realme (ColorOS), Huawei/Honor (EMUI/HarmonyOS), OnePlus (OxygenOS), Stock Android/Pixel.

How to fix:

Open Android Settings → Apps → See all apps → find Talk to me.
Tap Talk to me to open the App Info page (not the Notifications sub-page).
Tap the three-dot menu (⋮) in the top-right corner.
Select Allow restricted settings.
Confirm with your PIN/fingerprint.
Go back to Settings → Accessibility and enable Talk to me.

Tip: If the three-dot menu is not visible, first try to enable the permission (triggering the error), then go to the App Info page — the menu should now appear.

Xiaomi/MIUI/HyperOS: Go to Settings → Apps → Manage apps → Talk to me and scroll to the bottom.

Lenovo (ZUI): When tapping Apps in Settings, you may land on the Notifications sub-page instead of App Info. Navigate back and look for the full App Info page with storage, permissions, and battery sections.

20. Auto-Read Messages Android

Auto-Read automatically reads incoming chat messages aloud using TTS — ideal for driving, cooking, or exercising.

How It Works

Enable Auto-Read (Headphones icon).
Ensure Notification Access is granted.
The Overlay must be active.
When a message arrives from an allowed app, Talk to me announces the sender and reads the message aloud.

Pre-Selected Chat Apps

WhatsApp, WhatsApp Business, Telegram, Signal, Discord, Slack, Microsoft Teams, Viber, Messenger (Meta), Instagram, Google Messages, Samsung Messages.

You can add or remove apps in Auto-Read Apps Configuration.

21. Notification Access Android

Notification Access allows Talk to me to read incoming notifications, which is required for Auto-Read Messages.

Granting Access

Tap the Notif Access button.
Go to Android's Notification Listener Settings.
Find Talk to me and enable it.
The button shows ✓ with a cyan border.

Important Notes

System-level permission — processes only notifications from explicitly allowed apps.
No notification data is stored, transmitted, or logged.

22. Auto-Read Apps Configuration Android

Control which apps are allowed to have their notifications read aloud.

Known Chat Apps

Pre-selected messaging apps with individual toggles (WhatsApp, Telegram, Signal, Discord, Slack, Teams, Viber, Messenger, Instagram, Google Messages, Samsung Messages).

Search and Add Custom Apps

Tap the search field and type an app name.
Matching installed apps appear, sorted by relevance.
Check the box to add an app.

How Filtering Works

Only notifications from allowed apps are read aloud.
Changes take effect immediately — no restart required.
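The filtering rule can be pictured as a simple allow-list check. The sketch below is illustrative only: the function name, the announcement format, and the example package identifiers are assumptions, not the app's internal code.

```python
from typing import Optional


def announcement(package: str, sender: str, message: str, allowed: set[str]) -> Optional[str]:
    """Return the text to read aloud, or None if the app is not on the allow-list."""
    if package not in allowed:
        return None
    return f"{sender}: {message}"


# Illustrative package identifiers for two allowed chat apps.
allowed_apps = {"com.whatsapp", "org.telegram.messenger"}

print(announcement("com.whatsapp", "Anna", "Running late", allowed_apps))
print(announcement("com.example.game", "Game", "Daily reward", allowed_apps))

# Removing an app changes the result of the very next check.
allowed_apps.discard("com.whatsapp")
print(announcement("com.whatsapp", "Anna", "Still there?", allowed_apps))
```

Because the list is consulted for every incoming notification, a change applies to the next notification without a restart.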

23. Settings

UI Language

English, Deutsch, Français, Español — independent of your system language.

Quality Preset

Preset	STT Provider	LLM Provider	Model	Polish
Top Performer	Scribe v2	OpenAI	GPT-5.4	Strong
Standard	Scribe v2	OpenAI	GPT-4.1 mini	Strong
Budget	Whisper	Groq	Default	Light
Free	Deepgram	Groq	Default	Off
Custom	Manual	Manual	Manual	Manual

Speech-to-Text

Provider: OpenAI Whisper, Deepgram Nova-2/3, ElevenLabs Scribe v2, Groq Whisper
Custom Keyterms (Scribe only): Proper nouns, brands, technical terms
Language: Auto-Detect or specific

Text-to-Speech

Provider: ElevenLabs, OpenAI TTS, Deepgram Aura 2
Model (ElevenLabs): Eleven v3, Multilingual v2, Flash v2.5, Turbo v2.5

LLM Provider (Polish)

Provider: OpenAI, Groq, Anthropic, Google Gemini, xAI Grok
Model: Provider default or specific
Polish Strength: Light or Strong

Translation Provider

Separate provider for AI-Translation (can differ from Polish provider).

AI-Polish / AI-Translate

Toggle each independently. When AI-Translate is enabled:

Translate To: 20 target languages
Voice Translate: Auto-read translations via TTS

Android Hands-Free

Quick toggles for Overlay, Auto-Read Messages, Auto-Paste, Notification Access.

Save and Test

Save all current settings — Persists changes to device storage
Test current configuration — Tests all configured providers with response times

24. Word Corrections

Word Corrections teach Talk to me the correct spelling of names, brands, and terms that speech recognition gets wrong.

Adding Corrections

Single Add

Enter Wrong spelling and Correct spelling, then tap/click Add.

Bulk Import

Enter the correct spelling, then list wrong variants (one per line). Use Generate with AI to auto-create likely misspellings.

Multi-Import

Enter pairs as wrong;correct (one per line). Supports ;, ->, comma, or tab separators.
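A parser for this input format might look like the sketch below. The function name and the order in which separators are tried (-> first, then semicolon, tab, comma) are assumptions; the manual only lists which separators are accepted.

```python
SEPARATORS = ("->", ";", "\t", ",")  # assumed precedence when a line contains several


def parse_pairs(text: str) -> list[tuple[str, str]]:
    """Parse one wrong/correct pair per line; lines without a valid pair are skipped."""
    pairs = []
    for line in text.splitlines():
        for separator in SEPARATORS:
            if separator in line:
                wrong, correct = (part.strip() for part in line.split(separator, 1))
                if wrong and correct:
                    pairs.append((wrong, correct))
                break  # only the first matching separator is used
    return pairs


sample = "Antropic;Anthropic\nGrog -> Groq\nEleven Labs,ElevenLabs\nDeep gram\tDeepgram"
for wrong, correct in parse_pairs(sample):
    print(f"{wrong!r} is replaced by {correct!r}")
```

Each of the four sample lines uses a different separator and yields one correction pair.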

How Corrections Work

During post-processing (Pipeline stage 3), wrong spellings are automatically replaced before AI-Polish runs.
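The replacement step can be sketched as a whole-word search and replace over the transcript. Matching whole words only and ignoring upper/lower case are assumptions made for this example; the manual does not specify the exact matching rules.

```python
import re


def apply_corrections(text: str, corrections: dict[str, str]) -> str:
    """Replace each wrong spelling with its correct form before further processing."""
    for wrong, correct in corrections.items():
        # Assumption: whole-word, case-insensitive matching.
        pattern = re.compile(rf"\b{re.escape(wrong)}\b", re.IGNORECASE)
        # A function is used as the replacement so the correct spelling is inserted literally.
        text = pattern.sub(lambda _match, value=correct: value, text)
    return text


corrections = {"antropic": "Anthropic", "grog": "Groq"}
print(apply_corrections("I use Antropic and grog daily.", corrections))
```

Running the corrections first means the AI-Polish step already receives the right names and does not have to guess them.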

25. Backup and Restore

Export Settings

Open Backup & Restore in Settings.
Tap/click Export Settings.
Enter and confirm an Encryption Password (min. 6 characters).
Windows: The save dialog suggests talktome-settings.ttm — you choose the folder.
Android: The backup is written to your Downloads area as TalkToMe-backup.ttm. If that name already exists, the system may add (1), (2), etc. — all are valid encrypted backups.

Import Settings

Tap/click Import Settings.
Automatic (Android): The app looks for the newest matching file named TalkToMe-backup with a .ttm extension (including TalkToMe-backup (1).ttm, etc.) in app storage and in Downloads.
If the system file picker opens: On many phones (e.g. Samsung), the first screen is Recently used and may default to Images — your .ttm files are hidden until you switch the top filter to Documents or This week, or open the Download folder directly.
New device: Copy the .ttm from your old device (USB, cloud, email), then use Import and pick that file.
Enter the encryption password.
All settings are restored and the app restarts.
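The automatic file lookup on Android can be pictured as a name filter followed by a pick of the most recent file. In this sketch, the function name is hypothetical and "newest" is interpreted as the latest modification time, which is an assumption.

```python
import re
from typing import Optional

# Matches TalkToMe-backup.ttm and numbered copies such as "TalkToMe-backup (1).ttm".
BACKUP_NAME = re.compile(r"^TalkToMe-backup( \(\d+\))?\.ttm$")


def newest_backup(files: list[tuple[str, float]]) -> Optional[str]:
    """Pick the most recently modified backup from (name, modification_time) pairs."""
    candidates = [(mtime, name) for name, mtime in files if BACKUP_NAME.match(name)]
    return max(candidates)[1] if candidates else None


downloads = [
    ("TalkToMe-backup.ttm", 100.0),
    ("TalkToMe-backup (1).ttm", 250.0),
    ("holiday.jpg", 900.0),
]
print(newest_backup(downloads))
```

Unrelated files are ignored even if they are newer, and a numbered copy is treated like any other backup.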

Technical Details

Encryption: AES-256-GCM with PBKDF2-HMAC-SHA256 (100,000 iterations)
Included: All settings, API keys, word corrections, auto-read apps, quality preset, UI language
NOT included: License activation (tied to Machine ID)
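For readers who want to see what the key derivation step involves, the sketch below derives a 256-bit key from a password with PBKDF2-HMAC-SHA256 and 100,000 iterations, using only the Python standard library. The 16-byte salt and the function name are assumptions; the layout of the .ttm file is not documented here, and the AES-256-GCM encryption itself would need an additional cryptography library and is not shown.

```python
import hashlib
import os

ITERATIONS = 100_000  # iteration count stated in the Technical Details
KEY_BYTES = 32        # 32 bytes = 256-bit key for AES-256


def derive_key(password: str, salt: bytes) -> bytes:
    """Derive an encryption key from the backup password."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS, dklen=KEY_BYTES
    )


salt = os.urandom(16)  # assumption: a random salt stored alongside the backup
key = derive_key("my backup password", salt)
print(len(key))
```

The same password and salt always produce the same key, which is why the password you chose during export is needed again for import.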

26. Usage Dashboard

Metric	Description
STT Calls	Speech-to-text transcriptions performed
LLM Polish	AI-Polish or AI-Translate operations
TTS Synth	Text-to-speech synthesis operations

Counters are cumulative since the last settings reset.

27. Troubleshooting

General

Problem	Solution
"No API key configured"	Add a key in Key Pool for the feature you need
Recording doesn't start	Check microphone permission in system settings
Voice Translate produces no audio	Ensure a TTS API key is configured and working
Export fails	Check write access to Downloads folder
Can't see backup in Import file picker	Switch from Images to Documents / This week, or open the Download folder — see §25 Import

Windows-Specific

Problem	Solution
Ctrl+Win hotkey doesn't work	Ensure the app is running (check system tray)
Text not pasted after dictation	Ensure the target window supports Ctrl+V
Notification Listener unavailable	Available on Windows Desktop — ensure notification access is granted in Windows Settings
Mini-Player looks too large/small	DPI-aware sizing adjusts automatically; restart the app if display settings changed

Android-Specific

Problem	Solution
Auto-Read doesn't work	Ensure Overlay is active, Auto-Read enabled, and Notification Access granted
Auto-Paste doesn't work	Re-enable Accessibility Service in Android Settings
Bubble doesn't appear	Grant "Draw over other apps" permission
"App access denied" when granting permissions	Restricted Settings (Android 13+) — see §19 "Restricted Settings" for the step-by-step solution
Screen doesn't rotate (Tablet)	Check if PC Mode is active (pull down Quick Settings). Auto-Rotate is ignored in PC Mode — switch back to Android Mode. Primarily affects Lenovo tablets (ZUI).

28. Privacy and Security

Data Handling

No data collection: Talk to me does not collect, store, or transmit any user data to mrocon GmbH servers.
Direct API communication: Audio and text go directly from your device to your chosen AI provider.
Local storage only: All settings and API keys are stored exclusively on your device.
No analytics: No tracking, analytics, or telemetry of any kind.

Permissions

Windows

Permission	Purpose
Microphone	Record audio for dictation
Notification Access	Read notifications
Internet	Communicate with AI providers

Android

Permission	Purpose
Microphone	Record audio for dictation
Overlay (Draw over apps)	Display the floating bubble
Notification Listener	Read notifications for Auto-Read
Accessibility Service	Auto-Paste text into fields
Internet	Communicate with AI providers
Query Installed Packages	Show app names in Auto-Read settings

Encryption

Windows: API keys encrypted with DPAPI (Windows Data Protection API)
Android: API keys in app-private internal storage
Backup files: AES-256-GCM encryption

Appendix A — Supported Languages

Speech Input Languages

Auto-Detect, German, English, French, Spanish, Italian, Portuguese, Dutch, Japanese, Korean, Chinese, Russian, Arabic, Hindi, Polish, Turkish, Swedish, Ukrainian

Translation Target Languages

German, English, French, Spanish, Italian, Portuguese, Dutch, Japanese, Chinese, Korean, Russian, Arabic, Hindi, Polish, Turkish, Swedish, Ukrainian, Danish, Finnish, Norwegian

TTS Languages

Auto, German, English, French, Italian, Spanish, Portuguese, Dutch, Polish, Swedish, Danish, Finnish, Norwegian, Turkish, Japanese, Korean, Chinese

UI Languages

English, Deutsch, Français, Español

Appendix B — Supported Providers

Speech-to-Text

Provider	Notes
OpenAI Whisper	Most widely used, reliable
Deepgram Nova-2 / Nova-3	Fast, good accuracy
ElevenLabs Scribe v2	Supports custom keyterms
Groq Whisper	Free tier available, fast

LLM (Polish / Translation)

Provider	Notes
OpenAI	GPT-4o-mini, GPT-5.4, etc.
Groq	Free tier, Llama models
Anthropic	Claude models
Google Gemini	Gemini models
xAI Grok	Free tier available

Text-to-Speech

Provider	Notes
ElevenLabs	Best quality, voice cloning, 4 models
OpenAI TTS	6 built-in voices, simple
Deepgram Aura 2	Fast synthesis

Appendix C — Quality Presets

Preset	STT	LLM	Model	Polish	Cost
Top Performer	Scribe v2	OpenAI	GPT-5.4	Strong	$$$
Standard	Scribe v2	OpenAI	GPT-4.1 mini	Strong	$$
Budget	Whisper	Groq	Default	Light	$
Free	Deepgram	Groq	Default	Off	Free
Custom	Manual	Manual	Manual	Manual	Varies

Appendix D — Keyboard Shortcuts Windows

Shortcut	Action
Ctrl+Win	Start / Stop Recording
Ctrl+Win (during processing)	Cancel Pipeline
TTS Hotkey	Read selected text aloud

For support, contact team@talktome.studio or visit talktome.studio.
