Talk to me — User Manual
This manual covers both the Windows Desktop and Android Hands-Free editions of Talk to me. Sections marked with Windows or Android apply only to that platform. All other sections apply to both.
1. Introduction
Talk to me is a professional dictation, translation, and voice interaction studio available for Windows Desktop and Android. It converts your speech into text, polishes it with AI, translates it into 20+ languages, and reads it back to you — all in real time.
The app follows a strict BYOK (Bring Your Own Key) and Zero-Knowledge / Zero-Trust architecture: your API keys are stored only on your device, and your data goes only to the AI providers you choose.
Key Features
- Real-time Dictation: Record your voice and get polished text in seconds.
- AI-Polish: Automatic grammar correction and filler word removal powered by your choice of AI provider.
- Live Translation: Translate dictated text into 20+ languages on the fly.
- Voice Translate (Speech-to-Speech): Your translated text is automatically read aloud in the target language.
- Text-to-Speech: Convert any text into natural-sounding speech with ElevenLabs, OpenAI TTS, or Deepgram.
- Live Language Immersion: Speak in your native language and instantly see and hear it in the language you want to master.
- Word Corrections: Teach the app your names, brands, and terms that speech recognition gets wrong.
- Encrypted Backup: Export all settings and API keys as a password-protected encrypted file.
- Multi-Provider Support: Choose from OpenAI, Groq, Anthropic, Google Gemini, xAI Grok, ElevenLabs, Deepgram, and more.
Platform Highlights
| Feature | Windows Desktop | Android Hands-Free |
|---|---|---|
| Mini-Player (compact mode) | ✓ | — |
| Global Hotkeys (Ctrl+Win) | ✓ | — |
| Auto-Read (Ctrl+C text extraction) | ✓ | — |
| Notification Listener (Full Edition) | ✓ | — |
| MP3 Recording & Save | ✓ | — |
| Floating Pill (Spectrum Analyzer) | ✓ | — |
| Floating Bubble (Overlay) | — | ✓ |
| Auto-Paste (Accessibility) | — | ✓ |
| Auto-Read Messages (from chat apps) | — | ✓ |
| App-level Notification Access | — | ✓ |
Security Principles
- Zero-Knowledge: Talk to me never stores, transmits, or has access to your API keys on any server. All keys are stored locally on your device.
- Zero-Trust: The app never phones home. No analytics, no tracking, no telemetry. Your dictation data flows directly from your device to your chosen AI provider and nowhere else.
- BYOK: You bring your own API keys from the providers you trust. Talk to me does not resell API access.
2. Getting Started
Windows Installation — Windows Desktop
Talk to me for Windows is available as an EV-signed installer from talktome.studio or via the Microsoft Store.
System Requirements:
- Windows 10 or later (64-bit)
- An active internet connection
- At least one API key from a supported provider
The installer is digitally signed with an Extended Validation (EV) certificate from Certum (mrocon GmbH). Windows SmartScreen will not show any warnings.
Android Installation — Android
Talk to me for Android is available as an APK from talktome.studio or via the Google Play Store.
System Requirements:
- Android 8.0 or later
- An active internet connection
- At least one API key from a supported provider
First Launch
When you open Talk to me for the first time, you will see the License Gate. You have two options:
- Enter a License Key to unlock the full app immediately.
- Start a 7-Day Free Trial to explore all features without a license key.
After activation or trial start, the app loads and you can begin using it right away — provided you have at least one API key configured (see Key Pool).
3. License Activation
The License Gate
On first launch (or after trial expiration), the License Gate is displayed. It shows:
- The Talk to me wordmark
- A text field for your license key (format: TTM-XXXX-XXXX-XXXX-XXXX)
- Your Machine ID (a unique device identifier, needed for activation)
- An Activate button
- A Start 7-Day Free Trial button (if no trial has been used)
- Links to Buy a License and the Customer Portal
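The key format shown above can be checked locally before the online activation step. The following is a minimal sketch, not the app's actual validation: it assumes each X stands for an uppercase letter or digit, which the manual does not specify, and the function name is hypothetical.

```python
import re

# Assumption: each "X" is an uppercase letter or a digit. The manual only
# documents the overall shape TTM-XXXX-XXXX-XXXX-XXXX.
LICENSE_KEY_PATTERN = re.compile(r"TTM(-[A-Z0-9]{4}){4}")


def looks_like_license_key(text: str) -> bool:
    """Return True if the text has the TTM-XXXX-XXXX-XXXX-XXXX shape."""
    candidate = text.strip().upper()  # tolerate stray spaces and lowercase input
    return LICENSE_KEY_PATTERN.fullmatch(candidate) is not None
```

A shape check like this only catches typing mistakes early; whether a key is genuine is still decided by the online verification.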
Activating a License
- Enter your license key in the text field.
- Tap/click Activate.
- The app verifies your key online and activates it for this device.
- Once activated, you will not see the License Gate again unless you deactivate or your license expires.
The Free Trial
- Tap/click Start 7-Day Free Trial to unlock all features for 7 days.
- A banner at the top of the app shows how many trial days remain.
- After 7 days, the trial expires and the License Gate reappears.
License Modal
Once inside the app, you can view your license status by clicking the License button (shield icon). The License Modal shows:
- Status: Active, Trial, Grace Period, or Expired
- Product: Your license product name
- Plan: Yearly or Lifetime
- Expires: Expiration date (or "Lifetime")
- Devices: Number of active devices / maximum allowed
- Key: Your license key (partially masked)
- Machine ID: Your device's unique identifier
From this modal you can:
- Deactivate Device — releases the license from this device so you can use it on another
- Close — return to the app
4. App Overview
The app is organized into two main tabs and several supporting sections:
Navigation
At the top of the screen, two tabs let you switch between the app's primary modes:
- Speech-to-Text — Record your voice and get polished, translated text
- Text-to-Speech — Convert written text into spoken audio
Interface Layout
Below the tabs, the main interface is arranged vertically:
- Quick-Override Controls — Language selectors for input and output
- Action Buttons — Quick access to platform features
- Status Indicator — Shows the current state (Ready, Recording, Transcribing, etc.)
- Pipeline Display — Visual progress of your dictation through the processing stages
- Result Area — Your transcribed/translated text
- TTS Panel (Text-to-Speech tab only) — Text input and playback controls
- Key Pool — Manage your API keys
- Settings — All configuration options
Action Buttons
Windows Desktop action buttons:
- Voice Translate — Toggle speech-to-speech translation
- Notification Listener — Toggle notification readout (Full Edition)
- Auto-Read — Toggle Ctrl+C text-to-speech
- Record TTS Readings — Toggle MP3 recording of TTS output
- Save Recordings — Open recordings folder
Android action buttons:
- License — Open license modal
- Voice Translate — Toggle speech-to-speech translation
- Overlay — Start/stop the Floating Bubble
- Auto-Paste — Open Accessibility settings
- Auto-Read — Toggle auto-read messages
- Notif Access — Open notification listener settings
The Info Button
In the header, the Info button opens the App Info modal, which displays:
- A link to talktome.studio
- The support email (tap/click to copy)
- The current app version
- Number of detected microphones
5. Speech-to-Text
The Speech-to-Text tab is the primary mode of Talk to me. Here, you record your voice and receive polished, optionally translated text.
Recording a Dictation
- Ensure the status shows Ready — Start Dictation (green).
- Click/tap the large Start Dictation button.
- The button turns red and shows Stop Recording. Speak clearly.
- While recording, you can see the recording duration in seconds, an audio level meter showing input volume, and the currently active STT provider and language.
- Click/tap the button again to Stop Recording.
Windows: You can also start/stop recording using the global hotkey Ctrl+Win (no need to focus the app window).
What Happens After Recording
After you stop recording, the app processes your audio through the Pipeline (see The Pipeline):
- Capture — Audio recording is finalized
- STT — Your audio is transcribed by the selected provider
- Post-Processing — The raw text is cleaned up (word corrections applied)
- Polish / Translation — If enabled, AI corrects grammar or translates the text
- Inject — The final text is placed in your clipboard
Windows: The text is automatically pasted into the previously focused window via simulated Ctrl+V (Smart Clipboard Injection).
Android: If Auto-Paste is enabled, the text is automatically inserted into the active text field via the Accessibility Service.
The Result Area
After processing, your text appears in the result area. A hint confirms the text has been copied to your clipboard and is ready to paste.
6. Text-to-Speech
The Text-to-Speech tab lets you convert any written text into natural-sounding speech.
Basic Usage
- Switch to the Text-to-Speech tab.
- Type or paste text into the text area.
- Click/tap Read Aloud to start playback.
Playback Controls
- Pause — Temporarily stops playback
- Resume — Continues from where you paused
- Stop — Ends playback entirely
- Replay — Plays the same audio again without re-synthesizing
Provider and Voice Selection
- ElevenLabs: Choose from your available voices or use "Default (Brian v3)". Custom Voice-IDs supported.
- OpenAI TTS: Nova, Alloy, Echo, Fable, Onyx, Shimmer
- Deepgram Aura 2: Fast synthesis
Model Selection (ElevenLabs)
| Model | Character Limit | Best For |
|---|---|---|
| Eleven v3 | 5,000 | Highest quality, short content |
| Multilingual v2 | 10,000 | Multi-language support |
| Flash v2.5 | 40,000 | Fast synthesis, long texts |
| Turbo v2.5 | 40,000 | Speed and quality balance |
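The character limits in the table matter when you paste long texts. The sketch below shows one way to split a text so each piece fits the selected model. The limits come from the table; the splitting strategy (break at the last space inside the limit) and the function name are assumptions, since the manual does not say how the app handles over-long input.

```python
# Character limits per ElevenLabs model, taken from the table above.
CHAR_LIMITS = {
    "Eleven v3": 5_000,
    "Multilingual v2": 10_000,
    "Flash v2.5": 40_000,
    "Turbo v2.5": 40_000,
}


def split_for_model(text: str, model: str) -> list[str]:
    """Split text into chunks that each fit the model's character limit."""
    limit = CHAR_LIMITS[model]
    chunks = []
    remaining = text.strip()
    while len(remaining) > limit:
        # Prefer to break at the last space that keeps the chunk within the limit.
        cut = remaining.rfind(" ", 0, limit + 1)
        if cut <= 0:
            cut = limit  # no usable space: fall back to a hard cut
        piece = remaining[:cut].rstrip()
        if piece:
            chunks.append(piece)
        remaining = remaining[cut:].lstrip()
    if remaining:
        chunks.append(remaining)
    return chunks
```

For example, a 15,000-character text needs three requests with Eleven v3 but fits in a single request with Flash v2.5.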
Audio Quality
| Quality | Description |
|---|---|
| MP3 192 kbps | Creator quality — highest fidelity |
| MP3 128 kbps | Standard — good balance |
| MP3 64 kbps | Compact — smaller file size |
| MP3 32 kbps | Minimal — lowest quality |
Text Normalization
| Setting | Description |
|---|---|
| Auto | The model decides how to handle numbers |
| Always On | Numbers converted to words (e.g., "42" → "forty-two") |
| Off | No normalization applied |
Voice Fine-Tuning (ElevenLabs)
| Slider | Range | Description |
|---|---|---|
| Stability | Variable ↔ Stable | Lower = more expressive; Higher = more consistent |
| Similarity | Creative ↔ Original | How closely the output matches the original voice |
| Style | Neutral ↔ Expressive | Amount of emotional expression |
| Speed | Slow (0.7×) ↔ Fast (1.2×) | Playback speed |
Additional Options
- Code-Filter: Strips code blocks and technical syntax before synthesis.
- Auto-Record: Automatically saves synthesized audio. Tap the folder icon to choose the directory.
- Speaker Boost: Enhances voice clarity (ElevenLabs only).
7. The Pipeline
The Pipeline is Talk to me's core processing engine. It visualizes the stages your audio passes through from recording to final output.
Pipeline Stages
| Stage | Label | Description |
|---|---|---|
| 1 | Capture | Audio recording and finalization |
| 2 | STT | Speech-to-Text transcription |
| 3 | Post | Post-processing (cleanup, word corrections) |
| 4 | Polish or Trans | AI-Polish or AI-Translate |
| 5 | Inject | Text copied to clipboard / auto-pasted |
TDF (Text Display Field) Indicators
Each pipeline stage shows the active provider (e.g., "Scribe v2", "GPT-5.4") and timing information after completion.
Timing Display
After processing, a timing line shows:
STT 1.2s → LLM 0.8s → Inject 0.1s → Total 2.1s
If Voice Translate is active, an additional S2S (Speech-to-Speech) timing is shown.
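To read the timing line, it helps to see how it could be assembled. This is a small sketch that assumes the total is the sum of the individual stage times, which matches the example above (1.2 + 0.8 + 0.1 = 2.1) but is not confirmed by the manual; the function name is hypothetical.

```python
def format_timing(stt: float, llm: float, inject: float) -> str:
    """Build a timing line from per-stage durations given in seconds."""
    total = stt + llm + inject  # assumption: total is the sum of the stages
    parts = [("STT", stt), ("LLM", llm), ("Inject", inject), ("Total", total)]
    return " → ".join(f"{label} {seconds:.1f}s" for label, seconds in parts)
```

Calling `format_timing(1.2, 0.8, 0.1)` reproduces the example line shown above.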
8. Voice Translate
Voice Translate combines AI-Translation with Text-to-Speech to create a real-time speech-to-speech translation experience.
How It Works
- Enable Voice Translate (purple when active).
- Record a dictation in your source language.
- The app transcribes → translates → reads the translation aloud.
Configuration
- Target Language: Set in Settings → AI-Translate → Translate To
- TTS Voice: Uses your configured TTS provider and voice
- Polish: Automatically disabled when Voice Translate is active
Use Cases
- Travel: Speak in your language, have the translation read aloud.
- Language Learning: Hear how your text sounds in another language.
- Live Language Immersion: Turn your own thoughts into live fluency — speak in your native language and absorb the output in the language you want to master.
9. AI Polish & Translation
AI-Polish
When enabled, AI-Polish corrects grammar and punctuation and, with the "Strong" setting, also removes filler words such as "um", "uh", "you know", and "basically".
Polish Strength:
- Light — Grammar and punctuation correction only
- Strong — Also removes filler words
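To illustrate what filler word removal means in practice, here is a rule-based stand-in. It is not how AI-Polish works: the app sends the text to your chosen LLM, while this sketch uses a plain regular expression over the four filler words named above. The function name is hypothetical.

```python
import re

# Filler words named in the manual. Longer phrases come first in the pattern.
FILLERS = ["you know", "basically", "um", "uh"]

_FILLER_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(f) for f in FILLERS) + r")\b,?\s*",
    flags=re.IGNORECASE,
)


def strip_fillers(text: str) -> str:
    """Remove whole-word fillers and tidy up the leftover spacing."""
    cleaned = _FILLER_RE.sub("", text)
    cleaned = re.sub(r"\s{2,}", " ", cleaned)         # collapse double spaces
    cleaned = re.sub(r"\s+([,.!?])", r"\1", cleaned)  # no space before punctuation
    return cleaned.strip()
```

A rule-based filter like this cannot tell a filler "basically" from a meaningful one, which is one reason an LLM is a better fit for the Strong setting.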
Status indicators:
- POLISH (cyan) — Active
- OFF — Disabled
- KEY MISSING (yellow) — No LLM key configured
AI-Translate
When enabled, your dictated text is translated into the target language.
Status indicators:
- TRANSLATE (cyan) — Active, showing target language
- VOICE OUTPUT (purple) — Voice Translate also active
- TEXT ONLY — Translation without voice output
- OFF — Disabled
Important: AI-Polish and AI-Translate are mutually exclusive — enabling one disables the other.
10. Quick-Override Controls
The Quick-Override controls allow you to temporarily change the input or output language for a single dictation without modifying your saved settings.
Speech Input Override
Select a different input language for the next recording:
- Auto-Detect — The STT provider detects the language automatically
- Individual languages (see Appendix A)
Text Output Override
Select a different output language (equivalent to temporarily enabling translation):
- Default (same as input) — No translation
- All 20 translation languages
Reset to Settings
When an override is active, a Reset button (↩ icon) appears. Tap/click it to revert to your saved settings.
11. Key Pool
The Key Pool is where you manage your API keys. Talk to me uses a pool-based architecture — you can add multiple keys per category, and the app automatically rotates between them based on trust scores.
Categories
| Category | Purpose | Supported Providers |
|---|---|---|
| Speech-to-Text | Transcription | OpenAI Whisper, Deepgram Nova, ElevenLabs Scribe v2, Groq Whisper |
| AI-Polish / LLM | Grammar, translation | OpenAI, Groq, Anthropic, Google Gemini, xAI Grok |
| Text-to-Speech | Voice synthesis | ElevenLabs, Deepgram, OpenAI TTS |
Adding a Key
- Expand the Key Pool section.
- Click/tap + Add Key in the desired category.
- Select the Provider.
- Enter a Label (e.g., "My OpenAI Key").
- Enter your API Key.
- Click/tap Save Key.
Key Slot Features
Each key slot displays:
- Label and Provider
- Masked Key (last 4 characters visible)
- Trust Score — Color-coded (green/yellow/red)
- Statistics — Calls, successes, failures, rate limits
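Masking a key so that only the last 4 characters stay visible can be sketched as follows. The masking character and the handling of very short keys are assumptions; the manual only states that the last 4 characters are shown.

```python
def mask_key(api_key: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters of an API key."""
    if len(api_key) <= visible:
        return "*" * len(api_key)  # assumption: very short keys are hidden fully
    return "*" * (len(api_key) - visible) + api_key[-visible:]
```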
Actions per slot:
- Test — Verify the key works
- Pause / Activate — Temporarily disable or re-enable
- Remove — Permanently delete
Trust System
| Level | Score | Color | Behavior |
|---|---|---|---|
| Excellent | ≥80% | Green | Preferred |
| Good | ≥60% | Green | Normal |
| OK | ≥40% | Yellow | Fallback |
| Weak | ≥20% | Yellow | Rarely used |
| Critical | <20% | Red | Last resort |
Keys that hit rate limits are placed in automatic cooldown while other keys are used.
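The trust levels and cooldown behavior can be pictured with a short sketch. The thresholds come from the table above. Everything else is an assumption made for illustration: the manual does not say how the score is calculated (here it is the plain success rate), how long a cooldown lasts, or how the app picks between keys (here the highest score wins). All names are hypothetical.

```python
import time
from dataclasses import dataclass


@dataclass
class KeySlot:
    label: str
    successes: int = 0
    failures: int = 0
    cooldown_until: float = 0.0  # epoch seconds; 0 means no cooldown

    @property
    def trust(self) -> float:
        """Assumed score: success rate in percent (unused keys count as 100)."""
        calls = self.successes + self.failures
        return 100.0 if calls == 0 else 100.0 * self.successes / calls


def trust_level(score: float) -> str:
    """Map a score to the level names and thresholds from the table."""
    if score >= 80:
        return "Excellent"
    if score >= 60:
        return "Good"
    if score >= 40:
        return "OK"
    if score >= 20:
        return "Weak"
    return "Critical"


def pick_key(slots, now=None):
    """Return the most trusted slot that is not cooling down, or None."""
    now = time.time() if now is None else now
    available = [s for s in slots if s.cooldown_until <= now]
    return max(available, key=lambda s: s.trust) if available else None
```

With two keys at 90% and 50%, the first is preferred; if it hits a rate limit and enters cooldown, the second takes over until the cooldown ends.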
12. AI Voice Chat
AI Voice Chat lets you have real-time voice conversations with Google Gemini. Speak naturally, get answers instantly, and interrupt freely, just like talking to another person. Powered by the Gemini Live API with sub-second latency.
Headphones strongly recommended (Android / phones)
For AI Voice Chat on phones, use wired or Bluetooth earphones or headphones when you can. Playing AI replies through the built-in speaker can let the microphone pick up the model’s voice (acoustic feedback). That may create false “you” transcripts or confuse turn-taking — even though playback and conversation can still work. Desktop users with a comms headset (good echo cancellation) typically have fewer issues. We are working toward better speaker-only behavior.
Requirements
You need a Google Gemini API key (paid tier recommended) added to the LLM Key Pool in Settings. The key is automatically available for AI Voice Chat.
Starting a Conversation
Navigate to the Gemini Live tab. Tap Start Conversation. The app connects to Gemini via WebSocket, opens your microphone, and begins listening. Speak naturally — Gemini responds in real-time audio. Tap End to stop.
Voices (30 Options)
Choose from 30 natural AI voices, each with a distinct personality:
| Voice | Character | Best For |
|---|---|---|
| Sulafat | Warm | Storytelling, bedtime stories, calm conversations |
| Gacrux | Mature | Authoritative narration, mentoring, deep discussions |
| Algenib | Gravelly | Cinematic narration, dramatic reading, character voice |
| Kore | Firm | Professional briefings, news reading, factual Q&A |
| Puck | Upbeat | Energetic conversations, motivation, brainstorming |
| Zephyr | Bright | Optimistic chats, friendly assistance, greetings |
| Charon | Informative | Tutorials, documentary-style explanations |
| Fenrir | Excitable | Enthusiastic reactions, game commentary, hype |
| Leda | Youthful | Casual chat, Gen-Z conversations, trendy topics |
| Aoede | Breezy | Relaxed conversations, travel talk, lifestyle |
| Achernar | Soft | Meditation guidance, ASMR-style, gentle encouragement |
| Algieba | Smooth | Podcast hosting, audiobooks, long-form reading |
| Despina | Smooth | Elegant narration, luxury brand voice |
| Achird | Friendly | Customer support, everyday assistance, welcoming tone |
| Vindemiatrix | Gentle | Supportive conversations, therapy-like tone, empathy |
| Sadaltager | Knowledgeable | Technical explanations, expert Q&A, encyclopedic |
| Rasalgethi | Informative | Science documentaries, educational content |
| Schedar | Even | Balanced discussions, neutral reporting, debates |
| Alnilam | Firm | Commanding presence, leadership, formal settings |
| Pulcherrima | Forward | Assertive communication, pitches, presentations |
| Zubenelgenubi | Casual | Laid-back chat, friends catching up, humor |
| Sadachbia | Lively | Animated storytelling, children's content, playful |
| Laomedeia | Upbeat | Morning shows, cheerful updates, positive vibes |
| Callirrhoe | Easy-going | Casual advice, lifestyle coaching, approachable |
| Autonoe | Bright | Creative sessions, idea generation, art discussions |
| Enceladus | Breathy | Intimate narration, poetry reading, atmospheric |
| Iapetus | Clear | Precise instructions, step-by-step guides, clarity |
| Erinome | Clear | Clean communication, corporate training, diction |
| Umbriel | Easy-going | Relaxed Q&A, weekend vibes, mellow conversations |
Tip: Preview all voices in the Google AI Studio Voice Library.
Language
Select from 24 supported languages or leave on Auto-detect. Gemini will respond in the language you speak — or in the language you select. Supported: English, German, French, Spanish, Italian, Portuguese, Dutch, Polish, Romanian, Russian, Ukrainian, Turkish, Arabic, Hindi, Bengali, Tamil, Telugu, Marathi, Japanese, Korean, Thai, Vietnamese, Indonesian.
Persona Presets
Persona presets define how Gemini behaves — its personality, tone, and communication style. Choose from six presets or create your own:
| Preset | Behavior |
|---|---|
| Friendly Assistant | Warm, conversational, approachable — great for everyday use |
| Professional | Clear, concise, authoritative — for business and work |
| Enthusiastic | Energetic, positive, encouraging — for brainstorming and motivation |
| Calm & Soothing | Slow, gentle, patient — for relaxation and guided sessions |
| Teacher | Patient, step-by-step, uses analogies — for learning and explanations |
| Creative | Imaginative, expressive, vivid language — for storytelling and art |
| Custom | Write your own system instruction from scratch |
System Instruction
The System Instruction is a text briefing you give to Gemini before the conversation starts. Think of it as directing an actor: tell the AI who it is, how to behave, and what to focus on.
Examples:
- "You are a patient Italian language tutor. Speak slowly. Correct my grammar gently."
- "You are a senior software architect. Answer concisely and technically."
- "You are a creative storyteller. Speak with flair. Use vivid language."
When using a Persona Preset, your custom text is appended to the preset instruction. In Custom mode, your text is the entire instruction. Write in English for best results. Settings are saved automatically when you click outside the text field.
Temperature & Top-P
Temperature (0.0 – 2.0) controls how creative or predictable the AI's responses are:
| Range | Behavior | Best For |
|---|---|---|
| 0.0 – 0.5 | Focused, deterministic, repetitive | Facts, technical answers, precise instructions |
| 0.7 – 1.0 | Balanced, natural (default: 1.0) | Most conversations, everyday use |
| 1.2 – 2.0 | Creative, surprising, unpredictable | Brainstorming, storytelling, creative writing |
Top-P (0.0 – 1.0) limits the pool of words the AI considers. At 0.95 (default), the model samples only from the most likely words whose combined probability reaches 95%, cutting off the improbable "tail". Lower values make output more conservative.
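For readers who want to see what these two settings do numerically, here is a textbook sketch of temperature scaling and Top-P (nucleus) filtering over a toy word list. It illustrates the general technique only and makes no claim about how Gemini implements it internally; the function names are hypothetical.

```python
import math


def apply_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    if temperature <= 0:
        # Treat 0 as "always pick the top word" to avoid dividing by zero.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the peak for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the most likely words until their combined probability reaches
    top_p, then renormalize. Returns {index: probability}."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}
```

With probabilities 0.5, 0.3, 0.15, and 0.05, a Top-P of 0.9 keeps the first three words and drops the least likely one, while a Top-P of 0.6 keeps only the first two.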
Voice Activity Detection (VAD)
VAD settings control how Gemini detects when you start and stop speaking:
- Speech Start Sensitivity — How easily the system detects speech onset. "Low" requires louder/clearer speech to trigger. Default works for most environments.
- Speech End Sensitivity — How quickly the system decides you've stopped talking. "Low" waits longer before considering your turn finished — useful for thoughtful pauses.
- Silence Duration — How many milliseconds of silence before your turn is considered complete (100–2000ms). Higher values give you more time to pause mid-sentence.
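The effect of the Silence Duration setting can be sketched with a simple frame counter. This is a conceptual illustration only: the actual detection runs on the Gemini side, and the frame length, the per-frame speech flags, and the function name are assumptions.

```python
def find_turn_end(frames_are_speech, frame_ms, silence_duration_ms):
    """Return the index of the frame at which the turn counts as complete,
    or None if the required silence never occurs after speech."""
    silent_ms = 0
    heard_speech = False
    for index, is_speech in enumerate(frames_are_speech):
        if is_speech:
            heard_speech = True
            silent_ms = 0  # any speech resets the silence timer
        elif heard_speech:
            silent_ms += frame_ms
            if silent_ms >= silence_duration_ms:
                return index
    return None
```

With 100 ms frames, a 200 ms pause in mid-sentence ends the turn when Silence Duration is 200 ms, but not when it is 300 ms, which is why higher values suit speakers who pause to think.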
Tips for Best Results
- Use a headset or earbuds to avoid echo and feedback
- Speak naturally — Gemini supports natural barge-in (interrupt anytime)
- Session length is limited to 15 minutes per connection (API limit)
- All settings take effect on the next session start (not during a live session)
- The audio level meter shows a colored gradient (green → yellow → orange → red) indicating your microphone input level
- Transcription of your speech and Gemini's speech can be toggled on/off independently
13. Mini-Player Windows
The Mini-Player is a compact Always-on-Top window that provides essential dictation controls without occupying your full screen.
Entering Mini-Player Mode
Click the Collapse button (↗ icon) in the header. The app window shrinks to a compact overlay positioned at the bottom center of your screen.
Mini-Player Layout
The Mini-Player displays a 3×3 grid of essential controls:
- Row 1: Speech Input selector, Status/Start button, Text Output selector
- Row 2: Voice Translate toggle, Inline Pill (spectrum analyzer), Save Recordings
- Row 3: Pipeline timing TDFs, Result preview
DPI-Aware Sizing
The Mini-Player automatically adjusts its size based on your display's DPI scaling, ensuring consistent visual dimensions across monitors with different scaling settings (100%, 125%, 150%).
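DPI-aware sizing comes down to multiplying a logical size by the display's scale factor. On Windows, 96 DPI corresponds to 100% scaling. The logical window size used in the usage note below is a made-up value; the manual does not state the Mini-Player's dimensions, and the function names are hypothetical.

```python
BASE_DPI = 96  # Windows treats 96 DPI as 100% scaling


def dpi_to_scale_percent(dpi: float) -> float:
    """Convert a monitor DPI value to a scaling percentage."""
    return dpi * 100 / BASE_DPI


def physical_size(logical_width: int, logical_height: int, scale_percent: float):
    """Scale a logical window size to physical pixels for a given scaling."""
    factor = scale_percent / 100
    return round(logical_width * factor), round(logical_height * factor)
```

A hypothetical 480 × 160 window would occupy 600 × 200 physical pixels at 125% and 720 × 240 at 150%, so it looks the same size on each monitor.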
Exiting Mini-Player Mode
Click the Expand button to return to the full-size window at its previous position and size.
14. Global Hotkeys Windows
Talk to me registers system-wide hotkeys so you can control dictation without switching to the app window.
Primary Hotkeys
| Hotkey | Action |
|---|---|
| Ctrl+Win | Start / Stop Recording (global, works from any app) |
| Ctrl+Win (while processing) | Cancel current pipeline |
TTS Hotkey
When text is selected in any application, the TTS hotkey reads it aloud using your configured TTS provider.
Low-Level Hook
The global hotkey uses a Windows low-level keyboard hook, which means it works even when the app is minimized or another application has focus. The hook operates in "zero-swallow mode" — it intercepts the key combination without blocking other keyboard input.
15. Auto-Read Windows
Auto-Read is a Windows-exclusive feature that extracts text from the currently focused application and reads it aloud via TTS.
How It Works
- Enable Auto-Read by clicking the Auto-Read button.
- Select text in any application (or use Ctrl+C to copy).
- Talk to me detects the clipboard content and automatically reads it aloud using your TTS configuration.
Use Cases
- Read emails, articles, or documents without staring at the screen.
- Review your own writing by hearing it spoken back.
- Accessibility support for vision-impaired users.
16. Notification Listener Windows
The Notification Listener is a Full Edition exclusive feature that captures Windows toast notifications and reads them aloud via TTS.
Requirements
- Windows Desktop Full Edition (not available in the Microsoft Store Edition)
- Notification access permission granted in Windows Settings
How It Works
- Enable Notification Listener by clicking the toggle.
- Grant notification access when prompted by Windows.
- When a Windows toast notification arrives (email, chat message, calendar reminder), Talk to me extracts the notification title and body, and reads it aloud using your TTS configuration.
Configuration
- Enable/disable in Settings → Hands-Free
- TTS voice and provider follow your global TTS settings
17. MP3 Recording & Save Windows
Record TTS Readings
When enabled, every TTS synthesis is automatically saved as an MP3 file with sequential numbering (e.g., recording_001.mp3, recording_002.mp3).
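Sequential numbering of this kind can be sketched as "find the highest existing number and add one". The file name pattern follows the examples above; how the app behaves beyond 999 or when files are deleted is not documented, so those details are assumptions.

```python
import re

_NAME_RE = re.compile(r"recording_(\d+)\.mp3")


def next_recording_name(existing_names) -> str:
    """Return the next free name in the recording_NNN.mp3 sequence."""
    highest = 0
    for name in existing_names:
        match = _NAME_RE.fullmatch(name)
        if match:  # ignore unrelated files in the folder
            highest = max(highest, int(match.group(1)))
    return f"recording_{highest + 1:03d}.mp3"
```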
Save Recordings
Click Save Recordings to open the folder containing all recorded MP3 files. You can configure the recording directory in Settings.
18. Floating Bubble (Overlay) Android
The Floating Bubble is a small circular icon that floats on top of all other apps, providing hands-free dictation access without switching apps.
Activating the Overlay
- Tap the Overlay button in the main app.
- If Android's "Draw over other apps" permission is not yet granted, you will be directed to enable it.
- A small Talk to me bubble appears on screen.
Using the Bubble
- Single Tap: Start or stop recording. The border pulses red during recording and blue during TTS readout.
- Triple Tap: Test readback — reads a predefined text to confirm TTS works.
- Long Press: Clears the unread message queue.
- Drag: Move the bubble anywhere on screen.
During Recording via Bubble
- Tap the bubble to start recording.
- After transcription, a "✓ Inserted!" toast confirms the text was pasted or placed in clipboard.
Stopping the Overlay
Tap the Overlay button again or tap Stop on the notification.
19. Auto-Paste Android
Auto-Paste uses Android's Accessibility Service to automatically insert dictated text into the currently focused text field.
Enabling Auto-Paste
- Tap the Auto-Paste button.
- Go to Android's Accessibility Settings.
- Find Talk to me and enable it.
- The button now shows ✓ with a cyan border.
Important Notes
- Requires Android Accessibility permission (a sensitive permission).
- May need to be re-granted after app updates.
- Used exclusively for text insertion — no other accessibility data is accessed.
20. Auto-Read Messages Android
Auto-Read automatically reads incoming chat messages aloud using TTS — ideal for driving, cooking, or exercising.
How It Works
- Enable Auto-Read (Headphones icon).
- Ensure Notification Access is granted.
- The Overlay must be active.
- When a message arrives from an allowed app, Talk to me announces the sender and reads the message aloud.
Pre-Selected Chat Apps
WhatsApp, WhatsApp Business, Telegram, Signal, Discord, Slack, Microsoft Teams, Viber, Messenger (Meta), Instagram, Google Messages, Samsung Messages.
You can add or remove apps in Auto-Read Apps Configuration.
21. Notification Access Android
Notification Access allows Talk to me to read incoming notifications, which Auto-Read Messages requires.
Granting Access
- Tap the Notif Access button.
- Go to Android's Notification Listener Settings.
- Find Talk to me and enable it.
- The button shows ✓ with a cyan border.
Important Notes
- This is a system-level permission; the app processes only notifications from explicitly allowed apps.
- No notification data is stored, transmitted, or logged.
22. Auto-Read Apps Configuration Android
Control which apps are allowed to have their notifications read aloud.
Known Chat Apps
Pre-selected messaging apps with individual toggles (WhatsApp, Telegram, Signal, Discord, Slack, Teams, Viber, Messenger, Instagram, Google Messages, Samsung Messages).
Search and Add Custom Apps
- Tap the search field and type an app name.
- Matching installed apps appear, sorted by relevance.
- Check the box to add an app.
How Filtering Works
- Only notifications from allowed apps are read aloud.
- Changes take effect immediately — no restart required.
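The filtering rule can be sketched as a simple allow-list check. The data structure, the spoken format ("sender: message"), and the function name are assumptions made for illustration; the package names in the usage note are examples only.

```python
from dataclasses import dataclass


@dataclass
class Notification:
    package: str  # Android package name of the app that posted it
    sender: str
    message: str


def spoken_lines(notifications, allowed_packages):
    """Return the lines to read aloud: only notifications from allowed apps."""
    return [
        f"{n.sender}: {n.message}"
        for n in notifications
        if n.package in allowed_packages
    ]
```

Because the allow-list is consulted for every notification, adding a package (for example `com.whatsapp`) to the set affects the very next notification, which matches the "no restart required" behavior described above.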
23. Settings
UI Language
English, Deutsch, Français, Español — independent of your system language.
Quality Preset
| Preset | STT Provider | LLM Provider | Model | Polish |
|---|---|---|---|---|
| Top Performer | Scribe v2 | OpenAI | GPT-5.4 | Strong |
| Standard | Scribe v2 | OpenAI | GPT-4.1 mini | Strong |
| Budget | Whisper | Groq | Default | Light |
| Free | Deepgram | Groq | Default | Off |
| Custom | Manual | Manual | Manual | Manual |
Speech-to-Text
- Provider: OpenAI Whisper, Deepgram Nova-2/3, ElevenLabs Scribe v2, Groq Whisper
- Custom Keyterms (Scribe only): Proper nouns, brands, technical terms
- Language: Auto-Detect or specific
Text-to-Speech
- Provider: ElevenLabs, OpenAI TTS, Deepgram Aura 2
- Model (ElevenLabs): Eleven v3, Multilingual v2, Flash v2.5, Turbo v2.5
LLM Provider (Polish)
- Provider: OpenAI, Groq, Anthropic, Google Gemini, xAI Grok
- Model: Provider default or specific
- Polish Strength: Light or Strong
Translation Provider
Separate provider for AI-Translation (can differ from Polish provider).
AI-Polish / AI-Translate
Toggle each independently. When AI-Translate is enabled:
- Translate To: 20 target languages
- Voice Translate: Auto-read translations via TTS
Android Hands-Free
Quick toggles for Overlay, Auto-Read Messages, Auto-Paste, Notification Access.
Save and Test
- Save all current settings — Persists changes to device storage
- Test current configuration — Tests all configured providers with response times
24. Word Corrections
Word Corrections teach Talk to me the correct spelling of names, brands, and terms that speech recognition gets wrong.
Adding Corrections
Single Add
Enter Wrong spelling and Correct spelling, then tap/click Add.
Bulk Import
Enter the correct spelling, then list wrong variants (one per line). Use Generate with AI to auto-create likely misspellings.
Multi-Import
Enter pairs as wrong;correct (one per line). Supports ;, ->, comma, or tab separators.
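A parser for the Multi-Import format might look like the sketch below. It accepts the four separators listed above and splits each line at the first separator it finds. Which separator wins when a line contains several, and how malformed lines are treated (skipped here), are assumptions; the function name is hypothetical.

```python
import re

# Any of the documented separators: "->", ";", ",", or a tab.
_SEPARATOR_RE = re.compile(r"\s*(?:->|;|,|\t)\s*")


def parse_corrections(block: str) -> dict[str, str]:
    """Parse 'wrong<sep>correct' lines into a {wrong: correct} mapping."""
    pairs = {}
    for line in block.splitlines():
        line = line.strip()
        if not line:
            continue
        parts = _SEPARATOR_RE.split(line, maxsplit=1)
        if len(parts) != 2 or not parts[0] or not parts[1]:
            continue  # assumption: lines without a valid pair are skipped
        wrong, correct = parts
        pairs[wrong] = correct
    return pairs
```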
How Corrections Work
During post-processing (Pipeline stage 3), wrong spellings are automatically replaced before AI-Polish runs.
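The replacement step itself can be sketched as a whole-word search and replace. The matching rules here (case-insensitive, whole words only, longer entries first) are assumptions; the manual only states that wrong spellings are replaced before AI-Polish runs.

```python
import re


def apply_corrections(text: str, corrections: dict[str, str]) -> str:
    """Replace each wrong spelling with its correct form, whole words only."""
    if not corrections:
        return text
    # Longest entries first so multi-word phrases win over shorter overlaps.
    wrongs = sorted(corrections, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(w) for w in wrongs) + r")(?!\w)",
        flags=re.IGNORECASE,
    )
    lookup = {wrong.lower(): correct for wrong, correct in corrections.items()}
    # Fall back to the matched text if the lowercase lookup ever misses.
    return pattern.sub(lambda m: lookup.get(m.group(0).lower(), m.group(0)), text)
```

Whole-word matching keeps a correction such as "Jon" to "John" from also rewriting "Jonathan".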
25. Backup and Restore
Export Settings
- Open Backup & Restore in Settings.
- Tap/click Export Settings.
- Enter and confirm an Encryption Password (min. 6 characters).
- Windows: The save dialog suggests talktome-settings.ttm, and you choose the folder.
- Android: The backup is written to your Downloads area as TalkToMe-backup.ttm. If that name already exists, the system may add (1), (2), etc. All are valid encrypted backups.
Import Settings
- Tap/click Import Settings.
- Automatic (Android): The app looks for the newest matching file named TalkToMe-backup with a .ttm extension (including TalkToMe-backup (1).ttm, etc.) in app storage and in Downloads.
- If the system file picker opens: On many phones (e.g. Samsung), the first screen is Recently used and may default to Images. Your .ttm files are hidden until you switch the top filter to Documents or This week, or open the Download folder directly.
- New device: Copy the .ttm from your old device (USB, cloud, email), then use Import and pick that file.
- Enter the encryption password.
- All settings are restored and the app restarts.
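The automatic lookup on Android can be pictured as "match the documented file names, then take the newest". The sketch below assumes that newest means the latest modification time and that candidates arrive as (name, modified time) pairs; both are assumptions, as is the function name.

```python
import re

# Matches TalkToMe-backup.ttm and numbered copies such as TalkToMe-backup (1).ttm.
_BACKUP_RE = re.compile(r"TalkToMe-backup(?: \((\d+)\))?\.ttm")


def newest_backup(files):
    """files: iterable of (name, modified_time) pairs.
    Return the name of the most recently modified backup, or None."""
    matches = [(mtime, name) for name, mtime in files if _BACKUP_RE.fullmatch(name)]
    return max(matches)[1] if matches else None
```

This also shows why a numbered copy is not necessarily the newest file: the number reflects naming collisions, while the lookup goes by age.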
Technical Details
- Encryption: AES-256-GCM with PBKDF2-HMAC-SHA256 (100,000 iterations)
- Included: All settings, API keys, word corrections, auto-read apps, quality preset, UI language
- NOT included: License activation (tied to Machine ID)
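The key derivation step named above can be reproduced with Python's standard library. This sketch covers only PBKDF2-HMAC-SHA256 with the documented 100,000 iterations and the 6-character minimum password length. The salt length is an assumption, and the AES-256-GCM encryption itself is left out because it needs a third-party library. The backup file layout is not documented, so this cannot be used to open a real .ttm file.

```python
import hashlib
import os

ITERATIONS = 100_000      # from the manual
KEY_LENGTH = 32           # 32 bytes = 256 bits, the key size AES-256 needs
SALT_LENGTH = 16          # assumption: the manual does not state the salt size
MIN_PASSWORD_LENGTH = 6   # from the export step in the manual


def new_salt() -> bytes:
    """Generate a fresh random salt for a new backup."""
    return os.urandom(SALT_LENGTH)


def derive_key(password: str, salt: bytes) -> bytes:
    """Derive a 256-bit key from the password with PBKDF2-HMAC-SHA256."""
    if len(password) < MIN_PASSWORD_LENGTH:
        raise ValueError("password must have at least 6 characters")
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS, dklen=KEY_LENGTH
    )
```

The same password and salt always yield the same key, which is what lets the import step decrypt a backup on a new device; the iteration count makes each password guess deliberately slow.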
26. Usage Dashboard
| Metric | Description |
|---|---|
| STT Calls | Speech-to-text transcriptions performed |
| LLM Polish | AI-Polish or AI-Translate operations |
| TTS Synth | Text-to-speech synthesis operations |
Counters are cumulative since the last settings reset.
27. Troubleshooting
General
| Problem | Solution |
|---|---|
| "No API key configured" | Add a key in Key Pool for the feature you need |
| Recording doesn't start | Check microphone permission in system settings |
| Voice Translate produces no audio | Ensure a TTS API key is configured and working |
| Export fails | Check write access to Downloads folder |
| Can't see backup in Import file picker | Switch from Images to Documents / This week, or open the Download folder — see §25 Import |
Windows-Specific
| Problem | Solution |
|---|---|
| Ctrl+Win hotkey doesn't work | Ensure the app is running (check system tray) |
| Text not pasted after dictation | Ensure the target window supports Ctrl+V |
| Notification Listener unavailable | Only available in Full Edition (not Store Edition) |
| Mini-Player looks too large/small | DPI-aware sizing adjusts automatically; restart the app if display settings changed |
Android-Specific
| Problem | Solution |
|---|---|
| Auto-Read doesn't work | Ensure Overlay is active, Auto-Read enabled, and Notification Access granted |
| Auto-Paste doesn't work | Re-enable Accessibility Service in Android Settings |
| Bubble doesn't appear | Grant "Draw over other apps" permission |
28. Privacy and Security
Data Handling
- No data collection: Talk to me does not collect, store, or transmit any user data to mrocon GmbH servers.
- Direct API communication: Audio and text go directly from your device to your chosen AI provider.
- Local storage only: All settings and API keys are stored exclusively on your device.
- No analytics: No tracking, analytics, or telemetry of any kind.
Permissions
Windows
| Permission | Purpose |
|---|---|
| Microphone | Record audio for dictation |
| Notification Access | Read notifications (Full Edition) |
| Internet | Communicate with AI providers |
Android
| Permission | Purpose |
|---|---|
| Microphone | Record audio for dictation |
| Overlay (Draw over apps) | Display the floating bubble |
| Notification Listener | Read notifications for Auto-Read |
| Accessibility Service | Auto-Paste text into fields |
| Internet | Communicate with AI providers |
| Query Installed Packages | Show app names in Auto-Read settings |
Encryption
- Windows: API keys encrypted with DPAPI (Windows Data Protection API)
- Android: API keys in app-private internal storage
- Backup files: AES-256-GCM encryption
Appendix A — Supported Languages
Speech Input Languages
Auto-Detect, German, English, French, Spanish, Italian, Portuguese, Dutch, Japanese, Korean, Chinese, Russian, Arabic, Hindi, Polish, Turkish, Swedish, Ukrainian
Translation Target Languages
German, English, French, Spanish, Italian, Portuguese, Dutch, Japanese, Chinese, Korean, Russian, Arabic, Hindi, Polish, Turkish, Swedish, Ukrainian, Danish, Finnish, Norwegian
TTS Languages
Auto, German, English, French, Italian, Spanish, Portuguese, Dutch, Polish, Swedish, Danish, Finnish, Norwegian, Turkish, Japanese, Korean, Chinese
UI Languages
English, Deutsch, Français, Español
Appendix B — Supported Providers
Speech-to-Text
| Provider | Notes |
|---|---|
| OpenAI Whisper | Most widely used, reliable |
| Deepgram Nova-2 / Nova-3 | Fast, good accuracy |
| ElevenLabs Scribe v2 | Supports custom keyterms |
| Groq Whisper | Free tier available, fast |
LLM (Polish / Translation)
| Provider | Notes |
|---|---|
| OpenAI | GPT-4o-mini, GPT-5.4, etc. |
| Groq | Free tier, Llama models |
| Anthropic | Claude models |
| Google Gemini | Gemini models |
| xAI Grok | Free tier available |
Text-to-Speech
| Provider | Notes |
|---|---|
| ElevenLabs | Best quality, voice cloning, 4 models |
| OpenAI TTS | 6 built-in voices, simple |
| Deepgram Aura 2 | Fast synthesis |
Appendix C — Quality Presets
| Preset | STT | LLM | Model | Polish | Cost |
|---|---|---|---|---|---|
| Top Performer | Scribe v2 | OpenAI | GPT-5.4 | Strong | $$$ |
| Standard | Scribe v2 | OpenAI | GPT-4.1 mini | Strong | $$ |
| Budget | Whisper | Groq | Default | Light | $ |
| Free | Deepgram | Groq | Default | Off | Free |
| Custom | Manual | Manual | Manual | Manual | Varies |
Appendix D — Keyboard Shortcuts Windows
| Shortcut | Action |
|---|---|
| Ctrl+Win | Start / Stop Recording |
| Ctrl+Win (during processing) | Cancel Pipeline |
| TTS Hotkey | Read selected text aloud |
Talk to me is a product of mrocon GmbH. All rights reserved.
For support, contact team@talktome.studio or visit talktome.studio.