Talk to me — speak, type, listen

Talk to me — User Manual

Version: 0.5.96 (Windows Desktop) / 0.5.81 (Android Hands-Free)
Last Updated: 2026-04-01
This manual covers both the Windows Desktop and Android Hands-Free editions of Talk to me. Sections marked with Windows or Android apply only to that platform. All other sections apply to both.

1. Introduction

Talk to me is a professional dictation, translation, and voice interaction studio available for Windows Desktop and Android. It converts your speech into text, polishes it with AI, translates it into 20+ languages, and reads it back to you — all in real time.

The app follows a strict BYOK (Bring Your Own Key) and Zero-Knowledge / Zero-Trust architecture: your API keys are stored only on your device, and your data is sent only to the AI providers you choose, never to Talk to me servers.

Key Features

  • Real-time Dictation: Record your voice and get polished text in seconds.
  • AI-Polish: Automatic grammar correction and filler word removal powered by your choice of AI provider.
  • Live Translation: Translate dictated text into 20+ languages on the fly.
  • Voice Translate (Speech-to-Speech): Your translated text is automatically read aloud in the target language.
  • Text-to-Speech: Convert any text into natural-sounding speech with ElevenLabs, OpenAI TTS, or Deepgram.
  • Live Language Immersion: Speak in your native language, instantly see and hear it in the language you want to master.
  • Word Corrections: Teach the app your names, brands, and terms that speech recognition gets wrong.
  • Encrypted Backup: Export all settings and API keys as a password-protected encrypted file.
  • Multi-Provider Support: Choose from OpenAI, Groq, Anthropic, Google Gemini, xAI Grok, ElevenLabs, Deepgram, and more.

Platform Highlights

Feature Windows Desktop Android Hands-Free
Mini-Player (compact mode)
Global Hotkeys (Ctrl+Win)
Auto-Read (Ctrl+C text extraction)
Notification Listener (Full Edition)
MP3 Recording & Save
Floating Pill (Spectrum Analyzer)
Floating Bubble (Overlay)
Auto-Paste (Accessibility)
Auto-Read Messages (from chat apps)
App-level Notification Access

Security Principles

  • Zero-Knowledge: Talk to me never stores, transmits, or has access to your API keys on any server. All keys are stored locally on your device.
  • Zero-Trust: The app never phones home. No analytics, no tracking, no telemetry. Your dictation data flows directly from your device to your chosen AI provider and nowhere else.
  • BYOK: You bring your own API keys from the providers you trust. Talk to me does not resell API access.

2. Getting Started

Windows Installation — Windows Desktop

Talk to me for Windows is available as an EV-signed installer from talktome.studio or via the Microsoft Store.

System Requirements:

  • Windows 10 or later (64-bit)
  • An active internet connection
  • At least one API key from a supported provider

The installer is digitally signed with an Extended Validation (EV) certificate issued by Certum to mrocon GmbH, so Windows SmartScreen should not show a warning.

Android Installation — Android

Talk to me for Android is available as an APK from talktome.studio or via the Google Play Store.

System Requirements:

  • Android 8.0 or later
  • An active internet connection
  • At least one API key from a supported provider

First Launch

When you open Talk to me for the first time, you will see the License Gate. You have two options:

  1. Enter a License Key to unlock the full app immediately.
  2. Start a 7-Day Free Trial to explore all features without a license key.

After activation or trial start, the app loads and you can begin using it right away — provided you have at least one API key configured (see Key Pool).

3. License Activation

The License Gate

On first launch (or after trial expiration), the License Gate is displayed. It shows:

  • The Talk to me wordmark
  • A text field for your license key (format: TTM-XXXX-XXXX-XXXX-XXXX)
  • Your Machine ID (a unique device identifier, needed for activation)
  • An Activate button
  • A Start 7-Day Free Trial button (if no trial has been used)
  • Links to Buy a License and the Customer Portal
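
The key format shown above can be checked locally before the app contacts the activation server. The following is a minimal sketch, not the app's actual validation: it assumes each X is an uppercase letter or a digit (the manual documents only the overall shape), and `looks_like_license_key` is a hypothetical helper name.

```python
import re

# Assumption: each "X" stands for an uppercase letter or a digit.
# The manual only documents the overall shape TTM-XXXX-XXXX-XXXX-XXXX.
LICENSE_KEY_PATTERN = re.compile(r"TTM(-[A-Z0-9]{4}){4}")


def looks_like_license_key(text: str) -> bool:
    """Return True if the text matches the documented key shape."""
    # Lenient on purpose: surrounding spaces and lowercase input are accepted.
    return LICENSE_KEY_PATTERN.fullmatch(text.strip().upper()) is not None
```

A check like this only catches typing mistakes. Whether a key is genuine can only be decided by the online verification described in the next section.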

Activating a License

  1. Enter your license key in the text field.
  2. Tap/click Activate.
  3. The app verifies your key online and activates it for this device.
  4. Once activated, you will not see the License Gate again unless you deactivate or your license expires.

The Free Trial

  • Tap/click Start 7-Day Free Trial to unlock all features for 7 days.
  • A banner at the top of the app shows how many trial days remain.
  • After 7 days, the trial expires and the License Gate reappears.

License Modal

Once inside the app, you can view your license status by clicking the License button (shield icon). The License Modal shows:

  • Status: Active, Trial, Grace Period, or Expired
  • Product: Your license product name
  • Plan: Yearly or Lifetime
  • Expires: Expiration date (or "Lifetime")
  • Devices: Number of active devices / maximum allowed
  • Key: Your license key (partially masked)
  • Machine ID: Your device's unique identifier

From this modal you can:

  • Deactivate Device — releases the license from this device so you can use it on another
  • Close — return to the app

4. App Overview

The app is organized into two main tabs and several supporting sections:

Navigation

At the top of the screen, two tabs let you switch between the app's primary modes:

  • Speech-to-Text — Record your voice and get polished, translated text
  • Text-to-Speech — Convert written text into spoken audio

Interface Layout

Below the tabs, the main interface is arranged vertically:

  1. Quick-Override Controls — Language selectors for input and output
  2. Action Buttons — Quick access to platform features
  3. Status Indicator — Shows the current state (Ready, Recording, Transcribing, etc.)
  4. Pipeline Display — Visual progress of your dictation through the processing stages
  5. Result Area — Your transcribed/translated text
  6. TTS Panel (Text-to-Speech tab only) — Text input and playback controls
  7. Key Pool — Manage your API keys
  8. Settings — All configuration options

Action Buttons

Windows Desktop action buttons:

  • Voice Translate — Toggle speech-to-speech translation
  • Notification Listener — Toggle notification readout (Full Edition)
  • Auto-Read — Toggle Ctrl+C text-to-speech
  • Record TTS Readings — Toggle MP3 recording of TTS output
  • Save Recordings — Open recordings folder

Android action buttons:

  • License — Open license modal
  • Voice Translate — Toggle speech-to-speech translation
  • Overlay — Start/stop the Floating Bubble
  • Auto-Paste — Open Accessibility settings
  • Auto-Read — Toggle auto-read messages
  • Notif Access — Open notification listener settings

The Info Button

In the header, the Info button opens the App Info modal, which displays:

  • A link to talktome.studio
  • The support email (tap/click to copy)
  • The current app version
  • Number of detected microphones

5. Speech-to-Text

The Speech-to-Text tab is the primary mode of Talk to me. Here, you record your voice and receive polished, optionally translated text.

Recording a Dictation

  1. Ensure the status shows Ready — Start Dictation (green).
  2. Click/tap the large Start Dictation button.
  3. The button turns red and shows Stop Recording. Speak clearly.
  4. While recording, the app shows the recording duration in seconds, an audio level meter for the input volume, and the currently active STT provider and language.
  5. Click/tap the button again to Stop Recording.

[Windows] You can also start/stop recording using the global hotkey Ctrl+Win (no need to focus the app window).

What Happens After Recording

After you stop recording, the app processes your audio through the Pipeline (see The Pipeline):

  1. Capture — Audio recording is finalized
  2. STT — Your audio is transcribed by the selected provider
  3. Post-Processing — The raw text is cleaned up (word corrections applied)
  4. Polish / Translation — If enabled, AI corrects grammar or translates the text
  5. Inject — The final text is placed in your clipboard

[Windows] The text is automatically pasted into the previously focused window via simulated Ctrl+V (Smart Clipboard Injection).

[Android] If Auto-Paste is enabled, the text is automatically inserted into the active text field via the Accessibility Service.

The Result Area

After processing, your text appears in the result area. A hint confirms the text has been copied to your clipboard and is ready to paste.

6. Text-to-Speech

The Text-to-Speech tab lets you convert any written text into natural-sounding speech.

Basic Usage

  1. Switch to the Text-to-Speech tab.
  2. Type or paste text into the text area.
  3. Click/tap Read Aloud to start playback.

Playback Controls

  • Pause — Temporarily stops playback
  • Resume — Continues from where you paused
  • Stop — Ends playback entirely
  • Replay — Plays the same audio again without re-synthesizing

Provider and Voice Selection

  • ElevenLabs: Choose from your available voices or use "Default (Brian v3)". Custom Voice-IDs supported.
  • OpenAI TTS: Nova, Alloy, Echo, Fable, Onyx, Shimmer
  • Deepgram Aura 2: Fast synthesis

Model Selection (ElevenLabs)

Model | Character Limit | Best For
Eleven v3 | 5,000 | Highest quality, short content
Multilingual v2 | 10,000 | Multi-language support
Flash v2.5 | 40,000 | Fast synthesis, long texts
Turbo v2.5 | 40,000 | Speed and quality balance
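
Texts longer than a model's character limit have to be split before synthesis. The manual does not describe how Talk to me handles this. The sketch below shows one common approach, splitting at sentence boundaries, using the limits from the table above. `MODEL_CHAR_LIMITS` and `split_for_model` are hypothetical names.

```python
import re

# Character limits copied from the table above.
MODEL_CHAR_LIMITS = {
    "Eleven v3": 5_000,
    "Multilingual v2": 10_000,
    "Flash v2.5": 40_000,
    "Turbo v2.5": 40_000,
}


def split_for_model(text: str, model: str) -> list[str]:
    """Split text into chunks that each fit the model's character limit."""
    limit = MODEL_CHAR_LIMITS[model]
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-wrap any single sentence that is longer than the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

With this approach a long article sent to Eleven v3 would be synthesized in several requests of at most 5,000 characters each, while the same text would fit into a single request for Flash v2.5 or Turbo v2.5 as long as it stays under 40,000 characters.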

Audio Quality

Quality | Description
MP3 192 kbps | Creator quality, highest fidelity
MP3 128 kbps | Standard, good balance
MP3 64 kbps | Compact, smaller file size
MP3 32 kbps | Minimal, lowest quality

Text Normalization

Setting | Description
Auto | The model decides how to handle numbers
Always On | Numbers converted to words (e.g., "42" → "forty-two")
Off | No normalization applied

Voice Fine-Tuning (ElevenLabs)

Slider | Range | Description
Stability | Variable ↔ Stable | Lower = more expressive; higher = more consistent
Similarity | Creative ↔ Original | How closely the output matches the original voice
Style | Neutral ↔ Expressive | Amount of emotional expression
Speed | Slow (0.7×) ↔ Fast (1.2×) | Playback speed

Additional Options

  • Code-Filter: Strips code blocks and technical syntax before synthesis.
  • Auto-Record: Automatically saves synthesized audio. Tap the folder icon to choose the directory.
  • Speaker Boost: Enhances voice clarity (ElevenLabs only).

7. The Pipeline

The Pipeline is Talk to me's core processing engine. It visualizes the stages your audio passes through from recording to final output.

Pipeline Stages

Stage | Label | Description
1 | Capture | Audio recording and finalization
2 | STT | Speech-to-Text transcription
3 | Post | Post-processing (cleanup, word corrections)
4 | Polish or Trans | AI-Polish or AI-Translate
5 | Inject | Text copied to clipboard / auto-pasted

TDF (Text Display Field) Indicators

Each pipeline stage shows the active provider (e.g., "Scribe v2", "GPT-5.4") and timing information after completion.

Timing Display

After processing, a timing line shows:

STT 1.2s → LLM 0.8s → Inject 0.1s → Total 2.1s

If Voice Translate is active, an additional S2S (Speech-to-Speech) timing is shown.
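
In the example above, the timing line is the per-stage durations followed by their sum. A minimal sketch of how such a line could be assembled follows; the function name and the dictionary layout are illustrative, not taken from the app.

```python
def format_timing(stages: dict[str, float]) -> str:
    """Join per-stage durations and append their total."""
    parts = [f"{name} {seconds:.1f}s" for name, seconds in stages.items()]
    parts.append(f"Total {sum(stages.values()):.1f}s")
    return " → ".join(parts)


print(format_timing({"STT": 1.2, "LLM": 0.8, "Inject": 0.1}))
# STT 1.2s → LLM 0.8s → Inject 0.1s → Total 2.1s
```

Reading the line from left to right shows which stage dominates the total, which is useful when comparing providers or quality presets.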

8. Voice Translate

Voice Translate combines AI-Translation with Text-to-Speech to create a real-time speech-to-speech translation experience.

How It Works

  1. Enable Voice Translate (purple when active).
  2. Record a dictation in your source language.
  3. The app transcribes → translates → reads the translation aloud.

Configuration

  • Target Language: Set in Settings → AI-Translate → Translate To
  • TTS Voice: Uses your configured TTS provider and voice
  • Polish: Automatically disabled when Voice Translate is active

Use Cases

  • Travel: Speak in your language, have the translation read aloud.
  • Language Learning: Hear how your text sounds in another language.
  • Live Language Immersion: Turn your own thoughts into live fluency — speak in your native language and absorb the output in the language you want to master.

9. AI Polish & Translation

AI-Polish

When enabled, AI-Polish corrects grammar, punctuation, and (with "Strong" setting) removes filler words like "um", "uh", "you know", "basically".

Polish Strength:

  • Light — Grammar and punctuation correction only
  • Strong — Also removes filler words

Status indicators:

  • POLISH (cyan) — Active
  • OFF — Disabled
  • KEY MISSING (yellow) — No LLM key configured

AI-Translate

When enabled, your dictated text is translated into the target language.

Status indicators:

  • TRANSLATE (cyan) — Active, showing target language
  • VOICE OUTPUT (purple) — Voice Translate also active
  • TEXT ONLY — Translation without voice output
  • OFF — Disabled

Important: AI-Polish and AI-Translate are mutually exclusive: enabling one disables the other.

10. Quick-Override Controls

The Quick-Override controls allow you to temporarily change the input or output language for a single dictation without modifying your saved settings.

Speech Input Override

Select a different input language for the next recording:

  • Auto-Detect — The STT provider detects the language automatically
  • Individual languages (see Appendix A)

Text Output Override

Select a different output language (equivalent to temporarily enabling translation):

  • Default (same as input) — No translation
  • All 20 translation languages

Reset to Settings

When an override is active, a Reset button (↩ icon) appears. Tap/click it to revert to your saved settings.

11. Key Pool

The Key Pool is where you manage your API keys. Talk to me uses a pool-based architecture — you can add multiple keys per category, and the app automatically rotates between them based on trust scores.

Categories

Category | Purpose | Supported Providers
Speech-to-Text | Transcription | OpenAI Whisper, Deepgram Nova, ElevenLabs Scribe v2, Groq Whisper
AI-Polish / LLM | Grammar, translation | OpenAI, Groq, Anthropic, Google Gemini, xAI Grok
Text-to-Speech | Voice synthesis | ElevenLabs, Deepgram, OpenAI TTS

Adding a Key

  1. Expand the Key Pool section.
  2. Click/tap + Add Key in the desired category.
  3. Select the Provider.
  4. Enter a Label (e.g., "My OpenAI Key").
  5. Enter your API Key.
  6. Click/tap Save Key.

Key Slot Features

Each key slot displays:

  • Label and Provider
  • Masked Key (last 4 characters visible)
  • Trust Score — Color-coded (green/yellow/red)
  • Statistics — Calls, successes, failures, rate limits

Actions per slot:

  • Test — Verify the key works
  • Pause / Activate — Temporarily disable or re-enable
  • Remove — Permanently delete

Trust System

Level | Score | Color | Behavior
Excellent | ≥80% | Green | Preferred
Good | ≥60% | Green | Normal
OK | ≥40% | Yellow | Fallback
Weak | ≥20% | Yellow | Rarely used
Critical | <20% | Red | Last resort

Keys that hit rate limits are placed in automatic cooldown while other keys are used.
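
The table maps a trust score to a level, and the pool prefers higher-scoring keys while skipping those in cooldown. The manual does not document how scores are computed or how ties are broken, so the sketch below only illustrates the thresholds above. `KeySlot`, `trust_level`, and `pick_key` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

# (minimum score in percent, level, color), highest threshold first.
TRUST_LEVELS = [
    (80, "Excellent", "Green"),
    (60, "Good", "Green"),
    (40, "OK", "Yellow"),
    (20, "Weak", "Yellow"),
    (0, "Critical", "Red"),
]


@dataclass
class KeySlot:
    label: str
    score: float                # trust score in percent, 0-100
    paused: bool = False        # paused manually by the user
    cooling_down: bool = False  # set after the key hit a rate limit


def trust_level(score: float) -> tuple[str, str]:
    """Return (level, color) for a score using the documented thresholds."""
    for minimum, level, color in TRUST_LEVELS:
        if score >= minimum:
            return level, color
    return "Critical", "Red"  # only reached for negative scores


def pick_key(slots: list[KeySlot]) -> Optional[KeySlot]:
    """Pick the usable slot with the highest score, or None if none is usable."""
    usable = [s for s in slots if not s.paused and not s.cooling_down]
    return max(usable, key=lambda s: s.score, default=None)
```

Picking strictly by highest score is the simplest policy consistent with the table. A real pool might also spread load across keys of the same level.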

12. AI Voice Chat

AI Voice Chat lets you have real-time voice conversations with Google Gemini. Speak naturally, get answered instantly, interrupt freely — just like talking to another person. Powered by the Gemini Live API with sub-second latency.

Headphones strongly recommended (Android / phones)

For AI Voice Chat on phones, use wired or Bluetooth earphones or headphones when you can. Playing AI replies through the built-in speaker can let the microphone pick up the model’s voice (acoustic feedback). That may create false “you” transcripts or confuse turn-taking — even though playback and conversation can still work. Desktop users with a comms headset (good echo cancellation) typically have fewer issues. We are working toward better speaker-only behavior.

Requirements

You need a Google Gemini API key (paid tier recommended) added to the LLM Key Pool in Settings. The key is automatically available for AI Voice Chat.

Starting a Conversation

Navigate to the Gemini Live tab. Tap Start Conversation. The app connects to Gemini via WebSocket, opens your microphone, and begins listening. Speak naturally — Gemini responds in real-time audio. Tap End to stop.

Voices (30 Options)

Choose from 30 natural AI voices, each with a distinct personality:

Voice | Character | Best For
Sulafat | Warm | Storytelling, bedtime stories, calm conversations
Gacrux | Mature | Authoritative narration, mentoring, deep discussions
Algenib | Gravelly | Cinematic narration, dramatic reading, character voice
Kore | Firm | Professional briefings, news reading, factual Q&A
Puck | Upbeat | Energetic conversations, motivation, brainstorming
Zephyr | Bright | Optimistic chats, friendly assistance, greetings
Charon | Informative | Tutorials, documentary-style explanations
Fenrir | Excitable | Enthusiastic reactions, game commentary, hype
Leda | Youthful | Casual chat, Gen-Z conversations, trendy topics
Aoede | Breezy | Relaxed conversations, travel talk, lifestyle
Achernar | Soft | Meditation guidance, ASMR-style, gentle encouragement
Algieba | Smooth | Podcast hosting, audiobooks, long-form reading
Despina | Smooth | Elegant narration, luxury brand voice
Achird | Friendly | Customer support, everyday assistance, welcoming tone
Vindemiatrix | Gentle | Supportive conversations, therapy-like tone, empathy
Sadaltager | Knowledgeable | Technical explanations, expert Q&A, encyclopedic
Rasalgethi | Informative | Science documentaries, educational content
Schedar | Even | Balanced discussions, neutral reporting, debates
Alnilam | Firm | Commanding presence, leadership, formal settings
Pulcherrima | Forward | Assertive communication, pitches, presentations
Zubenelgenubi | Casual | Laid-back chat, friends catching up, humor
Sadachbia | Lively | Animated storytelling, children's content, playful
Laomedeia | Upbeat | Morning shows, cheerful updates, positive vibes
Callirrhoe | Easy-going | Casual advice, lifestyle coaching, approachable
Autonoe | Bright | Creative sessions, idea generation, art discussions
Enceladus | Breathy | Intimate narration, poetry reading, atmospheric
Iapetus | Clear | Precise instructions, step-by-step guides, clarity
Erinome | Clear | Clean communication, corporate training, diction
Umbriel | Easy-going | Relaxed Q&A, weekend vibes, mellow conversations

Tip: Preview all voices in the Google AI Studio Voice Library.

Language

Select from 24 supported languages or leave on Auto-detect. Gemini will respond in the language you speak — or in the language you select. Supported: English, German, French, Spanish, Italian, Portuguese, Dutch, Polish, Romanian, Russian, Ukrainian, Turkish, Arabic, Hindi, Bengali, Tamil, Telugu, Marathi, Japanese, Korean, Thai, Vietnamese, Indonesian.

Persona Presets

Persona presets define how Gemini behaves — its personality, tone, and communication style. Choose from six presets or create your own:

Preset | Behavior
Friendly Assistant | Warm, conversational, approachable; great for everyday use
Professional | Clear, concise, authoritative; for business and work
Enthusiastic | Energetic, positive, encouraging; for brainstorming and motivation
Calm & Soothing | Slow, gentle, patient; for relaxation and guided sessions
Teacher | Patient, step-by-step, uses analogies; for learning and explanations
Creative | Imaginative, expressive, vivid language; for storytelling and art
Custom | Write your own system instruction from scratch

System Instruction

The System Instruction is a text briefing you give to Gemini before the conversation starts. Think of it as directing an actor: tell the AI who it is, how to behave, and what to focus on.

Examples:

  • "You are a patient Italian language tutor. Speak slowly. Correct my grammar gently."
  • "You are a senior software architect. Answer concisely and technically."
  • "You are a creative storyteller. Speak with flair. Use vivid language."

When using a Persona Preset, your custom text is appended to the preset instruction. In Custom mode, your text is the entire instruction. Write in English for best results. Settings are saved automatically when you click outside the text field.

Temperature & Top-P

Temperature (0.0 – 2.0) controls how creative vs. predictable the AI responds:

Range | Behavior | Best For
0.0 – 0.5 | Focused, deterministic, repetitive | Facts, technical answers, precise instructions
0.7 – 1.0 | Balanced, natural (default: 1.0) | Most conversations, everyday use
1.2 – 2.0 | Creative, surprising, unpredictable | Brainstorming, storytelling, creative writing

Top-P (0.0 – 1.0) limits the pool of words the AI considers. At 0.95 (default), the model samples only from the most likely words whose combined probability reaches 95%, cutting off the improbable "tail". Lower values make output more conservative.
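
To make the two parameters concrete, here is a small, self-contained sketch of temperature scaling followed by top-p (nucleus) filtering over a toy vocabulary. It illustrates the general technique only. How Gemini implements sampling internally is not covered by this manual, and the function names are our own. A temperature of exactly 0 is usually treated as "always pick the most likely word" and is rejected here to keep the arithmetic simple.

```python
import math


def apply_temperature(logits: dict[str, float], temperature: float) -> dict[str, float]:
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0 in this sketch")
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    weights = {tok: math.exp(v - peak) for tok, v in scaled.items()}
    total = sum(weights.values())
    return {tok: w / total for tok, w in weights.items()}


def top_p_filter(probs: dict[str, float], top_p: float) -> dict[str, float]:
    """Keep the smallest set of most likely tokens whose mass reaches top_p."""
    kept: dict[str, float] = {}
    cumulative = 0.0
    for tok, p in sorted(probs.items(), key=lambda item: item[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}  # renormalize to sum to 1
```

For example, with probabilities of 0.5, 0.3, 0.15, and 0.05 and a top-p of 0.9, the first three candidates are kept and the 0.05 "tail" is dropped before a word is drawn.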

Voice Activity Detection (VAD)

VAD settings control how Gemini detects when you start and stop speaking:

  • Speech Start Sensitivity — How easily the system detects speech onset. "Low" requires louder/clearer speech to trigger. Default works for most environments.
  • Speech End Sensitivity — How quickly the system decides you've stopped talking. "Low" waits longer before considering your turn finished — useful for thoughtful pauses.
  • Silence Duration — How many milliseconds of silence before your turn is considered complete (100–2000ms). Higher values give you more time to pause mid-sentence.
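
The Silence Duration setting can be pictured as a counter over incoming audio frames: the turn ends once enough consecutive quiet frames have accumulated. The sketch below is a simplified energy-threshold model with made-up numbers (20 ms frames, a fixed loudness threshold). It is not how the Gemini Live API detects speech.

```python
from typing import Optional


def find_turn_end(levels: list[float], silence_ms: int = 500,
                  frame_ms: int = 20, threshold: float = 0.1) -> Optional[int]:
    """Return the index of the frame where the turn ends, or None.

    levels: one loudness value per frame (hypothetical 0.0-1.0 scale).
    The turn ends after `silence_ms` of consecutive frames below `threshold`,
    counted only once speech has started.
    """
    frames_needed = -(-silence_ms // frame_ms)  # ceiling division
    speaking = False
    quiet_frames = 0
    for index, level in enumerate(levels):
        if level >= threshold:
            speaking = True
            quiet_frames = 0  # any speech resets the silence counter
        elif speaking:
            quiet_frames += 1
            if quiet_frames >= frames_needed:
                return index
    return None
```

In this model, raising `silence_ms` from 500 to 1000 doubles the number of quiet frames required, which is why higher values tolerate longer pauses in the middle of a sentence.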

Tips for Best Results

  • Use a headset or earbuds to avoid echo and feedback
  • Speak naturally — Gemini supports natural barge-in (interrupt anytime)
  • Session length is limited to 15 minutes per connection (API limit)
  • All settings take effect on the next session start (not during a live session)
  • The audio level meter shows a colored gradient (green → yellow → orange → red) indicating your microphone input level
  • Transcription of your speech and Gemini's speech can be toggled on/off independently

13. Mini-Player [Windows]

The Mini-Player is a compact Always-on-Top window that provides essential dictation controls without occupying your full screen.

Entering Mini-Player Mode

Click the Collapse button (↗ icon) in the header. The app window shrinks to a compact overlay positioned at the bottom center of your screen.

Mini-Player Layout

The Mini-Player displays a 3×3 grid of essential controls:

  • Row 1: Speech Input selector, Status/Start button, Text Output selector
  • Row 2: Voice Translate toggle, Inline Pill (spectrum analyzer), Save Recordings
  • Row 3: Pipeline timing TDFs, Result preview

DPI-Aware Sizing

The Mini-Player automatically adjusts its size based on your display's DPI scaling, ensuring consistent visual dimensions across monitors with different resolutions (100%, 125%, 150%).
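
Windows expresses display scaling relative to a baseline of 96 DPI, so 100%, 125%, and 150% correspond to 96, 120, and 144 DPI. Keeping a window visually the same size means multiplying its logical size by that scale factor. A minimal sketch follows; the 420×160 base size is an invented example rather than the Mini-Player's real dimensions.

```python
BASELINE_DPI = 96  # Windows treats 96 DPI as 100% scaling


def physical_size(logical_width: int, logical_height: int, dpi: int) -> tuple[int, int]:
    """Convert a logical window size to physical pixels for a given DPI."""
    scale = dpi / BASELINE_DPI
    return round(logical_width * scale), round(logical_height * scale)


for dpi in (96, 120, 144):
    print(dpi, physical_size(420, 160, dpi))
# 96 (420, 160)
# 120 (525, 200)
# 144 (630, 240)
```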

Exiting Mini-Player Mode

Click the Expand button to return to the full-size window at its previous position and size.

14. Global Hotkeys [Windows]

Talk to me registers system-wide hotkeys so you can control dictation without switching to the app window.

Primary Hotkeys

Hotkey | Action
Ctrl+Win | Start / Stop Recording (global, works from any app)
Ctrl+Win (while processing) | Cancel current pipeline

TTS Hotkey

When text is selected in any application, the TTS hotkey reads it aloud using your configured TTS provider.

Low-Level Hook

The global hotkey uses a Windows low-level keyboard hook, which means it works even when the app is minimized or another application has focus. The hook operates in "zero-swallow mode" — it intercepts the key combination without blocking other keyboard input.

15. Auto-Read [Windows]

Auto-Read is a Windows-exclusive feature that extracts text from the currently focused application and reads it aloud via TTS.

How It Works

  1. Enable Auto-Read by clicking the Auto-Read button.
  2. Select text in any application (or use Ctrl+C to copy).
  3. Talk to me detects the clipboard content and automatically reads it aloud using your TTS configuration.

Use Cases

  • Read emails, articles, or documents without staring at the screen.
  • Review your own writing by hearing it spoken back.
  • Accessibility support for vision-impaired users.

16. Notification Listener [Windows]

The Notification Listener is a Full Edition exclusive feature that captures Windows toast notifications and reads them aloud via TTS.

Requirements

  • Windows Desktop Full Edition (not available in the Microsoft Store Edition)
  • Notification access permission granted in Windows Settings

How It Works

  1. Enable Notification Listener by clicking the toggle.
  2. Grant notification access when prompted by Windows.
  3. When a Windows toast notification arrives (email, chat message, calendar reminder), Talk to me extracts the notification title and body, and reads it aloud using your TTS configuration.

Configuration

  • Enable/disable in Settings → Hands-Free
  • TTS voice and provider follow your global TTS settings

17. MP3 Recording & Save [Windows]

Record TTS Readings

When enabled, every TTS synthesis is automatically saved as an MP3 file with sequential numbering (e.g., recording_001.mp3, recording_002.mp3).
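
Sequential numbering of this kind can be produced by scanning the recordings folder for the highest existing number. The manual does not say whether the app reuses gaps left by deleted files. The sketch below assumes it continues from the highest number, and both helper names are hypothetical.

```python
import re
from pathlib import Path

RECORDING_NAME = re.compile(r"recording_(\d+)\.mp3")


def next_recording_name(existing: list[str]) -> str:
    """Return the next free name in the recording_NNN.mp3 sequence."""
    numbers = [
        int(match.group(1))
        for name in existing
        if (match := RECORDING_NAME.fullmatch(name))
    ]
    number = max(numbers, default=0) + 1
    return f"recording_{number:03d}.mp3"


def next_recording_path(folder: Path) -> Path:
    """Scan a folder and build the path for the next recording."""
    names = [p.name for p in folder.iterdir() if p.is_file()]
    return folder / next_recording_name(names)
```

The three-digit padding keeps files sorted correctly in a file manager up to recording_999.mp3; beyond that the number simply grows to four digits.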

Save Recordings

Click Save Recordings to open the folder containing all recorded MP3 files. You can configure the recording directory in Settings.

18. Floating Bubble (Overlay) [Android]

The Floating Bubble is a small circular icon that floats on top of all other apps, providing hands-free dictation access without switching apps.

Activating the Overlay

  1. Tap the Overlay button in the main app.
  2. If Android's "Draw over other apps" permission is not yet granted, you will be directed to enable it.
  3. A small Talk to me bubble appears on screen.

Using the Bubble

  • Single Tap: Start or stop recording. Red pulsing border during recording, blue pulsing border during TTS readout.
  • Triple Tap: Test readback — reads a predefined text to confirm TTS works.
  • Long Press: Clears the unread message queue.
  • Drag: Move the bubble anywhere on screen.

During Recording via Bubble

  1. Tap the bubble to start recording.
  2. Tap the bubble again to stop recording.
  3. After transcription, a "✓ Inserted!" toast confirms the text was pasted or placed in the clipboard.

Stopping the Overlay

Tap the Overlay button again or tap Stop on the notification.

19. Auto-Paste [Android]

Auto-Paste uses Android's Accessibility Service to automatically insert dictated text into the currently focused text field.

Enabling Auto-Paste

  1. Tap the Auto-Paste button.
  2. Go to Android's Accessibility Settings.
  3. Find Talk to me and enable it.
  4. The button now shows ✓ with a cyan border.

Important Notes

  • Requires Android Accessibility permission (a sensitive permission).
  • May need to be re-granted after app updates.
  • Used exclusively for text insertion — no other accessibility data is accessed.

20. Auto-Read Messages [Android]

Auto-Read automatically reads incoming chat messages aloud using TTS — ideal for driving, cooking, or exercising.

How It Works

  1. Enable Auto-Read (Headphones icon).
  2. Ensure Notification Access is granted.
  3. The Overlay must be active.
  4. When a message arrives from an allowed app, Talk to me announces the sender and reads the message aloud.

Pre-Selected Chat Apps

WhatsApp, WhatsApp Business, Telegram, Signal, Discord, Slack, Microsoft Teams, Viber, Messenger (Meta), Instagram, Google Messages, Samsung Messages.

You can add or remove apps in Auto-Read Apps Configuration.

21. Notification Access [Android]

Notification Access allows Talk to me to read incoming notifications, required for Auto-Read Messages.

Granting Access

  1. Tap the Notif Access button.
  2. Go to Android's Notification Listener Settings.
  3. Find Talk to me and enable it.
  4. The button shows ✓ with a cyan border.

Important Notes

  • System-level permission — processes only notifications from explicitly allowed apps.
  • No notification data is stored, transmitted, or logged.

22. Auto-Read Apps Configuration [Android]

Control which apps are allowed to have their notifications read aloud.

Known Chat Apps

Pre-selected messaging apps with individual toggles (WhatsApp, Telegram, Signal, Discord, Slack, Teams, Viber, Messenger, Instagram, Google Messages, Samsung Messages).

Search and Add Custom Apps

  1. Tap the search field and type an app name.
  2. Matching installed apps appear, sorted by relevance.
  3. Check the box to add an app.

How Filtering Works

  • Only notifications from allowed apps are read aloud.
  • Changes take effect immediately — no restart required.
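
Conceptually, the filter is an allowlist lookup on the app that posted the notification. The sketch below models that idea in a few lines. The `Notification` class, the function name, and the two package names are illustrative assumptions, not the app's internal data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Notification:
    package: str  # Android package name of the posting app
    sender: str
    text: str


# Illustrative allowlist; the package names are examples only.
allowed_packages = {"com.whatsapp", "org.telegram.messenger"}


def announcement_for(notification: Notification, allowed: set[str]) -> Optional[str]:
    """Return the text to speak, or None if the app is not allowed."""
    if notification.package not in allowed:
        return None
    return f"{notification.sender}: {notification.text}"
```

Because the lookup happens for every incoming notification, adding or removing a package from the set changes behavior for the very next message, which matches the "no restart required" note above.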

23. Settings

UI Language

English, Deutsch, Français, Español — independent of your system language.

Quality Preset

Preset | STT Provider | LLM Provider | Model | Polish
Top Performer | Scribe v2 | OpenAI | GPT-5.4 | Strong
Standard | Scribe v2 | OpenAI | GPT-4.1 mini | Strong
Budget | Whisper | Groq | Default | Light
Free | Deepgram | Groq | Default | Off
Custom | Manual | Manual | Manual | Manual

Speech-to-Text

  • Provider: OpenAI Whisper, Deepgram Nova-2/3, ElevenLabs Scribe v2, Groq Whisper
  • Custom Keyterms (Scribe only): Proper nouns, brands, technical terms
  • Language: Auto-Detect or specific

Text-to-Speech

  • Provider: ElevenLabs, OpenAI TTS, Deepgram Aura 2
  • Model (ElevenLabs): Eleven v3, Multilingual v2, Flash v2.5, Turbo v2.5

LLM Provider (Polish)

  • Provider: OpenAI, Groq, Anthropic, Google Gemini, xAI Grok
  • Model: Provider default or specific
  • Polish Strength: Light or Strong

Translation Provider

Separate provider for AI-Translation (can differ from Polish provider).

AI-Polish / AI-Translate

Each feature has its own toggle, but the two are mutually exclusive: enabling one disables the other. When AI-Translate is enabled:

  • Translate To: 20 target languages
  • Voice Translate: Auto-read translations via TTS

Android Hands-Free

Quick toggles for Overlay, Auto-Read Messages, Auto-Paste, Notification Access.

Save and Test

  • Save all current settings — Persists changes to device storage
  • Test current configuration — Tests all configured providers with response times

24. Word Corrections

Word Corrections teach Talk to me the correct spelling of names, brands, and terms that speech recognition gets wrong.

Adding Corrections

Single Add

Enter Wrong spelling and Correct spelling, then tap/click Add.

Bulk Import

Enter the correct spelling, then list wrong variants (one per line). Use Generate with AI to auto-create likely misspellings.

Multi-Import

Enter pairs as wrong;correct (one per line). Supports ;, ->, comma, or tab separators.
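
A parser for this format only needs to find the first supported separator on each line. The following sketch shows one way to do that. The order in which separators are tried and the decision to skip lines without a separator are assumptions, and `parse_corrections` is a hypothetical name.

```python
SEPARATORS = ("->", ";", "\t", ",")  # checked in this order


def parse_corrections(block: str) -> dict[str, str]:
    """Parse 'wrong<sep>correct' lines into a {wrong: correct} mapping."""
    pairs: dict[str, str] = {}
    for line in block.splitlines():
        if not line.strip():
            continue  # skip blank lines
        for separator in SEPARATORS:
            if separator in line:
                wrong, correct = line.split(separator, 1)
                if wrong.strip() and correct.strip():
                    pairs[wrong.strip()] = correct.strip()
                break  # use only the first separator that matches
    return pairs
```

Checking the comma last means a correction such as `Smith, John -> John Smith` is split at the arrow rather than at the comma.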

How Corrections Work

During post-processing (Pipeline stage 3), wrong spellings are automatically replaced before AI-Polish runs.
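
The replacement step itself can be sketched as a whole-word search and replace over the transcript. Matching case-insensitively and only on word boundaries are assumptions made for this illustration; the manual does not state the app's exact matching rules.

```python
import re


def apply_corrections(text: str, corrections: dict[str, str]) -> str:
    """Replace each wrong spelling with its correct form, whole words only."""
    # Longer entries first so a multi-word entry wins over a shorter one.
    for wrong in sorted(corrections, key=len, reverse=True):
        pattern = re.compile(rf"(?<!\w){re.escape(wrong)}(?!\w)", re.IGNORECASE)
        # A function replacement keeps backslashes in the new text literal.
        text = pattern.sub(lambda _m, new=corrections[wrong]: new, text)
    return text


print(apply_corrections(
    "I met the mrokon team at eleven labs.",
    {"mrokon": "mrocon", "eleven labs": "ElevenLabs"},
))
# I met the mrocon team at ElevenLabs.
```

Running corrections before AI-Polish means the language model already sees the right names and is less likely to "fix" them into something else.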

25. Backup and Restore

Export Settings

  1. Open Backup & Restore in Settings.
  2. Tap/click Export Settings.
  3. Enter and confirm an Encryption Password (min. 6 characters).
  4. Windows: The save dialog suggests talktome-settings.ttm — you choose the folder.
  5. Android: The backup is written to your Downloads area as TalkToMe-backup.ttm. If that name already exists, the system may add (1), (2), etc. — all are valid encrypted backups.

Import Settings

  1. Tap/click Import Settings.
  2. Automatic (Android): The app looks for the newest matching file named TalkToMe-backup with a .ttm extension (including TalkToMe-backup (1).ttm, etc.) in app storage and in Downloads.
  3. If the system file picker opens: On many phones (e.g. Samsung), the first screen is Recently used and may default to Images — your .ttm files are hidden until you switch the top filter to Documents or This week, or open the Download folder directly.
  4. New device: Copy the .ttm from your old device (USB, cloud, email), then use Import and pick that file.
  5. Enter the encryption password.
  6. All settings are restored and the app restarts.
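
The automatic lookup on Android described in step 2 amounts to matching file names against the backup pattern and taking the most recent match. A minimal sketch follows, assuming "newest" means the latest modification time. The function name and the idea of passing a list of folders are illustrative, not the app's implementation.

```python
import re
from pathlib import Path
from typing import Optional

# Matches TalkToMe-backup.ttm and numbered copies like "TalkToMe-backup (1).ttm".
BACKUP_NAME = re.compile(r"TalkToMe-backup( \(\d+\))?\.ttm")


def newest_backup(folders: list[Path]) -> Optional[Path]:
    """Return the most recently modified matching backup, or None."""
    candidates = [
        path
        for folder in folders
        if folder.is_dir()
        for path in folder.iterdir()
        if path.is_file() and BACKUP_NAME.fullmatch(path.name)
    ]
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)
```

Renaming a backup to something that no longer matches the pattern would take it out of the automatic search, in which case the file picker described in step 3 is the way to select it.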

Technical Details

  • Encryption: AES-256-GCM with PBKDF2-HMAC-SHA256 (100,000 iterations)
  • Included: All settings, API keys, word corrections, auto-read apps, quality preset, UI language
  • NOT included: License activation (tied to Machine ID)
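
The key-derivation half of this scheme can be reproduced with Python's standard library. The sketch below derives a 256-bit key from a password using the parameters listed above. The 16-byte salt length is an assumption, since the manual does not state it, and the .ttm file layout is not documented here.

```python
import hashlib
import os

ITERATIONS = 100_000   # from the manual
KEY_LENGTH = 32        # 32 bytes = 256 bits, as AES-256 requires
SALT_LENGTH = 16       # assumption: the manual does not state the salt size


def derive_backup_key(password: str, salt: bytes) -> bytes:
    """Derive a 256-bit key from a password with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS, dklen=KEY_LENGTH
    )


salt = os.urandom(SALT_LENGTH)  # random per backup; stored with the file, not secret
key = derive_backup_key("correct horse", salt)
```

The derived key would then be used with AES-256-GCM, which is not part of Python's standard library and typically comes from a third-party package. Because the password is the only secret, a longer passphrase than the 6-character minimum gives noticeably better protection against guessing.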

26. Usage Dashboard

Metric | Description
STT Calls | Speech-to-text transcriptions performed
LLM Polish | AI-Polish or AI-Translate operations
TTS Synth | Text-to-speech synthesis operations

Counters are cumulative since the last settings reset.

27. Troubleshooting

General

Problem | Solution
"No API key configured" | Add a key in Key Pool for the feature you need
Recording doesn't start | Check microphone permission in system settings
Voice Translate produces no audio | Ensure a TTS API key is configured and working
Export fails | Check write access to Downloads folder
Can't see backup in Import file picker | Switch from Images to Documents / This week, or open the Download folder (see §25 Import)

Windows-Specific

Problem | Solution
Ctrl+Win hotkey doesn't work | Ensure the app is running (check system tray)
Text not pasted after dictation | Ensure the target window supports Ctrl+V
Notification Listener unavailable | Only available in Full Edition (not Store Edition)
Mini-Player looks too large/small | DPI-aware sizing adjusts automatically; restart the app if display settings changed

Android-Specific

Problem | Solution
Auto-Read doesn't work | Ensure Overlay is active, Auto-Read enabled, and Notification Access granted
Auto-Paste doesn't work | Re-enable Accessibility Service in Android Settings
Bubble doesn't appear | Grant "Draw over other apps" permission

28. Privacy and Security

Data Handling

  • No data collection: Talk to me does not collect, store, or transmit any user data to mrocon GmbH servers.
  • Direct API communication: Audio and text go directly from your device to your chosen AI provider.
  • Local storage only: All settings and API keys are stored exclusively on your device.
  • No analytics: No tracking, analytics, or telemetry of any kind.

Permissions

Windows

Permission | Purpose
Microphone | Record audio for dictation
Notification Access | Read notifications (Full Edition)
Internet | Communicate with AI providers

Android

Permission | Purpose
Microphone | Record audio for dictation
Overlay (Draw over apps) | Display the floating bubble
Notification Listener | Read notifications for Auto-Read
Accessibility Service | Auto-Paste text into fields
Internet | Communicate with AI providers
Query Installed Packages | Show app names in Auto-Read settings

Encryption

  • Windows: API keys encrypted with DPAPI (Windows Data Protection API)
  • Android: API keys in app-private internal storage
  • Backup files: AES-256-GCM encryption

Appendix A — Supported Languages

Speech Input Languages

Auto-Detect, German, English, French, Spanish, Italian, Portuguese, Dutch, Japanese, Korean, Chinese, Russian, Arabic, Hindi, Polish, Turkish, Swedish, Ukrainian

Translation Target Languages

German, English, French, Spanish, Italian, Portuguese, Dutch, Japanese, Chinese, Korean, Russian, Arabic, Hindi, Polish, Turkish, Swedish, Ukrainian, Danish, Finnish, Norwegian

TTS Languages

Auto, German, English, French, Italian, Spanish, Portuguese, Dutch, Polish, Swedish, Danish, Finnish, Norwegian, Turkish, Japanese, Korean, Chinese

UI Languages

English, Deutsch, Français, Español

Appendix B — Supported Providers

Speech-to-Text

Provider | Notes
OpenAI Whisper | Most widely used, reliable
Deepgram Nova-2 / Nova-3 | Fast, good accuracy
ElevenLabs Scribe v2 | Supports custom keyterms
Groq Whisper | Free tier available, fast

LLM (Polish / Translation)

Provider | Notes
OpenAI | GPT-4o-mini, GPT-5.4, etc.
Groq | Free tier, Llama models
Anthropic | Claude models
Google Gemini | Gemini models
xAI Grok | Free tier available

Text-to-Speech

Provider | Notes
ElevenLabs | Best quality, voice cloning, 4 models
OpenAI TTS | 6 built-in voices, simple
Deepgram Aura 2 | Fast synthesis

Appendix C — Quality Presets

Preset | STT | LLM | Model | Polish | Cost
Top Performer | Scribe v2 | OpenAI | GPT-5.4 | Strong | $$$
Standard | Scribe v2 | OpenAI | GPT-4.1 mini | Strong | $$
Budget | Whisper | Groq | Default | Light | $
Free | Deepgram | Groq | Default | Off | Free
Custom | Manual | Manual | Manual | Manual | Varies

Appendix D — Keyboard Shortcuts [Windows]

Shortcut | Action
Ctrl+Win | Start / Stop Recording
Ctrl+Win (during processing) | Cancel Pipeline
TTS Hotkey | Read selected text aloud

Talk to me is a product of mrocon GmbH. All rights reserved.

For support, contact team@talktome.studio or visit talktome.studio.
