Live transcription

Overview

Live transcription is a real-time transcript of your call. Whisperer listens to audio across two independent streams — your microphone (your voice) and system audio (the other person's voice) — recognizes speech and instantly tags utterances by speaker: [Me] and [Other]. The text scrolls in the overlay (LiveTranscriptStrip) and serves as context for AI suggestions.
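The two-stream tagging can be pictured with a minimal sketch. It assumes each stream yields utterances already sorted by start time; the `Utterance` type and the `merge_streams` function are hypothetical names used for illustration, not Whisperer's actual internals.

```python
from dataclasses import dataclass
import heapq


@dataclass(frozen=True)
class Utterance:
    start: float  # seconds since the session started (assumed field)
    text: str


def merge_streams(mic, system):
    """Interleave two time-sorted streams into one speaker-tagged transcript."""
    tagged_mic = ((u.start, "[Me]", u.text) for u in mic)
    tagged_sys = ((u.start, "[Other]", u.text) for u in system)
    # heapq.merge keeps the combined sequence ordered by start time.
    merged = heapq.merge(tagged_mic, tagged_sys)
    return [f"{tag} {text}" for _, tag, text in merged]


mic = [Utterance(0.0, "Hi"), Utterance(5.0, "Sure")]
system = [Utterance(2.0, "Hello")]
transcript = merge_streams(mic, system)
```

Because the speaker tag comes from the stream an utterance arrived on, this kind of tagging needs no voice analysis; it only needs the two streams to stay separate.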

Recognition runs on the Whisper model. Transcription is streaming only: audio is sent for recognition in short chunks as it is recorded. You cannot upload a finished audio file for batch transcription; Whisperer is built for live calls, not for post-processing recordings.
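As an illustration of sending audio in short chunks, here is a minimal sketch that regroups incoming capture frames into fixed-size chunks. The function name `stream_chunks` and the chunk size are assumptions; the actual chunk length Whisperer uses is not documented here.

```python
def stream_chunks(frames, chunk_size):
    """Regroup incoming audio frames into chunks of chunk_size samples.

    frames: iterable of sample lists, arriving as audio is recorded.
    The final chunk may be shorter if the stream ends mid-chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    buffer = []
    for frame in frames:
        buffer.extend(frame)
        # Emit every complete chunk as soon as enough samples have arrived.
        while len(buffer) >= chunk_size:
            yield buffer[:chunk_size]
            buffer = buffer[chunk_size:]
    if buffer:
        yield buffer  # flush the remainder when the session ends


chunks = list(stream_chunks([[1, 2, 3], [4, 5], [6, 7, 8, 9]], chunk_size=4))
```

Each yielded chunk would be handed to the recognizer immediately, which is why text can appear while the call is still in progress.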

When to use

Any video call. Whisperer works as an overlay on top of Zoom, Google Meet, Microsoft Teams, Telegram, Discord and any other service — no separate integration required.
Interviews (behavioral and System Design), where you can't afford to miss how a question is phrased.
Lectures, tutoring sessions, sales — when you need an accurate transcript of both sides' remarks.
Multilingual calls — Whisper understands dozens of languages; the recognition language is set per session.

Step-by-step

Grant permissions. On macOS, the two streams require two grants: "Microphone" (your voice) and "Screen Recording" (the other person's system audio); without "Screen Recording" the other person won't be heard. See macOS permissions. On Windows, system audio is captured without any additional permission — you only need microphone access; see Windows permissions.
Pick the transcription language. In session settings, set the spoken language. The default is ru. Whisper is multilingual, so for an English-language interview set en, and for a mixed call use the call's primary language.
Open the overlay and press play. A volume indicator (waveform) appears in the CommandBar — it confirms that audio is coming in.
Speak and listen. Your remarks are tagged [Me], remarks from system audio are tagged [Other]. The transcript updates in the LiveTranscriptStrip in real time.
(Optional) Enable translation. If translation is enabled in the overlay settings, a translation line (TranslationStrip) appears below the transcript.
End the session. When you're done, the transcript is saved to history (except in no-logs mode — see Limits and quotas).
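The session options mentioned in these steps can be summarized in a small sketch. The `SessionSettings` class and its field names are hypothetical; only the default language `ru`, the optional translation line, and the no-logs behavior come from the text above. Translation being off by default is an assumption.

```python
from dataclasses import dataclass


@dataclass
class SessionSettings:
    language: str = "ru"       # documented default spoken language
    translation: bool = False  # assumption: translation is off unless enabled
    no_logs: bool = False      # no-logs mode skips saving to history


def visible_strips(settings):
    """Return the overlay strips shown for the given settings."""
    strips = ["LiveTranscriptStrip"]
    if settings.translation:
        strips.append("TranslationStrip")
    return strips


def saved_to_history(settings):
    """The transcript is saved at session end unless no-logs mode is on."""
    return not settings.no_logs


english_interview = SessionSettings(language="en", translation=True)
```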

Why "Screen Recording" permission is needed (macOS)

On macOS, audio from other apps (the other person's voice in Zoom/Meet) is captured through the screen-recording mechanism, the same system facility used for screen capture. Whisperer requests the "Screen Recording" permission primarily to obtain the system audio stream, not to watch your screen; the same grant is also used to take screenshots for vision suggestions. Without this grant, only the microphone is recorded: you're heard, the other person isn't.

On Windows it's simpler: the other person's system audio is captured without any additional permission — microphone access is enough. Details are in Windows permissions.
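The permission rules for both platforms can be expressed as a short sketch. The permission identifiers (`microphone`, `screen_recording`), the platform keys, and the function name are hypothetical labels chosen for illustration.

```python
# Grant needed to capture system audio on each platform (None = no extra grant).
SYSTEM_AUDIO_GRANT = {
    "macos": "screen_recording",
    "windows": None,
}


def audible_speakers(platform: str, granted: set[str]) -> list[str]:
    """Return which speaker tags can appear, given the granted permissions."""
    if platform not in SYSTEM_AUDIO_GRANT:
        raise ValueError(f"unknown platform: {platform}")
    speakers = []
    if "microphone" in granted:
        speakers.append("[Me]")
    needed = SYSTEM_AUDIO_GRANT[platform]
    if needed is None or needed in granted:
        speakers.append("[Other]")
    return speakers


mac_mic_only = audible_speakers("macos", {"microphone"})
```

With only the microphone granted on macOS, the sketch returns just `[Me]`, which matches the symptom of a transcript that never shows the other person.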

Screenshots

📸 [Screenshot: overlay with a scrolling transcript and [Me]/[Other] tagging]

📸 [Screenshot: choosing the transcription language in session settings]

📸 [Screenshot: volume indicator (waveform) in the CommandBar]

Common mistakes

The other person isn't heard and the transcript only shows [Me]. On macOS, this means the "Screen Recording" permission hasn't been granted. Open System Settings → Privacy & Security → Screen Recording, enable Whisperer, then restart the client.
Transcript is in the wrong language / lots of recognition errors. The transcription language is set incorrectly. Change the session language to the actual language of the conversation before starting.
No volume indicator. No input device is selected or microphone access isn't granted — check "Microphone" in your privacy settings.
Waiting for a recording to upload. There is no batch transcription of a finished file — transcription only works live during a session.
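The first and third symptoms above can be checked mechanically, for example after a test call. This is a minimal sketch that assumes transcript lines start with their speaker tag, such as `[Me] hello`; the `diagnose` function and that line format are illustrative assumptions, not a Whisperer API.

```python
def diagnose(transcript_lines, waveform_visible):
    """Map observed symptoms to the fixes described in this section."""
    hints = []
    if not waveform_visible:
        hints.append('Check the input device and the "Microphone" permission.')
    has_other = any(line.startswith("[Other]") for line in transcript_lines)
    if transcript_lines and not has_other:
        hints.append('On macOS, grant "Screen Recording" and restart the client.')
    return hints


hints = diagnose(["[Me] Can you hear me?"], waveform_visible=True)
```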

Best practices

Before an important meeting, do a test call with a colleague and confirm the other person shows up in the transcript as [Other].
In a noisy room, enable noise suppression in the overlay settings; cleaner audio generally improves recognition accuracy.
For language-mixed calls, choose the language spoken most of the time; Whisper handles switches, but it's better to set the base language explicitly.
If the content is sensitive, use no-logs mode — the transcript won't be saved to the database (minutes are still consumed, however).
Speak clearly and avoid talking over each other: separate utterances are tagged by speaker more accurately.