Transcript and Speakers

Whisperer transcribes speech in real time using Whisper, OpenAI's multilingual speech recognition model. Speaker diarization runs at the same time: the user's words and the counterpart's words are labeled differently, which makes the transcript easier to read and improves the accuracy of AI responses.

When to Read This

Read this article to understand:

how to configure the recognition language;
why some utterances are labeled [Me] and others [Other];
which languages are supported and how to switch between them.

How Transcription Works

Whisperer captures two audio streams:

Stream	Source	Label
User's voice	Microphone	[Me]
Counterpart's voice	System audio (Screen Recording permission on macOS / system audio on Windows)	[Other]

Each audio chunk (~0.8 s) is sent to the server together with its speaker label and is recognized independently of the other chunks. The recognized text appears in the LiveTranscriptStrip, the scrolling ticker at the bottom of the overlay, as soon as the server returns it.
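The chunking step described above can be pictured with a minimal sketch. It is not Whisperer's actual client code: the names LabeledChunk and split_into_chunks and the 16 kHz sample rate are assumptions made for illustration. Only the ~0.8 s chunk length and the [Me] / [Other] labels come from this article.

```python
from dataclasses import dataclass
from typing import Iterator, Sequence

CHUNK_SECONDS = 0.8  # approximate chunk length mentioned in the article


@dataclass
class LabeledChunk:
    speaker: str            # "[Me]" or "[Other]"
    samples: Sequence[int]  # raw PCM samples for this chunk


def split_into_chunks(samples: Sequence[int], speaker: str,
                      sample_rate: int = 16_000) -> Iterator[LabeledChunk]:
    """Yield consecutive ~0.8 s chunks, each tagged with its stream's label."""
    size = int(sample_rate * CHUNK_SECONDS)  # 12,800 samples at the assumed 16 kHz
    for start in range(0, len(samples), size):
        yield LabeledChunk(speaker, samples[start:start + size])


# Two seconds of silence standing in for microphone audio (hypothetical input).
mic = [0] * 32_000
chunks = list(split_into_chunks(mic, "[Me]"))
print([len(c.samples) for c in chunks])  # [12800, 12800, 6400]
```

Because every chunk carries its own label, the server can recognize chunks independently and still attribute each result to the right speaker.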

Transcription Language

The recognition language is set at the session level:

Open Settings in the client (gear icon) or in the web dashboard.
Find the Transcription Language field.
Select the desired language from the standard list of language codes (e.g., en, ru, zh, de).
Start a new session; the selected language is applied to it.

Whisper supports more than 90 languages. The selected language acts as a hint: if meeting participants speak different languages, Whisper still detects the language of each chunk, guided by that hint.
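The session-level nature of the language setting can be illustrated with a short sketch. The SessionSettings class, the change_language helper, and the four-code subset are hypothetical; the article only states that the language is chosen by code and applies to a new session.

```python
from dataclasses import dataclass

# Illustrative subset only; the list shown in Settings is much longer.
KNOWN_CODES = {"en", "ru", "zh", "de"}


@dataclass(frozen=True)
class SessionSettings:
    language: str  # fixed for the lifetime of one session

    def __post_init__(self) -> None:
        if self.language not in KNOWN_CODES:
            raise ValueError(f"unknown language code: {self.language!r}")


def change_language(current: SessionSettings, code: str) -> SessionSettings:
    """Return settings for a NEW session; the current settings stay unchanged."""
    return SessionSettings(language=code)


running = SessionSettings(language="en")
next_session = change_language(running, "de")
print(running.language, next_session.language)  # en de
```

Making the settings immutable mirrors the behavior described here: a language change never alters a session that is already in progress.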

📸 [Screenshot: LiveTranscriptStrip scrolling ticker with [Me] and [Other] utterances]

Full Transcript in the Dashboard

After the session ends, the full transcript is available in the History section of the web dashboard. You can:

read it filtered by speaker;
copy it in full or in fragments;
use it as the basis for analytics and mind maps.
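Filtering by speaker and copying a fragment amount to simple operations on labeled utterances. The sketch below assumes a hypothetical Utterance record and helper names; the dashboard's real data model and export format are not described in this article.

```python
from typing import List, NamedTuple


class Utterance(NamedTuple):
    speaker: str  # "[Me]" or "[Other]"
    text: str


def by_speaker(transcript: List[Utterance], speaker: str) -> List[Utterance]:
    """Keep only the utterances carrying the given speaker label."""
    return [u for u in transcript if u.speaker == speaker]


def as_text(transcript: List[Utterance]) -> str:
    """Render utterances one per line, ready to copy."""
    return "\n".join(f"{u.speaker} {u.text}" for u in transcript)


# Made-up example session.
transcript = [
    Utterance("[Me]", "Can you hear me?"),
    Utterance("[Other]", "Yes, loud and clear."),
    Utterance("[Me]", "Great, let's start."),
]
print(as_text(by_speaker(transcript, "[Other]")))  # [Other] Yes, loud and clear.
```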

📸 [Screenshot: session page in the dashboard — transcript blocks with speaker labels]

Common Errors

Error	Cause	Fix
Counterpart's voice is not transcribed	Screen Recording permission not granted (macOS) or system audio unavailable (Windows)	Grant the permission or make system audio available; see macOS Permissions / Windows
Wrong language in the transcript	Incorrect transcription language selected	Change the language in settings and restart the session
Text gets mixed between speakers	Microphone captures both audio streams (echo)	Use headphones or lower speaker volume
No text on a weak connection	The WebSocket connection drops before the Whisper response arrives	Improve your connection; Whisperer reconnects automatically
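This article does not document how the automatic reconnect in the last row is timed. A common pattern for this kind of retry is exponential backoff with a cap, sketched below as a generic illustration; the function name and all numeric values are assumptions, not Whisperer's actual settings.

```python
from typing import List


def backoff_delays(attempts: int, base: float = 0.5, cap: float = 8.0) -> List[float]:
    """Delay in seconds before each reconnect attempt: doubles each time, up to a cap."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]


print(backoff_delays(6))  # [0.5, 1.0, 2.0, 4.0, 8.0, 8.0]
```

Doubling the wait avoids hammering a struggling connection, while the cap keeps the transcript from staying silent for too long once the network recovers.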

Best Practices

Use headphones: this eliminates acoustic echo and improves speaker separation.
Select the correct language before the session: changing the language mid-recording starts a new session.
On bilingual calls you can select the counterpart's language: Whisper will still recognize your speech, because the selected language serves only as a hint.