Transcript and Speakers
Whisperer transcribes speech in real time using Whisper — OpenAI's multilingual model. Speaker diarization runs simultaneously: the user's words and the counterpart's words are labeled differently, making the transcript easier to read and improving the accuracy of AI responses.
When to Read This
Read this article to understand:
- how to configure the recognition language;
- why some utterances are labeled [Me] and others [Other];
- which languages are supported and how to switch between them.
How Transcription Works
Whisperer captures two audio streams:
| Stream | Source | Label |
|---|---|---|
| User's voice | Microphone | [Me] |
| Counterpart's voice | System audio (Screen Recording permission on macOS / system audio on Windows) | [Other] |
Each audio chunk (~0.8 s) is sent to the server together with its speaker label and is recognized independently of the other chunks. The recognized text appears almost immediately in the LiveTranscriptStrip, the scrolling ticker at the bottom of the overlay.
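The chunking described above can be pictured with a minimal sketch. It is not Whisperer's actual code: the `LabeledChunk` type, the `split_into_chunks` function, and the 16 kHz 16-bit mono PCM format are assumptions made for illustration. Only the ~0.8 s chunk length and the [Me] / [Other] labels come from this article.

```python
from dataclasses import dataclass
from typing import Iterator

CHUNK_SECONDS = 0.8  # approximate chunk length stated in this article


@dataclass
class LabeledChunk:
    """Hypothetical container for one audio chunk and its speaker label."""
    speaker: str    # "[Me]" for the microphone, "[Other]" for system audio
    samples: bytes  # raw 16-bit mono PCM (assumed format)


def split_into_chunks(pcm: bytes, speaker: str,
                      sample_rate: int = 16000) -> Iterator[LabeledChunk]:
    """Yield consecutive ~0.8 s chunks, each tagged with its speaker label."""
    # 2 bytes per 16-bit sample
    bytes_per_chunk = int(sample_rate * CHUNK_SECONDS) * 2
    for start in range(0, len(pcm), bytes_per_chunk):
        yield LabeledChunk(speaker=speaker,
                           samples=pcm[start:start + bytes_per_chunk])


# Two seconds of silence from the microphone stream:
# two full 0.8 s chunks plus one 0.4 s remainder.
mic_chunks = list(split_into_chunks(bytes(16000 * 2 * 2), "[Me]"))
```

Because every chunk carries its own label, the server can recognize chunks independently and still attribute each piece of text to the right speaker.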
Transcription Language
The recognition language is set at the session level:
- Open Settings in the client (gear icon) or in the web dashboard.
- Find the Transcription Language field.
- Select the desired language from the standard list of language codes (e.g., en, ru, zh, de).
- Start a new session; the language will be applied to it.
Whisper supports more than 90 languages. If meeting participants speak different languages, the selected language acts as a hint, and Whisper still detects the language of each chunk on its own.
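As an illustration of a session-level language setting, the sketch below normalizes a language code and builds the settings for a new session. The function names, the `transcription_language` key, and the four-code subset are hypothetical and are not taken from Whisperer's API; the real list of supported codes is much longer.

```python
# Illustrative subset only; the real list contains more than 90 languages.
EXAMPLE_LANGUAGE_CODES = {"en", "ru", "zh", "de"}


def normalize_language(code: str) -> str:
    """Trim and lowercase a language code, rejecting codes outside the subset."""
    cleaned = code.strip().lower()
    if cleaned not in EXAMPLE_LANGUAGE_CODES:
        raise ValueError(f"Unsupported language code: {code!r}")
    return cleaned


def new_session_settings(language: str) -> dict:
    """Build settings for a new session (hypothetical structure)."""
    return {"transcription_language": normalize_language(language)}


settings = new_session_settings(" EN ")
```

The language is fixed when the settings are built, which mirrors the rule that a language change takes effect only in a new session.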
📸 [Screenshot: LiveTranscriptStrip scrolling ticker with [Me] and [Other] utterances]
Full Transcript in the Dashboard
After the session ends, the full transcript is available in the History section of the web dashboard. You can:
- read it filtered by speaker;
- copy it in full or in fragments;
- use it as the basis for analytics and mind maps.
📸 [Screenshot: session page in the dashboard — transcript blocks with speaker labels]
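If you copy a transcript out of the dashboard and want to process it yourself, filtering by speaker might look like the sketch below. The `Utterance` type, both helper functions, and the sample lines are invented for illustration; the dashboard's real export format may differ.

```python
from typing import List, NamedTuple


class Utterance(NamedTuple):
    """Hypothetical representation of one transcript block."""
    speaker: str  # "[Me]" or "[Other]"
    text: str


def filter_by_speaker(transcript: List[Utterance], speaker: str) -> List[Utterance]:
    """Keep only the utterances that carry the given speaker label."""
    return [u for u in transcript if u.speaker == speaker]


def to_plain_text(transcript: List[Utterance]) -> str:
    """Render utterances as one line each, prefixed with the speaker label."""
    return "\n".join(f"{u.speaker} {u.text}" for u in transcript)


# Made-up sample data, not a real session.
sample = [
    Utterance("[Me]", "Can you hear me?"),
    Utterance("[Other]", "Yes, loud and clear."),
    Utterance("[Me]", "Great, let's start."),
]
mine = filter_by_speaker(sample, "[Me]")
```

The same filter with "[Other]" yields only the counterpart's utterances.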
Common Errors
| Error | Cause | Fix |
|---|---|---|
| Counterpart's voice is not transcribed | Screen Recording permission not granted (macOS) or system audio unavailable (Windows) | Grant the permission or enable system audio (see macOS Permissions / Windows) |
| Wrong language in the transcript | Incorrect transcription language selected | Change the language in settings and restart the session |
| Text gets mixed between speakers | Microphone captures both audio streams (echo) | Use headphones or lower speaker volume |
| No text on a weak connection | WebSocket drops before Whisper response arrives | Improve your connection; Whisperer reconnects automatically |
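This article does not describe how automatic reconnection works internally. One common pattern for reconnecting a dropped WebSocket is capped exponential backoff, sketched below. The function name and the `base` and `cap` values are assumptions chosen for illustration, not Whisperer's documented behavior.

```python
from typing import List


def backoff_delays(attempts: int, base: float = 0.5, cap: float = 8.0) -> List[float]:
    """Return the wait time in seconds before each reconnection attempt.

    The delay doubles after every failed attempt and never exceeds `cap`.
    """
    return [min(cap, base * 2 ** n) for n in range(attempts)]


delays = backoff_delays(6)  # [0.5, 1.0, 2.0, 4.0, 8.0, 8.0]
```

Growing delays avoid flooding a weak connection with repeated connection attempts while still recovering quickly from a short drop.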
Best Practices
- Use headphones — this eliminates acoustic echo and improves speaker separation.
- Select the correct language before the session — changing the language mid-recording creates a new session.
- On bilingual calls you can select the counterpart's language: Whisper will still recognize your speech thanks to the context hint.