Transcript and Speakers

Whisperer transcribes speech in real time using Whisper, OpenAI's multilingual speech recognition model. Speaker diarization runs at the same time: the user's words and the counterpart's words get separate labels, which makes the transcript easier to read and improves the accuracy of AI responses.

When to Read This

Read this article to understand:

  • how to configure the recognition language;
  • why some utterances are labeled [Me] and others [Other];
  • which languages are supported and how to switch between them.

How Transcription Works

Whisperer captures two audio streams:

Stream | Source | Label
User's voice | Microphone | [Me]
Counterpart's voice | System audio (Screen Recording permission on macOS / system audio on Windows) | [Other]

Each audio chunk (~0.8 s) is sent to the server together with its speaker label and is recognized independently of the other chunks. The recognized text appears in the LiveTranscriptStrip, the scrolling ticker at the bottom of the overlay, as soon as the server returns it.
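The chunking step can be pictured with a minimal sketch. It is not Whisperer's actual client code: the 16 kHz sample rate, the `LabeledChunk` type, and the `split_into_chunks` helper are assumptions made for illustration. Only the ~0.8 s chunk length and the [Me] / [Other] labels come from this article.

```python
from dataclasses import dataclass
from typing import Iterator

SAMPLE_RATE = 16_000   # assumed sample rate in Hz; not stated in the article
CHUNK_SECONDS = 0.8    # approximate chunk length given in the article


@dataclass
class LabeledChunk:
    """Hypothetical container: one audio chunk plus its speaker label."""
    speaker: str         # "[Me]" (microphone) or "[Other]" (system audio)
    samples: list[int]   # raw PCM samples belonging to this chunk


def split_into_chunks(
    samples: list[int],
    speaker: str,
    sample_rate: int = SAMPLE_RATE,
    chunk_seconds: float = CHUNK_SECONDS,
) -> Iterator[LabeledChunk]:
    """Yield consecutive chunks, each tagged with the stream's speaker label.

    The final chunk may be shorter than the others.
    """
    chunk_size = round(sample_rate * chunk_seconds)
    for start in range(0, len(samples), chunk_size):
        yield LabeledChunk(speaker, samples[start:start + chunk_size])


# Two seconds of silence standing in for microphone audio.
mic_audio = [0] * (2 * SAMPLE_RATE)
chunks = list(split_into_chunks(mic_audio, "[Me]"))
```

Because every chunk carries its own label, the receiving side can transcribe each one independently and still attribute the text to the right speaker.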

Transcription Language

The recognition language is set at the session level:

  1. Open Settings in the client (gear icon) or in the web dashboard.
  2. Find the Transcription Language field.
  3. Select the desired language from the standard list of language codes (e.g., en, ru, zh, de).
  4. Start a new session — the language will be applied to it.

Whisper supports more than 90 languages. If meeting participants speak different languages, Whisper detects the language of each chunk separately and treats the selected language as a hint rather than a strict constraint.
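The session-level setting can be sketched as an immutable configuration object. The names `SessionConfig`, `new_session`, and `KNOWN_CODES` are hypothetical and do not describe Whisperer's real API. The code list is limited to the four examples mentioned above.

```python
from dataclasses import dataclass

# Illustrative subset only; Whisper itself supports more than 90 languages.
KNOWN_CODES = {"en", "ru", "zh", "de"}


@dataclass(frozen=True)
class SessionConfig:
    """Hypothetical per-session settings; frozen because the language
    is fixed for the lifetime of a session."""
    language: str


def new_session(language: str) -> SessionConfig:
    """Normalize and validate a language code, then create a session config."""
    code = language.strip().lower()
    if code not in KNOWN_CODES:
        raise ValueError(f"unsupported language code: {language!r}")
    return SessionConfig(language=code)


session = new_session("EN")   # stored as "en"
```

Making the object frozen mirrors the behavior described here: a different language means a new session, not a change to the running one.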

📸 [Screenshot: LiveTranscriptStrip scrolling ticker with [Me] and [Other] utterances]

Full Transcript in the Dashboard

After the session ends, the full transcript is available in the History section of the web dashboard. You can:

  • read it filtered by speaker;
  • copy it in full or in fragments;
  • use it as the basis for analytics and mind maps.
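If you copy a transcript out of the dashboard and want to process it yourself, filtering by speaker takes only a few lines. The sketch below assumes a simple in-memory structure. `Utterance`, `by_speaker`, and `as_text` are hypothetical names, and the sample utterances are invented for illustration, not taken from a dashboard export.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    """Hypothetical transcript entry: a speaker label and the recognized text."""
    speaker: str   # "[Me]" or "[Other]"
    text: str


def by_speaker(transcript: list[Utterance], speaker: str) -> list[Utterance]:
    """Return only the utterances with the given label, in original order."""
    return [u for u in transcript if u.speaker == speaker]


def as_text(transcript: list[Utterance]) -> str:
    """Render utterances one per line, each prefixed with its speaker label."""
    return "\n".join(f"{u.speaker} {u.text}" for u in transcript)


transcript = [
    Utterance("[Me]", "Can you walk me through the pricing?"),
    Utterance("[Other]", "Sure, there are two plans."),
    Utterance("[Me]", "Which one includes analytics?"),
]

mine = by_speaker(transcript, "[Me]")
print(as_text(mine))
```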

📸 [Screenshot: session page in the dashboard — transcript blocks with speaker labels]

Common Errors

Error | Cause | Fix
Counterpart's voice is not transcribed | Screen Recording permission not granted (macOS) or system audio unavailable (Windows) | macOS Permissions / Windows
Wrong language in the transcript | Incorrect transcription language selected | Change the language in settings and restart the session
Text gets mixed between speakers | Microphone captures both audio streams (echo) | Use headphones or lower speaker volume
No text on a weak connection | WebSocket drops before Whisper response arrives | Improve your connection; Whisperer reconnects automatically

Best Practices

  • Use headphones — this eliminates acoustic echo and improves speaker separation.
  • Select the correct language before the session: the language is fixed per session, so a change made mid-recording only takes effect in a new session.
  • On bilingual calls you can select the counterpart's language: Whisper will still recognize your speech thanks to the context hint.

Related Articles