
Whisper Medium
Audio
Whisper Medium (OpenAI)
Multilingual speech-to-text and translation that just works, even on messy audio.
Web-scale training. Trained end-to-end on roughly 680,000 hours of diverse, weakly supervised audio collected from the web, covering many languages, accents, and recording conditions.
Broad coverage. Transcribes and translates speech across nearly 100 languages, with strong robustness to noise, accents, and real-world recording artifacts.
Bigger but tougher. ~769M parameters; heavier than wav2vec-based models, but far more forgiving on low-quality or non-studio audio.
Flexible modes. Supports speech-to-text in the source language and speech-to-English translation using the same model (see the sketch after this list).
No lock-in. MIT license, single model file, 16 kHz audio input—no external language models or decoding tricks required.
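
A minimal sketch of both modes, assuming the openai-whisper Python package (with ffmpeg installed) and a placeholder audio file; the library resamples input to 16 kHz for you:

    import whisper  # pip install -U openai-whisper; ffmpeg must be installed separately

    # Load the 769M-parameter multilingual "medium" checkpoint.
    model = whisper.load_model("medium")

    # Speech-to-text in the source language; the language is auto-detected.
    transcript = model.transcribe("interview.mp3")  # placeholder file name
    print(transcript["language"], transcript["text"])

    # Speech-to-English translation from the same weights: only the task changes.
    translation = model.transcribe("interview.mp3", task="translate")
    print(translation["text"])

Both calls use the same model file; the task argument is the only difference.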
Why pick it for Norman AI?
When accuracy matters more than size. Whisper Medium is ideal for multilingual transcripts, user-generated content, meetings, and interviews where audio quality varies—and you want consistent results without per-language tuning.
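
As a sketch of what skipping per-language tuning can look like in practice, the same package can identify the spoken language before decoding a clip; the file name below is a placeholder:

    import whisper

    model = whisper.load_model("medium")

    # Load a clip and fit it to the model's 30-second input window.
    audio = whisper.load_audio("meeting_clip.wav")  # placeholder file name
    audio = whisper.pad_or_trim(audio)

    # Compute the log-Mel spectrogram the encoder expects.
    mel = whisper.log_mel_spectrogram(audio).to(model.device)

    # Ask the model which language is being spoken, then decode that window.
    _, probs = model.detect_language(mel)
    print("detected language:", max(probs, key=probs.get))
    print(whisper.decode(model, mel, whisper.DecodingOptions()).text)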
