
asr-wav2vec2-librispeech
Audio
ASR Wav2Vec2 LibriSpeech (SpeechBrain)
English-only speech-to-text with near-human accuracy on clean, read speech that still runs on a laptop.
Pre‑train → fine‑tune. Starts from the wav2vec 2.0 large (960h) encoder, adds a lightweight CTC head, and fine‑tunes on the full 960‑hour LibriSpeech dataset.
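To make the CTC head concrete: it emits one label per audio frame, including a special "blank", and the transcript is recovered by collapsing repeated labels and then dropping blanks. The sketch below shows that best-path (greedy) rule on a toy frame sequence. The `_` blank symbol and the function name are illustrative assumptions, not the model's actual tokenizer or API.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol; a real tokenizer reserves its own index


def ctc_greedy_decode(frame_labels, blank=BLANK):
    """Collapse runs of identical frame labels, then remove blanks."""
    collapsed = [label for label, _ in groupby(frame_labels)]
    return "".join(label for label in collapsed if label != blank)


# One label per frame, as an argmax over the CTC head's outputs might produce.
frames = list("__HH_EE_LL_LL__OO__")
print(ctc_greedy_decode(frames))  # HELLO
```

The blank between the two `L` runs is what lets a doubled letter survive the collapse step.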
Benchmark numbers. ~1.9% WER on LibriSpeech test-clean and ~4.0% on test-other, strong enough for production captions and transcripts.
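For readers new to the metric, word error rate is the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. Below is a minimal sketch for checking transcripts from your own audio. It assumes plain whitespace tokenization and applies no text normalization, so its scores may differ slightly from those of standard scoring tools.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed part of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
print(f"{wer:.3f}")  # 0.167 (one deleted word out of six)
```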
Lean deploys. Weights are ~1.3 GB at full precision (FP32) and roughly half that in FP16, so the model fits on a 2 GB GPU. With quantization, the model can also run on CPU, enabling real-time or near-real-time inference.
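Whether a given machine reaches real-time speed is usually judged by the real-time factor (RTF): processing time divided by audio duration, where values below 1.0 mean transcription keeps up with the audio. The helpers below are a minimal sketch for measuring it on your own hardware. `timed` accepts any transcription callable you supply; no particular library API is assumed.

```python
import time


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Example: a 10-second clip that took 4.5 seconds to transcribe.
print(real_time_factor(4.5, 10.0))  # 0.45
```

Measuring over several clips of typical length gives a steadier figure than a single run, since the first call often includes model loading.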
No strings attached. Apache-2.0 license, English-only tokenizer, and 16 kHz mono audio input: no licensing friction and no external language model to manage.
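Because the model expects 16 kHz mono input, it can help to verify files before sending them for transcription. The sketch below uses only Python's standard `wave` module to report mismatches. The function name is hypothetical, it handles uncompressed WAV files only, and it does not resample or downmix; that step is left to whatever audio tool you already use.

```python
import os
import tempfile
import wave


def check_wav_format(path, expected_rate=16000, expected_channels=1):
    """Return a list of human-readable problems; an empty list means the file matches."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()

    problems = []
    if rate != expected_rate:
        problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
    if channels != expected_channels:
        problems.append(f"file has {channels} channel(s), expected {expected_channels}")
    return problems


# Demo: write a short silent 16 kHz mono file and check it.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_path, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)      # 16-bit samples
    out.setframerate(16000)
    out.writeframes(b"\x00\x00" * 160)  # 10 ms of silence

print(check_wav_format(demo_path))  # []
```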
Why pick it for Norman AI?
A reliable default for English ASR. High accuracy, predictable behavior, and modest hardware needs make it well suited to call transcripts, voice-note search, meeting captions, and other speech-to-text workflows, with no new GPUs or legal overhead.
