
asr-wav2vec2-librispeech
Audio
wav2vec 2.0 LibriSpeech (317 M params, Apache-2.0)
English-only speech-to-text that scores under 2 % WER on LibriSpeech test-clean and still fits on a laptop.
Pre-train → fine-tune. Starts from the pre-trained wav2vec2-large-960h encoder, adds two DNN layers on top, then fine-tunes the whole stack with a CTC loss on the full 960 h LibriSpeech training set (317 M parameters in total).
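Fine-tuning with CTC means the model emits one label distribution per audio frame; at inference time a transcript can be read off by taking the most likely label per frame, collapsing repeats, and dropping the blank symbol. A minimal greedy-decoding sketch, assuming a hypothetical toy character vocabulary with the blank at index 0 (not this model's actual tokenizer or decoder):

```python
from typing import Sequence

# Hypothetical toy vocabulary: index 0 is the CTC blank, "|" marks a word boundary.
VOCAB = ["<blank>", "|", "C", "A", "T", "S"]
BLANK = 0


def ctc_greedy_decode(frame_ids: Sequence[int],
                      vocab: Sequence[str] = VOCAB,
                      blank: int = BLANK) -> str:
    """Collapse repeated per-frame labels, drop blanks, and map ids to text."""
    chars = []
    prev = None
    for idx in frame_ids:
        # Emit a label only when it changes from the previous frame and is not blank.
        if idx != prev and idx != blank:
            chars.append(vocab[idx])
        prev = idx
    # Turn word-boundary markers into spaces.
    return "".join(chars).replace("|", " ").strip()


# Per-frame argmax ids, standing in for the acoustic model's output.
frames = [2, 2, 0, 3, 3, 4, 0, 1, 2, 3, 0, 4, 4, 5]
print(ctc_greedy_decode(frames))
```

A blank between two identical labels is what lets CTC spell doubled letters: `[3, 0, 3]` decodes to `AA`, while `[3, 3]` collapses to `A`.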
Benchmark numbers. 1.90 % WER on LibriSpeech test-clean and 3.96 % on the harder test-other split, plenty for production captions.
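WER counts word-level substitutions, deletions, and insertions against the reference transcript, divided by the number of reference words, so 1.90 % on test-clean works out to roughly 19 word errors per 1,000 reference words. A minimal sketch of the standard metric via edit distance; `word_error_rate` is a hypothetical helper, and it skips the text normalization (casing, punctuation) that published scorers typically apply:

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    # dist[i][j] = edit distance between ref[:i] and hyp[:j].
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete every reference word
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert every hypothesis word
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution or match
            )
    # Guard against an empty reference to avoid dividing by zero.
    return dist[len(ref)][len(hyp)] / max(len(ref), 1)


ref = "the cat sat on the mat"
hyp = "the cat sat on a mat"
print(f"{100 * word_error_rate(ref, hyp):.2f} % WER")
```

One substitution (`the` → `a`) out of six reference words gives 16.67 % WER.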
Lean deploys. FP32 weights come to ≈1.3 GB and FP16 to ≈0.65 GB (fits any 2 GB GPU); 4-bit quantization slides under 350 MB, so real-time CPU inference is doable.
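The size figures follow from parameter count times bytes per weight. A back-of-the-envelope sketch, assuming a flat 317 M parameters and counting weights only (no activations, quantization scales, or runtime overhead):

```python
# Rough weight-only footprint: parameter count x bytes per parameter.
# Ignores activations, quantization scales, and runtime overhead.
PARAMS = 317_000_000  # parameter count quoted above

BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "4-bit": 0.5}


def weight_footprint_mb(params: int, bytes_per_param: float) -> float:
    """Return weight size in megabytes (10**6 bytes)."""
    return params * bytes_per_param / 1e6


for precision, nbytes in BYTES_PER_PARAM.items():
    print(f"{precision:>5}: {weight_footprint_mb(PARAMS, nbytes):7.1f} MB")
```

At 317 M parameters this gives about 1,268 MB for FP32, 634 MB for FP16, and about 158.5 MB for raw 4-bit weights; real quantized checkpoints also store scales and often keep some layers at higher precision, so the file on disk will be somewhat larger than the raw 4-bit figure.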
No strings attached. Pure Apache-2.0 weights, an English-only tokenizer, and 16 kHz input: no licensing drama and no external language model to ship.
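Because the model expects 16 kHz input, it is worth checking audio before inference. A minimal pre-flight sketch using only Python's standard `wave` module; the helper names are hypothetical, and requiring single-channel (mono) audio is an added assumption:

```python
import io
import wave
from typing import BinaryIO, Union

EXPECTED_RATE = 16_000  # the model expects 16 kHz audio


def check_wav_format(source: Union[str, BinaryIO]) -> dict:
    """Read a WAV header and report whether it is already 16 kHz mono."""
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        n_frames = wav.getnframes()
    return {
        "rate": rate,
        "channels": channels,
        "duration_s": n_frames / rate,
        # Assumption: mono is required in addition to the 16 kHz rate.
        "ready": rate == EXPECTED_RATE and channels == 1,
    }


def make_silent_wav(rate: int, channels: int = 1, seconds: float = 1.0) -> io.BytesIO:
    """Build an in-memory 16-bit PCM WAV of silence (demonstration only)."""
    buf = io.BytesIO()
    n_frames = int(rate * seconds)
    with wave.open(buf, "wb") as out:
        out.setnchannels(channels)
        out.setsampwidth(2)  # 16-bit samples
        out.setframerate(rate)
        out.writeframes(b"\x00\x00" * channels * n_frames)
    buf.seek(0)  # rewind so the buffer can be read back
    return buf


print(check_wav_format(make_silent_wav(16_000)))
```

Files that fail the check should be resampled to 16 kHz (and downmixed to mono) with an audio library before transcription.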
Why pick it for Norman AI?
Drop-in English ASR with top-tier accuracy and a footprint under 2 GB means instant call transcripts, voice-note search, or captioning, all without new GPUs or legal hoops.
