Green Fern

asr-wav2vec2-librispeech

Audio

wav2vec 2.0 LibriSpeech (317 M params, Apache-2.0)

English-only speech-to-text that scores under 2 % WER on LibriSpeech test-clean and still fits on a laptop.
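For readers unfamiliar with the metric, WER (word error rate) is the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The sketch below is a minimal illustration for a single utterance. The function name `word_error_rate` and the lowercase, whitespace-split normalization are assumptions made for this example, not the scoring script behind the published numbers.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (one fewer than in the current row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Benchmark WER is normally computed over a whole test set by summing the errors across utterances and dividing by the total number of reference words, rather than averaging per-utterance scores.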

  • Pre-train → fine-tune. Starts from the 317 M-param wav2vec2-large-960h encoder, adds two DNN layers, then fine-tunes with a CTC loss on the full 960 h LibriSpeech set.

  • Benchmark numbers. 1.90 % WER on test-clean and 3.96 % on test-other—plenty for production captions.

  • Lean deploys. At 317 M params the weights take ≈1.3 GB in FP32 and roughly half that in FP16, so they fit a 2 GB GPU with headroom for activations; 4-bit quant slides under 350 MB, so real-time CPU inference is doable.

  • No strings attached. Pure Apache-2.0 weights, single-language tokenizer, 16 kHz input—zero licensing drama, zero extra language models.
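A CTC-trained model emits one label per audio frame, including a special blank label, and the transcript is recovered by collapsing repeated labels and then dropping blanks. The sketch below shows that greedy decoding rule on a toy example. `TOY_VOCAB`, `blank_id=0`, and the function name are hypothetical and chosen only for illustration; the model's real tokenizer and label ids will differ.

```python
from itertools import groupby

# Hypothetical toy vocabulary for illustration; id 0 plays the role of the CTC blank.
TOY_VOCAB = {0: "<blank>", 1: "c", 2: "a", 3: "t", 4: "l"}


def ctc_greedy_decode(frame_ids, id_to_char, blank_id=0):
    """Collapse consecutive repeated ids, then remove blanks."""
    # groupby yields one key per run of identical consecutive ids.
    collapsed = [token for token, _ in groupby(frame_ids)]
    return "".join(id_to_char[token] for token in collapsed if token != blank_id)


# Per-frame argmax ids as they might come out of the acoustic model.
print(ctc_greedy_decode([1, 1, 0, 2, 2, 2, 0, 0, 3], TOY_VOCAB))
```

The blank is what lets CTC represent genuine double letters: the frames `[4, 4, 0, 4]` decode to two `l` characters, while `[4, 4, 4]` decodes to one.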

Why pick it for Norman AI?

Drop-in English ASR with top-tier accuracy and sub-2 GB footprints means instant call transcripts, voice-note search, or captioning—without new GPUs or legal hoops.
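The footprint claims can be sanity-checked with back-of-the-envelope arithmetic: raw weight storage is the parameter count times the bits per weight. The sketch below assumes the 317 M figure quoted for the encoder and ignores the two added DNN layers, activations, and any quantization metadata, so real checkpoints will come out somewhat larger. The helper name `weight_megabytes` is made up for this example.

```python
PARAMS = 317_000_000  # encoder parameter count quoted on this card (assumption: extra layers ignored)

BITS_PER_WEIGHT = {"fp32": 32, "fp16": 16, "int8": 8, "int4": 4}


def weight_megabytes(num_params: int, bits: int) -> float:
    """Raw weight storage in MB (10**6 bytes), excluding activations and metadata."""
    return num_params * bits / 8 / 1e6


for name, bits in BITS_PER_WEIGHT.items():
    print(f"{name}: {weight_megabytes(PARAMS, bits):,.1f} MB")
```

Runtime memory also depends on clip length, since activations grow with the number of audio frames, so these numbers are a lower bound rather than a deployment guarantee.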

messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"},
    {"role": "assistant",
     "content": "Sure! Here are some ways to eat bananas and dragonfruits together"},
    {"role": "user", "content": "What about solving the equation 2x + 3 = 7?"},
]

response = await norman.invoke(
    {
        "model_name": "phi-4",
        "inputs": [
            {
                "display_title": "Prompt",
                "data": messages
            }
        ]
    }
)