Green Fern

Whisper Large-v3

Audio

Whisper Large-v3 (1.55 B params, Apache-2.0)

OpenAI’s top-tier speech-to-text model: more accurate than Large-v2, yet it still fits on a single GPU.

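  • Spec sheet. Same architecture as Large-v2, except that the input spectrogram uses 128 Mel-frequency bins (up from 80) and there is a new language token for Cantonese. Trained on 1 M h of weakly labeled plus 4 M h of pseudo-labeled audio (5 M h total), with a 30 s input window.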

  • Best-in-class accuracy. Cuts errors by 10–20% relative to Large-v2 across a wide range of languages; roughly 2% word error rate (WER) on LibriSpeech test-clean and 3.9% on test-other.

  • Runs on modest GPUs. FP16 weights take ≈3.1 GB (1.55 B params × 2 bytes); expect 6–10 GB of VRAM for real-time use, or squeeze into about 4 GB with 4-bit quantized builds such as whisper.cpp’s ggml models.

  • Full multilingual support and translation. Auto-detects 99 languages and can translate non-English speech into English out of the box.
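
The memory and input-size figures in the spec sheet can be sanity-checked with a few lines of arithmetic. This is a minimal sketch: the audio constants (16 kHz sample rate, 160-sample hop, 30 s window) are the ones used by the reference openai/whisper preprocessing, and the helper names are illustrative. It counts weights only, not activations or the decoder's key/value cache, which is why real VRAM needs sit above these numbers.

```python
# Back-of-envelope numbers for Whisper Large-v3.
PARAMS = 1.55e9          # parameter count from the spec sheet
SAMPLE_RATE = 16_000     # Hz, the sample rate Whisper expects
HOP_LENGTH = 160         # samples between successive spectrogram frames
CHUNK_SECONDS = 30       # length of one model input window
N_MELS = 128             # Mel bins in Large-v3 (Large-v2 used 80)


def weight_gb(params: float, bits_per_param: int) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_param / 8 / 1e9


def mel_input_shape(n_mels: int = N_MELS) -> tuple[int, int]:
    """(Mel bins, frames) of the log-Mel spectrogram for one 30 s window."""
    n_samples = SAMPLE_RATE * CHUNK_SECONDS   # 480,000 samples
    n_frames = n_samples // HOP_LENGTH        # 3,000 frames
    return (n_mels, n_frames)


if __name__ == "__main__":
    print(f"FP16 weights : {weight_gb(PARAMS, 16):.2f} GB")
    print(f"4-bit weights: {weight_gb(PARAMS, 4):.2f} GB")
    print("Encoder input (mels, frames):", mel_input_shape())
```

Halving the bits per parameter halves the weight memory, so going from FP16 to 4 bits shrinks roughly 3.1 GB of weights to about 0.8 GB; the rest of a ~4 GB budget goes to activations and runtime overhead.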

Why pick it for Norman AI?

State-of-the-art transcription and on-the-fly translation for calls, podcasts, or video captions, with no extra infrastructure, a permissive license, and a footprint small enough to co-host with your other microservices on a single A10G (24 GB).
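
Because the model sees at most 30 s of audio per forward pass, a long call or podcast has to be split into windows before transcription. The snippet below is a minimal sketch of naive, non-overlapping chunking, assuming 16 kHz mono audio; `chunk_bounds` is a hypothetical helper, not part of any Whisper or Norman API. Fixed cuts can split a word in half, so production pipelines typically add overlap between windows or, like the reference openai/whisper `transcribe()`, use predicted timestamps to decide where the next window starts.

```python
SAMPLE_RATE = 16_000   # Hz, assumed mono input at Whisper's expected rate
WINDOW_SECONDS = 30    # one model input window


def chunk_bounds(n_samples: int, sample_rate: int = SAMPLE_RATE,
                 window_s: int = WINDOW_SECONDS) -> list[tuple[int, int]]:
    """Split sample indices [0, n_samples) into consecutive, non-overlapping windows.

    The last window may be shorter; Whisper's preprocessing pads or trims
    every window to exactly window_s seconds before computing the spectrogram.
    """
    if n_samples < 0:
        raise ValueError("n_samples must be non-negative")
    step = sample_rate * window_s
    return [(start, min(start + step, n_samples))
            for start in range(0, n_samples, step)]


if __name__ == "__main__":
    # A 75-second call splits into three windows: 0-30 s, 30-60 s, 60-75 s.
    for start, end in chunk_bounds(75 * SAMPLE_RATE):
        print(f"{start / SAMPLE_RATE:5.1f}s -> {end / SAMPLE_RATE:5.1f}s")
```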

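# Chat-style conversation history: a system prompt followed by user/assistant turns
messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"},
    {"role": "assistant",
     "content": "Sure! Here are some ways to eat bananas and dragonfruits together"},
    {"role": "user", "content": "What about solving a 2x + 3 = 7 equation?"},
]

# `norman` is assumed to be an initialized Norman SDK client; top-level
# `await` needs an async context such as a notebook or an `async def`.
response = await norman.invoke(
    {
        "model_name": "phi-4",  # model to run
        "inputs": [
            {
                "display_title": "Prompt",  # human-readable label for this input
                "data": messages            # the conversation defined above
            }
        ]
    }
)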
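
The request dictionary in the example above nests the conversation under `inputs`. If you build these requests in several places, a small wrapper keeps the shape consistent. This is a minimal sketch: `build_invoke_payload`, its default title, and its role check are hypothetical conveniences that only mirror the keys shown in the example; they do not call or validate against the Norman SDK.

```python
from typing import Any

# Hypothetical: the roles that appear in the example conversation.
VALID_ROLES = {"system", "user", "assistant"}


def build_invoke_payload(model_name: str, messages: list[dict[str, str]],
                         display_title: str = "Prompt") -> dict[str, Any]:
    """Wrap chat messages in the same request shape as the invoke example."""
    for i, msg in enumerate(messages):
        # Catch typos in roles or missing content before sending anything.
        if msg.get("role") not in VALID_ROLES:
            raise ValueError(f"message {i} has unexpected role {msg.get('role')!r}")
        if not isinstance(msg.get("content"), str):
            raise ValueError(f"message {i} is missing string content")
    return {
        "model_name": model_name,
        "inputs": [{"display_title": display_title, "data": messages}],
    }
```

You would then pass the result to the client as before, e.g. `await norman.invoke(build_invoke_payload("phi-4", messages))`.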


See our SDK