
Qwen3-4B
Qwen3-4B (4B params, Apache-2.0)
Small-footprint transformer that punches above its weight.
Dual-mode reasoning. Toggle enable_thinking to get full chain-of-thought for hard math/code problems, or turn it off for fast, lightweight chat. One model, two personas.
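In thinking mode, Qwen3 wraps its chain-of-thought in <think>…</think> tags before the final answer. A minimal sketch of separating the two (the tag names come from Qwen3's output format; the helper itself is ours):

```python
def split_thinking(text: str) -> tuple[str, str]:
    """Split a Qwen3 reply into (chain_of_thought, final_answer).

    Assumes thinking mode wraps reasoning in <think>...</think>;
    replies with no close tag are treated as answer-only.
    """
    open_tag, close_tag = "<think>", "</think>"
    if close_tag not in text:
        return "", text.strip()
    head, _, answer = text.partition(close_tag)
    thought = head.replace(open_tag, "", 1).strip()
    return thought, answer.strip()

thought, answer = split_thinking("<think>2+2 is 4.</think>\nThe answer is 4.")
print(thought)  # 2+2 is 4.
print(answer)   # The answer is 4.
```

Handy when you only want to log or display the final answer while keeping the reasoning trace for debugging.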
Spec sheet. 36 layers, grouped-query attention (32 Q / 8 KV heads), native 32k-token context window; stretch to 131k with YaRN if you need it.
Runs almost anywhere. Full precision needs ~8-16 GB of VRAM; a 4-bit quant drops below 4 GB, so laptops and CPU-only boxes are fine.
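The spec-sheet and memory numbers above are easy to sanity-check. A back-of-envelope sketch using the layer and head counts from the spec sheet (head_dim=128 is our assumption for illustration), showing why grouped-query attention shrinks the KV cache 4x and roughly what the weights cost at each precision:

```python
# Numbers from the spec sheet above; head_dim=128 is an assumed value.
LAYERS, Q_HEADS, KV_HEADS, HEAD_DIM = 36, 32, 8, 128

def kv_cache_bytes(tokens: int, kv_heads: int, dtype_bytes: int = 2) -> int:
    # Per layer, K and V each store tokens x kv_heads x head_dim values.
    return 2 * LAYERS * kv_heads * HEAD_DIM * dtype_bytes * tokens

mha = kv_cache_bytes(32_000, Q_HEADS)   # if every Q head kept its own KV
gqa = kv_cache_bytes(32_000, KV_HEADS)  # GQA: only 8 KV heads are cached
print(f"32k-token KV cache without GQA: {mha / 2**30:.1f} GiB")
print(f"32k-token KV cache with GQA:    {gqa / 2**30:.1f} GiB")  # 4x smaller

def weight_bytes(params: float, bits: int) -> float:
    # Weight storage only; runtime overhead (activations, cache) is extra.
    return params * bits / 8

print(f"fp16 weights: {weight_bytes(4e9, 16) / 2**30:.1f} GiB")
print(f"4-bit weights: {weight_bytes(4e9, 4) / 2**30:.1f} GiB")
```

The fp16 figure (~7.5 GiB) lines up with the low end of the VRAM range quoted above; the 4-bit figure explains the sub-4 GB claim once quantization overhead is added.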
What it’s good at. Strong logical reasoning, code generation, multilingual chat (100+ languages), and tool/agent calling; it beats the older Qwen2.5 line in human evals.
Plug-and-play. Works out of the box with transformers >= 4.51, vLLM, SGLang, Ollama, llama.cpp, LM Studio, etc.; just from_pretrained("Qwen/Qwen3-4B") and go.
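A minimal transformers sketch of that flow, tying the two personas back to the enable_thinking toggle. Calling main() downloads ~8 GB of weights, so run it on a machine with the VRAM noted above; the prompt-building helper is plain Python:

```python
def build_messages(user_prompt: str) -> list[dict]:
    # Chat-format messages expected by tokenizer.apply_chat_template.
    return [{"role": "user", "content": user_prompt}]

def main() -> None:
    # Requires transformers >= 4.51 and ~8 GB of VRAM in half precision.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "Qwen/Qwen3-4B"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )

    text = tokenizer.apply_chat_template(
        build_messages("Explain grouped-query attention in one sentence."),
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=False,  # fast chat persona; True enables chain-of-thought
    )
    inputs = tokenizer([text], return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=256)
    # Decode only the newly generated tokens, not the prompt.
    reply = tokenizer.decode(
        output[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True
    )
    print(reply)
```

The same code serves both modes: flip enable_thinking to True and post-process the <think> block for hard math/code problems.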
Why pick it for Norman AI?
You get open licensing, solid reasoning, long-context support, and minimal hardware demands, all in one model you can ship today without bending your infra.
