
Phi-3-mini-4k-instruct
Phi-3 Mini 4K (3.8B params, MIT)
A lean transformer, built for tight GPU budgets and quick reasoning.
Spec sheet. Dense decoder-only, 3.8B parameters, 4K-token context window. Post-trained with SFT + DPO so it follows instructions cleanly.
Hits above its size. Outscores many 7-13B models on MMLU, BIG-Bench Hard, GSM8K, and other reasoning benchmarks.
Runs anywhere. FlashAttention makes it fly on A100/H100; fall back to eager attention on older cards (see the loading sketch after this list). Int4 GGUF or ONNX builds drop onto CPUs, laptops, even phones, and DirectML covers Windows GPUs.
Tool-friendly. Ships in PyTorch, ONNX, and GGUF formats; works out of the box with transformers >= 4.41, vLLM, llama.cpp, Ollama, ONNX Runtime, and more.
Chat-ready. Simple <|system|>/<|user|>/<|assistant|> tags; no fancy prompt gymnastics needed (see the generation sketch below).
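
A minimal loading sketch showing the attention fallback described above. It assumes transformers >= 4.41 with PyTorch and accelerate installed, plus flash-attn when the fast path is taken; the capability check and dtype choice are our assumptions, not part of the model card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"

# FlashAttention 2 needs an Ampere-or-newer GPU (A100 is compute capability
# 8.0, H100 is 9.0); anywhere else, eager attention keeps things working.
use_flash = torch.cuda.is_available() and torch.cuda.get_device_capability()[0] >= 8
attn_impl = "flash_attention_2" if use_flash else "eager"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16 if use_flash else torch.float32,
    attn_implementation=attn_impl,
    device_map="auto",  # requires accelerate; places weights on GPU if present
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```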
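And a generation sketch reusing that model and tokenizer. The tokenizer's chat template renders the <|system|>/<|user|>/<|assistant|> tags for you, so the messages list is all you write; the prompt contents here are purely illustrative:

```python
messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Summarize what a KV cache does in one sentence."},
]

# apply_chat_template emits the special chat tags, so no manual formatting.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=64, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```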
Why pick it for Norman AI?
You get modern reasoning at a laptop-level memory footprint, permissive licensing, and plug-and-play builds that slot straight into our micro-service stack: perfect for edge endpoints or budget-conscious inference tiers.
