Green Fern

Phi-3-mini-4k-instruct

Text

Phi-3 Mini 4K (3.8B params, MIT)

Lean transformer, built for tight GPUs and quick reasoning.

  • Spec sheet. Dense decoder-only Transformer, 3.8B parameters, 4K-token context window. Post-trained with supervised fine-tuning (SFT) and direct preference optimization (DPO) so it follows instructions cleanly.

  • Punches above its size. Outscores many 7-13B models on MMLU, BIG-Bench Hard, GSM8K, and other reasoning benchmarks.

  • Runs anywhere. FlashAttention gives the best throughput on A100/H100; older cards can fall back to eager attention. INT4 GGUF or ONNX builds run on CPUs, laptops, even phones, and DirectML covers Windows GPUs.

  • Tool-friendly. Ships in PyTorch, ONNX, and GGUF formats; works out of the box with transformers >= 4.41, vLLM, llama.cpp, Ollama, ONNX Runtime, etc.

  • Chat-ready. Simple <|system|>/<|user|>/<|assistant|> tags; no fancy prompt gymnastics needed.
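
To make the chat tags concrete, here is a minimal sketch of how a message list could be rendered into a single prompt string. It assumes each turn is closed by an <|end|> marker, as in the published Phi-3 prompt format; the function name render_phi3_prompt is hypothetical. In practice the tokenizer's own chat template (for example apply_chat_template in transformers) is the safer source of truth.

```python
def render_phi3_prompt(messages, add_generation_prompt=True):
    """Render chat messages into a tag-delimited prompt string.

    Assumption: every turn ends with an <|end|> marker followed by a newline.
    """
    parts = []
    for message in messages:
        role = message["role"]
        if role not in ("system", "user", "assistant"):
            raise ValueError(f"unsupported role: {role!r}")
        # One turn = role tag, newline, content, end-of-turn marker.
        parts.append(f"<|{role}|>\n{message['content']}<|end|>\n")
    if add_generation_prompt:
        # An open assistant tag asks the model to produce the next turn.
        parts.append("<|assistant|>\n")
    return "".join(parts)


example = render_phi3_prompt([
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Solve 2x + 3 = 7."},
])
print(example)
```

The trailing open <|assistant|> tag is what cues the model to answer; leave it off (add_generation_prompt=False) when rendering a finished conversation.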

Why pick it for Norman AI?

You get strong reasoning for the size at laptop-level memory, permissive MIT licensing, and plug-and-play builds that slot straight into our micro-service stack. That makes it a good fit for edge endpoints or budget-conscious inference tiers.
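
As a rough check on the "laptop-level memory" claim, the sketch below estimates the memory taken by the weights alone at a few precisions. It is back-of-the-envelope arithmetic, assuming exactly 3.8e9 parameters and ignoring the KV cache, activations, and the per-block scale overhead that real INT4 formats carry; the helper name weight_memory_gib is made up for illustration.

```python
def weight_memory_gib(num_params, bits_per_weight):
    """Approximate memory for the model weights alone, in GiB."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 2**30


PARAMS = 3.8e9  # parameter count from the spec sheet

for label, bits in [("fp16/bf16", 16), ("int8", 8), ("int4", 4)]:
    print(f"{label:>9}: {weight_memory_gib(PARAMS, bits):.1f} GiB")
```

Under these assumptions the 16-bit weights need roughly 7 GiB, while a 4-bit build comes in under 2 GiB, which is why quantized builds fit on CPUs and laptops.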


messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"},
    {"role": "assistant",
     "content": "Sure! Here are some ways to eat bananas and dragonfruits together"},
    {"role": "user", "content": "What about solving the equation 2x + 3 = 7?"},
]

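# `norman` is assumed to be an already-initialised client. `await` must run
# inside an async function (or a notebook/REPL that allows top-level await).
response = await norman.invoke(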
    {
        "model_name": "phi-4",
        "inputs": [
            {
                "display_title": "Prompt",
                "data": messages
            }
        ]
    }
)
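
The call above uses a bare await, which only works inside a coroutine. A minimal sketch of running it from a plain script follows. StubNormanClient is a hypothetical stand-in that just echoes the request, since the real client's setup and response shape are not shown here; only the request layout is taken from the snippet above.

```python
import asyncio


class StubNormanClient:
    """Hypothetical stand-in for the Norman client; it only echoes the request."""

    async def invoke(self, request):
        await asyncio.sleep(0)  # yield to the event loop, as a network call would
        return {
            "model_name": request["model_name"],
            "num_inputs": len(request["inputs"]),
        }


async def main(client, messages):
    # Same request layout as the snippet above.
    request = {
        "model_name": "phi-4",
        "inputs": [{"display_title": "Prompt", "data": messages}],
    }
    return await client.invoke(request)


messages = [{"role": "user", "content": "Solve 2x + 3 = 7."}]
result = asyncio.run(main(StubNormanClient(), messages))
print(result)
```

Swapping the stub for the real client should leave main unchanged, provided the real invoke is also a coroutine.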