Green Fern

gemma-2-2b

Text

Gemma-2 2B (≈2 B params, Gemma license)

Tiny transformer that stays within an 8K-token context yet delivers solid reasoning.

  • Spec sheet. Decoder-only, RoPE positional encodings, 8,192-token context window; pre-trained on ~2 T tokens of web, code & math text.

  • Punchy accuracy for its size. Scores 51.3 MMLU (5-shot), 73.0 HellaSwag (10-shot) and 77.8 PIQA (0-shot) — beating many 3-7 B open models.

  • Runs on almost any box. Float-16 weights need ≈3.7 GB of VRAM; an int-4 quant fits in <2 GB, so laptops or low-end cloud GPUs are fine (see the sketch after this list).

  • Fast path available. torch.compile can deliver up to a 6× throughput gain once two warm-up calls have compiled the graph (sketched after this list).

  • Tool-ready. Drop-in with transformers, vLLM, llama.cpp, Ollama, or the lightweight local-gemma CLI: just from_pretrained("google/gemma-2-2b") and go (quickstart below).
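
For reference, a minimal transformers quickstart. This is a sketch assuming pip install transformers accelerate bitsandbytes and an accepted Gemma license on the Hugging Face Hub; the prompt string is illustrative:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b")

# Float-16 load: roughly the ≈3.7 GB VRAM figure quoted above.
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-2b",
    torch_dtype=torch.float16,
    device_map="auto",
)

# Int-4 alternative via bitsandbytes for the <2 GB footprint:
# model = AutoModelForCausalLM.from_pretrained(
#     "google/gemma-2-2b",
#     quantization_config=BitsAndBytesConfig(load_in_4bit=True),
# )

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

And a sketch of the torch.compile fast path, reusing model and inputs from above and following the static-cache pattern the transformers docs describe for Gemma 2; the actual speedup varies by hardware:

# A static KV cache keeps tensor shapes fixed so the compiled graph can be reused.
model.generation_config.cache_implementation = "static"
model.forward = torch.compile(model.forward, mode="reduce-overhead", fullgraph=True)

for _ in range(2):  # the two warm-up calls noted above trigger compilation
    model.generate(**inputs, max_new_tokens=32)

outputs = model.generate(**inputs, max_new_tokens=32)  # compiled-speed generation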

Why pick it for Norman AI?

Gemma-2 2B gives us open weights, an 8K-token context for multi-turn chat, and a sub-4 GB footprint. That makes it a natural fit for edge deployments, per-tenant fine-tunes, or a “budget” tier in our inference stack, without sacrificing quality.


# A multi-turn conversation in OpenAI-style messages format.
messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"},
    {"role": "assistant",
     "content": "Sure! Here are some ways to eat bananas and dragonfruits together"},
    {"role": "user", "content": "What about solving the equation 2x + 3 = 7?"},
]
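
To see how these turns render through Gemma's own chat template, here is a minimal sketch with two assumptions flagged: it uses the instruction-tuned checkpoint google/gemma-2-2b-it (the base checkpoint ships no chat template), and it drops the system turn, since Gemma's reference template only defines user and model roles (serving layers typically fold the system prompt into the first user turn):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-it")
prompt = tokenizer.apply_chat_template(
    [m for m in messages if m["role"] != "system"],  # Gemma's template rejects "system"
    tokenize=False,
    add_generation_prompt=True,  # appends the <start_of_turn>model header
)
print(prompt)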

import asyncio

# `norman` is the Norman AI client, assumed to be initialized elsewhere;
# invoke is awaitable, so it must run inside a coroutine.
async def main():
    response = await norman.invoke(
        {
            "model_name": "gemma-2-2b",
            "inputs": [
                {
                    "display_title": "Prompt",
                    "data": messages
                }
            ]
        }
    )
    print(response)

asyncio.run(main())
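
Because norman.invoke is awaitable, several prompts can also be fanned out concurrently, which suits the per-tenant scenario above. A sketch; the response schema isn't shown in this document, so results are printed raw:

async def batch_invoke(prompt_lists):
    # One invoke per conversation; asyncio.gather runs them concurrently.
    calls = [
        norman.invoke({
            "model_name": "gemma-2-2b",
            "inputs": [{"display_title": "Prompt", "data": p}],
        })
        for p in prompt_lists
    ]
    return await asyncio.gather(*calls)

for r in asyncio.run(batch_invoke([messages])):
    print(r)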