
pythia-2.8b
Pythia-2.8B (2.8B params, Apache-2.0)
EleutherAI’s open, research-first transformer; small enough to run on a single GPU and strong enough to beat its OPT and GPT-Neo peers.
Spec sheet. 32 layers · 2,560 hidden size · 32 heads · 2,048-token context; trained on The Pile with the GPT-NeoX codebase.
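A quick way to confirm those numbers is to pull the config from the Hub without downloading any weights (a minimal sketch using transformers' AutoConfig; the field names follow the GPT-NeoX config class):

```python
from transformers import AutoConfig

# Fetch only the model configuration, not the weights.
cfg = AutoConfig.from_pretrained("EleutherAI/pythia-2.8b")

# GPT-NeoX config fields matching the spec sheet above.
print(cfg.num_hidden_layers)        # 32 layers
print(cfg.hidden_size)              # 2560 hidden size
print(cfg.num_attention_heads)      # 32 heads
print(cfg.max_position_embeddings)  # 2048-token context
```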
Baseline accuracy. Outperforms similarly sized OPT and GPT-Neo models: 60.7 HellaSwag, 36.3 ARC-Challenge (25-shot), 26.8 MMLU (5-shot).
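If you want to reproduce numbers in that ballpark yourself, EleutherAI's lm-evaluation-harness has a Python entry point (a sketch; exact scores shift with harness version, shot count, and prompt format, so treat the figures above as leaderboard-reported rather than guaranteed):

```python
import lm_eval  # pip install lm-eval

# Few-shot evaluation of Pythia-2.8B on standard benchmarks.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=EleutherAI/pythia-2.8b,dtype=float16",
    tasks=["hellaswag", "arc_challenge", "mmlu"],
    num_fewshot=5,   # the MMLU figure above is 5-shot; ARC above was 25-shot
    batch_size=8,
)
print(results["results"])
```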
Runs light. FP16 weights need ≈ 6 GB of VRAM, 8-bit ≈ 3 GB, and 4-bit ≈ 1.5 GB, so a laptop GPU or a single A10G is plenty.
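For the quantized footprints, bitsandbytes through transformers is one straightforward route (a sketch, not the only option; switch to load_in_8bit=True for the ~3 GB variant):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Load the weights in 4-bit NF4 to hit the smallest footprint.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/pythia-2.8b",
    quantization_config=quant_config,
    device_map="auto",  # place layers on whatever GPU(s) are available
)

# Rough weight footprint in GB (excludes activations and KV cache).
print(model.get_memory_footprint() / 1e9, "GB")
```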
Tool-ready. Drop-in with transformers (the GPT-NeoX model classes), vLLM, llama.cpp (GGUF), Ollama, and friends: from_pretrained("EleutherAI/pythia-2.8b") and go.
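The transformers route really is two calls (a minimal generation sketch; the prompt is arbitrary and the sampling settings are just an example):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# The Auto classes resolve to the GPT-NeoX architecture automatically.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-2.8b")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-2.8b", device_map="auto")

inputs = tokenizer("The Pile is a dataset that", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```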
Hack-friendly. 154 training checkpoints expose the model’s entire learning trajectory, which is great for interpretability work or custom fine-tunes.
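The checkpoints are published as Git revisions on the Hugging Face Hub (step0 through step143000), so loading an intermediate snapshot is just a revision argument (a sketch; step3000 is an arbitrary pick):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Each training checkpoint lives on its own Hub branch, e.g. step1000 ... step143000.
revision = "step3000"

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-2.8b", revision=revision)
model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-2.8b", revision=revision)

# Run the same prompt across several steps to watch capabilities emerge during training.
```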
Why pick it for Norman AI?
Apache license, tiny VRAM needs, and a full training timeline make Pythia-2.8B an ideal sandbox for fast experiments, whether you’re prototyping edge demos, benchmarking quantization pipelines, or digging into model internals without burning GPU hours.
