

Nemotron-Cascade-2-30B-A3B
Text
Nemotron-Cascade-2-30B-A3B
High-performance reasoning and coding model optimized for efficient agent workflows.
Efficient MoE Design. A 30B parameter Mixture-of-Experts model that activates ~3B parameters per token. Delivers strong reasoning performance without the cost of full dense models.
Reasoning First. Trained with cascade RL and distillation to excel at math, logic, and code. Achieves top-tier results on benchmarks like IMO, AIME, and IOI.
Dual Mode Operation. Supports a configurable Thinking mode (with
<think>reasoning traces) and a standard Instruct mode for faster responses when reasoning isn’t needed.Built for Coding & Agents. Strong performance on competitive programming and software tasks. Works well in tool-based and agent loops (optimized for OpenHands-style setups).
Long Context Ready. Supports up to ~262k tokens, enabling multi-turn conversations and large context workflows without heavy degradation.
Simple Integration. Uses ChatML format, runs cleanly on vLLM, and supports tool calling without complex role handling.
Why pick it for Norman AI?
Nemotron-Cascade-2 is a strong “reasoning-first” model for startups that need real problem-solving ability without running 70B+ models. It’s a good fit for coding agents, technical assistants, and workflows where the model actually needs to think — not just autocomplete.