Nemotron-Cascade-2-30B-A3B

Text

High-performance reasoning and coding model optimized for efficient agent workflows.

  • Efficient MoE Design. A 30B parameter Mixture-of-Experts model that activates ~3B parameters per token. Delivers strong reasoning performance without the cost of full dense models.

  • Reasoning First. Trained with cascade RL and distillation to excel at math, logic, and code. Achieves top-tier results on benchmarks like IMO, AIME, and IOI.

  • Dual Mode Operation. Supports a configurable Thinking mode (with <think> reasoning traces) and a standard Instruct mode for faster responses when reasoning isn’t needed.

  • Built for Coding & Agents. Strong performance on competitive programming and software tasks. Works well in tool-based and agent loops (optimized for OpenHands-style setups).

  • Long Context Ready. Supports up to ~262k tokens of context, enabling long multi-turn conversations and large-context workflows without heavy degradation.

  • Simple Integration. Uses ChatML format, runs cleanly on vLLM, and supports tool calling without complex role handling.
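The Dual Mode and ChatML points above can be illustrated with a small sketch. It assumes the standard ChatML delimiters (`<|im_start|>`, `<|im_end|>`) and a single `<think>...</think>` block in Thinking-mode output. The helper names `to_chatml` and `split_thinking` are hypothetical, and the model's actual chat template (including how Thinking mode is toggled) may differ, so treat this as an approximation rather than the official template.

```python
import re

# Assumption: Thinking-mode output wraps its reasoning in one <think>...</think> pair.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def to_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a generic ChatML prompt string."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    ]
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


def split_thinking(completion: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is "" when no <think> block is present."""
    match = THINK_RE.search(completion)
    if match is None:
        return "", completion.strip()
    reasoning = match.group(1).strip()
    # The answer is everything outside the <think> block.
    answer = (completion[: match.start()] + completion[match.end():]).strip()
    return reasoning, answer
```

In practice a serving stack such as vLLM applies the chat template for you; a helper like `split_thinking` is mainly useful when you want to log the reasoning trace separately from the answer shown to the user.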

Why pick it for Norman AI?

Nemotron-Cascade-2 is a strong “reasoning-first” model for startups that need real problem-solving ability without running 70B+ models. It’s a good fit for coding agents, technical assistants, and workflows where the model actually needs to think, not just autocomplete.

# Assumes `norman` is an already-initialized Norman AI client and that this
# snippet runs inside an async context (an `async def` function or a notebook).
messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"},
    {"role": "assistant",
     "content": "Sure! Here are some ways to eat bananas and dragonfruits together"},
    {"role": "user", "content": "What about solving the equation 2x + 3 = 7?"},
]

# Send the full conversation history to the model as a single "Prompt" input.
response = await norman.invoke(
    {
        "model_name": "nemotron-cascade-2-30b-a3b",
        "inputs": [
            {
                "display_title": "Prompt",
                "data": messages
            }
        ]
    }
)