NVIDIA-Nemotron-3-Super-120B-A12B (NVFP4)

Text

NVIDIA-Nemotron-3-Super-120B-A12B (NVFP4)

High-throughput agentic reasoning model built for massive context and complex multi-step tasks.

  • Hybrid Efficiency. A 120B parameter model that only activates 12B parameters per token. It combines Transformer attention with Mamba-2 (SSM) and Mixture-of-Experts (MoE) to deliver "dense" level intelligence at much higher speeds.

  • Massive 1M Context. Specifically engineered to handle ultra-long documents and long-horizon agent history without the typical performance "drift" or "context explosion" found in standard models.

  • Native 4-bit Precision. Unlike models quantized after training, this was trained from scratch in NVFP4. This allows it to run with significantly lower VRAM and higher throughput on NVIDIA Blackwell/Hopper hardware with almost zero accuracy loss.

  • Built for Agents. Optimized for tool-calling, code generation, and multi-agent orchestration. It features a configurable "Thinking" mode for deep reasoning traces before providing final answers.

  • Speed-Focused. Incorporates Multi-Token Prediction (MTP), allowing it to predict multiple tokens at once to accelerate speculative decoding and reduce latency in production.

Why pick it for Norman AI?

Nemotron-3-Super is the ultimate "workhorse" for complex AI agents. It is the right choice when you need a model that can maintain logic over 100k+ tokens of context or when you are building high-volume systems (like automated support or coding assistants) that require the reasoning power of a 120B model with the speed and cost-profile of a much smaller one.

messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"},
    {"role": "assistant",
     "content": "Sure! Here are some ways to eat bananas and dragonfruits together"},
    {"role": "user", "content": "What about solving an 2x + 3 = 7 equation?"},
]

response = await norman.invoke(
    {
        "model_name": "granite-4.0-micro",
        "inputs": [
            {
                "display_title": "Prompt",
                "data": messages
            }
        ]
    }
)
messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"},
    {"role": "assistant",
     "content": "Sure! Here are some ways to eat bananas and dragonfruits together"},
    {"role": "user", "content": "What about solving an 2x + 3 = 7 equation?"},
]

response = await norman.invoke(
    {
        "model_name": "granite-4.0-micro",
        "inputs": [
            {
                "display_title": "Prompt",
                "data": messages
            }
        ]
    }
)
messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Can you provide ways to eat combinations of bananas and dragonfruits?"},
    {"role": "assistant",
     "content": "Sure! Here are some ways to eat bananas and dragonfruits together"},
    {"role": "user", "content": "What about solving an 2x + 3 = 7 equation?"},
]

response = await norman.invoke(
    {
        "model_name": "granite-4.0-micro",
        "inputs": [
            {
                "display_title": "Prompt",
                "data": messages
            }
        ]
    }
)