

NVIDIA-Nemotron-3-Super-120B-A12B (NVFP4)
High-throughput agentic reasoning model built for massive context and complex multi-step tasks.
Hybrid Efficiency. A 120B-parameter model that activates only 12B parameters per token. It combines Transformer attention with Mamba-2 state-space model (SSM) layers and Mixture-of-Experts (MoE) layers to deliver quality close to that of a comparably sized dense model at much higher speed.
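To make the sparse-activation idea concrete, here is a minimal Python sketch of top-k expert routing, the general mechanism MoE layers use to evaluate only a few experts per token. Only the 120B total and 12B active figures come from this card. The expert count, the value of k and the scalar "experts" are hypothetical toy values, and Nemotron's actual router may differ in its details.

```python
import math
from typing import Callable, List, Tuple

# Only these two totals come from the model card; everything else below is a toy example.
TOTAL_PARAMS = 120e9
ACTIVE_PARAMS = 12e9


def active_fraction(total: float, active: float) -> float:
    """Share of the model's weights used in one token's forward pass."""
    return active / total


def top_k_route(router_logits: List[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and softmax-normalize their scores.

    Returns (expert_index, gate_weight) pairs whose weights sum to 1.
    """
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    m = max(router_logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(router_logits[i] - m) for i in top]
    z = sum(exps)
    return [(i, e / z) for i, e in zip(top, exps)]


def moe_layer(x: float, experts: List[Callable[[float], float]],
              router_logits: List[float], k: int) -> float:
    """Weighted sum of the selected experts' outputs; unselected experts are never called."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))


if __name__ == "__main__":
    print(f"active fraction: {active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS):.0%}")  # 10%
    # Hypothetical toy layer: 8 scalar "experts", 2 routed per token.
    experts = [lambda x, s=s: s * x for s in range(8)]
    logits = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]
    print(top_k_route(logits, k=2))  # experts 4 and 1 are selected
    print(moe_layer(1.0, experts, logits, k=2))
```

With top-k routing, per-token compute grows with k rather than with the total number of experts. Every expert still has to be resident in memory, though, which is why the weight format matters as much as the routing.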
Massive 1M Context. A 1M-token context window designed for ultra-long documents and long-horizon agent histories, without the performance "drift" or "context explosion" that standard models typically show at these lengths.
Native 4-bit Precision. Unlike models that are quantized after training, this model was trained from scratch in NVFP4. This lets it run with significantly less VRAM and higher throughput on NVIDIA Blackwell GPUs, which support NVFP4 natively, with minimal accuracy loss.
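NVFP4 is generally described as 4-bit E2M1 values that share one FP8 (E4M3) scale per 16-element block, plus a per-tensor FP32 scale. The sketch below simulates that block-scaling idea in plain Python and works through the weight-memory arithmetic for 120B parameters. It is a simplification under stated assumptions: the block scale stays in full precision, rounding ties go to the smaller magnitude, it shows the storage format rather than how the model was trained, and the memory figures cover weights only.

```python
from typing import List, Tuple

# Assumed NVFP4-style layout: FP4 E2M1 element values, one scale per 16-element block.
# Simplification: real NVFP4 stores the block scale as FP8 (E4M3) plus a per-tensor
# FP32 scale; here the block scale is kept as a full-precision Python float.
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # magnitudes representable in E2M1
BLOCK_SIZE = 16
FP4_MAX = 6.0


def quantize_block(block: List[float]) -> Tuple[float, List[float]]:
    """Scale a block so its largest magnitude maps to 6.0, then round each value to E2M1."""
    amax = max(abs(v) for v in block)
    scale = amax / FP4_MAX if amax > 0 else 1.0
    codes = []
    for v in block:
        # Nearest representable magnitude; ties resolve to the smaller one (list is ascending).
        mag = min(E2M1_MAGNITUDES, key=lambda q: abs(abs(v) / scale - q))
        codes.append(mag if v >= 0 else -mag)
    return scale, codes


def dequantize_block(scale: float, codes: List[float]) -> List[float]:
    return [scale * c for c in codes]


def quantize(values: List[float]) -> List[Tuple[float, List[float]]]:
    """Quantize a flat list block by block (the last block may be shorter than 16)."""
    return [quantize_block(values[i:i + BLOCK_SIZE]) for i in range(0, len(values), BLOCK_SIZE)]


def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Weight storage only: ignores KV/SSM caches, activations and higher-precision layers."""
    return n_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    # 4-bit values plus one 8-bit scale per 16 values = 4.5 bits per weight on average.
    nvfp4_bits = 4 + 8 / BLOCK_SIZE
    print(f"BF16 weights:  {weight_memory_gb(120e9, 16):.1f} GB")          # 240.0 GB
    print(f"NVFP4 weights: {weight_memory_gb(120e9, nvfp4_bits):.1f} GB")  # 67.5 GB

    x = [0.02 * i - 0.15 for i in range(16)]
    [(scale, codes)] = quantize(x)
    print(max(abs(a - b) for a, b in zip(x, dequantize_block(scale, codes))))
```

At about 4.5 bits per weight, 120B weights come to roughly 67.5 GB versus 240 GB in BF16, before counting caches and activations.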
Built for Agents. Optimized for tool calling, code generation, and multi-agent orchestration. A configurable "thinking" mode lets it produce an extended reasoning trace before giving its final answer.
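An application that uses the "thinking" mode usually needs to separate the reasoning trace from the final answer, for example to log the trace but show users only the answer. The minimal sketch below assumes the trace is wrapped in <think> and </think> markers. Those tag names are an assumption, not something this card confirms, so check the model's chat template for the actual delimiters.

```python
import re
from typing import Tuple

# Assumption: the reasoning trace is wrapped in <think>...</think> markers in the raw
# completion. These tags are illustrative; check the model's chat template for the real ones.
THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"


def split_reasoning(completion: str,
                    open_tag: str = THINK_OPEN,
                    close_tag: str = THINK_CLOSE) -> Tuple[str, str]:
    """Split a raw completion into (reasoning_trace, final_answer), both stripped.

    Returns an empty reasoning string when no markers are present (e.g. thinking disabled).
    """
    pattern = re.escape(open_tag) + r"(.*?)" + re.escape(close_tag)
    match = re.search(pattern, completion, flags=re.DOTALL)
    if match:
        answer = completion[:match.start()] + completion[match.end():]
        return match.group(1).strip(), answer.strip()
    if close_tag in completion:
        # A template may put the opening tag in the prompt, so only the close tag appears.
        reasoning, _, answer = completion.partition(close_tag)
        return reasoning.strip(), answer.strip()
    return "", completion.strip()


if __name__ == "__main__":
    raw = "<think>The user wants 17 * 3. 17 * 3 = 51.</think>\nThe answer is 51."
    reasoning, answer = split_reasoning(raw)
    print(answer)  # The answer is 51.
```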
Speed-Focused. Incorporates Multi-Token Prediction (MTP), which lets it predict several future tokens in a single forward pass. These predictions can serve as draft tokens for speculative decoding, reducing latency in production.
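The draft-and-verify loop behind speculative decoding can be sketched as follows. This is a toy, greedy version. The names target_next and draft_next are hypothetical stand-ins: the first for the full model's greedy next-token choice, the second for cheap draft predictions, which is one role MTP heads can play. How a given inference engine actually wires MTP into decoding is not covered here. The key property is that the output matches plain greedy decoding with the target model. Drafts only change how many target passes are needed.

```python
from typing import Callable, List

# Toy greedy speculative decoding. Assumptions: `target_next` stands in for the full model's
# greedy next-token choice and `draft_next` for cheap draft predictions (a role MTP heads can
# play). Both are hypothetical callables, not a real inference-engine API.
Token = int
NextFn = Callable[[List[Token]], Token]


def speculative_step(prefix: List[Token], draft_next: NextFn,
                     target_next: NextFn, k: int) -> List[Token]:
    """Propose k draft tokens, keep those the target agrees with, and return the new tokens.

    Always returns at least one token, so decoding makes progress even if every draft is wrong.
    """
    # 1. Draft k tokens autoregressively with the cheap predictor.
    draft: List[Token] = []
    ctx = list(prefix)
    for _ in range(k):
        t = draft_next(ctx)
        draft.append(t)
        ctx.append(t)

    # 2. Verify. A real engine scores all k positions in one batched forward pass of the
    #    target model; this toy calls the target once per position for clarity.
    accepted: List[Token] = []
    ctx = list(prefix)
    for t in draft:
        expected = target_next(ctx)
        if t != expected:
            accepted.append(expected)  # replace the first rejected draft, discard the rest
            return accepted
        accepted.append(t)
        ctx.append(t)
    # Every draft matched: also keep the target's next token (a real engine gets it from
    # the same verification pass).
    accepted.append(target_next(ctx))
    return accepted


def generate(prefix: List[Token], draft_next: NextFn, target_next: NextFn,
             k: int, max_new: int) -> List[Token]:
    out = list(prefix)
    while len(out) - len(prefix) < max_new:
        out.extend(speculative_step(out, draft_next, target_next, k))
    return out[:len(prefix) + max_new]


if __name__ == "__main__":
    target = lambda ctx: (ctx[-1] + 1) % 10                        # "true" next token
    draft = lambda ctx: 0 if ctx[-1] == 5 else (ctx[-1] + 1) % 10  # usually right, sometimes wrong
    print(generate([0], draft, target, k=3, max_new=8))  # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

When drafts are usually right, each verification pass commits several tokens at once. This is where the latency gain comes from. When they are wrong, the step still yields one correct token.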
Why pick it for Norman AI?
Nemotron-3-Super is a natural "workhorse" for complex AI agents. It is the right choice when you need a model that keeps its reasoning coherent across 100K+ tokens of context, or when you are building high-volume systems (such as automated support or coding assistants) that need the reasoning power of a 120B model with the speed and cost profile of a much smaller one.