NVIDIA
NVIDIA releases Nemotron 3 Super 120B model with 1M-token context for agentic AI applications
· release · model · open-source · feature · developer.nvidia.com ↗

Nemotron 3 Super: Purpose-Built for Agentic AI

NVIDIA has released Nemotron 3 Super, an open-source 120B-parameter model engineered specifically for multi-agent AI systems. The model uses a mixture-of-experts architecture with only 12B active parameters, enabling efficient deployment while retaining the reasoning capacity needed for complex tasks such as software development and cybersecurity analysis.

Key Architectural Innovations

Nemotron 3 Super introduces several novel technical approaches:

  • Latent MoE: Compresses token representations before they reach the experts, so 4x as many expert specialists can activate at the same inference cost
  • Multi-token Prediction (MTP): Predicts several future tokens in a single forward pass, sharply reducing generation time for long sequences and providing built-in speculative decoding
  • Hybrid Mamba-Transformer Backbone: Combines Mamba state-space layers, which process sequences in linear time, with Transformer attention layers for precise reasoning, delivering 4x better memory and compute efficiency
  • Native NVFP4 Pretraining: Optimized for the NVIDIA Blackwell architecture, cutting memory requirements by 4x and accelerating inference 4x compared to FP8 on H100 hardware
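To make the Latent MoE idea concrete, here is a minimal sketch of the general technique: tokens are down-projected into a smaller latent space, routed to top-k experts that operate at that reduced width, then projected back up. Every dimension, expert count, and weight below is an illustrative assumption, not Nemotron 3 Super's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, d_latent = 512, 128   # hypothetical sizes; experts work at 1/4 width
n_experts, top_k = 32, 2       # cheaper experts -> more specialists at equal cost

# Projections into and out of the latent space where the experts live.
W_down = rng.standard_normal((d_model, d_latent)) / np.sqrt(d_model)
W_up = rng.standard_normal((d_latent, d_model)) / np.sqrt(d_latent)
router = rng.standard_normal((d_latent, n_experts)) / np.sqrt(d_latent)
experts = rng.standard_normal((n_experts, d_latent, d_latent)) / np.sqrt(d_latent)

def latent_moe(x):
    """x: (tokens, d_model) -> (tokens, d_model)."""
    z = x @ W_down                                 # compress each token first
    logits = z @ router
    top = np.argsort(logits, axis=-1)[:, -top_k:]  # top-k experts per token
    out = np.zeros_like(z)
    for t in range(z.shape[0]):
        w = np.exp(logits[t, top[t]])
        w /= w.sum()                               # softmax over chosen experts
        for weight, e in zip(w, top[t]):
            out[t] += weight * np.tanh(z[t] @ experts[e])
    return out @ W_up                              # expand back to model width

tokens = rng.standard_normal((4, d_model))
print(latent_moe(tokens).shape)  # (4, 512)
```

Because each expert multiply costs d_latent² instead of d_model² here, roughly 16x cheaper per expert in this toy, the same FLOP budget can fund many more experts; the article's 4x figure presumably reflects NVIDIA's actual compression ratio.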
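The MTP bullet can likewise be sketched as plain speculative decoding, which is the mechanism MTP heads enable: a cheap predictor proposes several tokens, and the full model verifies them in one pass, keeping the agreed prefix. The two "models" below are trivial stand-in rules, not learned networks.

```python
# Toy speculative decoding. target_next() stands in for the full model,
# draft_next() for the cheap multi-token predictor; both are assumptions
# made purely for illustration.

def target_next(ctx):
    return (ctx[-1] * 3 + 1) % 50

def draft_next(ctx):
    # Agrees with the target most of the time, wrong when ctx[-1] % 7 == 0.
    t = target_next(ctx)
    return t + 1 if ctx[-1] % 7 == 0 else t

def speculative_step(ctx, k=4):
    """Propose k draft tokens, verify against the target, accept the prefix."""
    proposal = list(ctx)
    for _ in range(k):
        proposal.append(draft_next(proposal))
    accepted = list(ctx)
    for tok in proposal[len(ctx):]:
        if tok == target_next(accepted):
            accepted.append(tok)                    # draft matched the target
        else:
            accepted.append(target_next(accepted))  # correct it and stop
            break
    return accepted

seq = [3]
while len(seq) < 12:
    seq = speculative_step(seq)      # up to k+1 tokens per full-model pass

plain = [3]
while len(plain) < len(seq):
    plain.append(target_next(plain))
print(seq == plain)  # True: output matches one-token-at-a-time decoding
```

The key property, preserved in the real technique, is that the output is bit-identical to ordinary autoregressive decoding; only the number of full-model forward passes shrinks.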

Solving Multi-Agent System Challenges

The model directly addresses two critical problems in agentic AI: the "thinking tax" (expensive inference on every sub-task) and "context explosion" (agents accumulating 15x more tokens through repeated history, tool outputs, and reasoning steps).

With a native 1M-token context window, Nemotron 3 Super gives agents long-term memory while keeping them aligned with their original objectives across extended tasks. Multi-environment reinforcement learning post-training, spanning 21 environment configurations and more than 1.2 million rollouts, tunes the model for autonomous reasoning scenarios.
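A rough back-of-envelope calculation shows why a 1M-token context stresses memory, and why the hybrid backbone and 4-bit formats described above matter. Every number below is an illustrative assumption, not the model's published configuration.

```python
# Hypothetical KV-cache sizing at full context. Mamba layers keep a small
# constant-size state, so only the attention layers pay per-token cache cost.

context_len = 1_000_000
n_attn_layers = 12     # assume only some layers in the hybrid use attention
n_kv_heads = 8         # assume grouped-query attention
head_dim = 128
bytes_per_val = 0.5    # 4-bit (NVFP4-style) cache entries

# 2x for storing both keys and values.
kv_bytes = (context_len * n_attn_layers * n_kv_heads
            * head_dim * 2 * bytes_per_val)
print(f"KV cache per sequence: {kv_bytes / 2**30:.1f} GiB")
```

Even under these favorable assumptions the cache runs to roughly ten gigabytes per sequence; with FP16 entries and attention in every layer it would be an order of magnitude larger, which is the pressure the Mamba layers and NVFP4 are meant to relieve.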

Performance and Availability

On PinchBench, a new benchmark for evaluating LLM performance as agent brains, Nemotron 3 Super scores 85.6% and ranks as the best open model in its class. The model is fully open-sourced with weights, datasets, and recipes available on Hugging Face, enabling developers to customize, optimize, and deploy on their own infrastructure.