Nemotron 3 Super: Purpose-Built for Agentic AI
NVIDIA has released Nemotron 3 Super, an open-source 120B-parameter model engineered specifically for multi-agent AI systems. The model uses a mixture-of-experts (MoE) architecture with only 12B active parameters, enabling efficient deployment while retaining the reasoning capacity needed for complex tasks such as software development and cybersecurity analysis.
Key Architectural Innovations
Nemotron 3 Super introduces several novel technical approaches:
- Latent MoE: Compresses token representations before they reach the experts, allowing 4x as many expert specialists to activate at the same inference cost
- Multi-token Prediction (MTP): Predicts multiple future tokens in a single forward pass, dramatically reducing generation time for long sequences and enabling built-in speculative decoding
- Hybrid Mamba-Transformer Backbone: Combines Mamba state-space layers, which process sequences in linear time, with Transformer attention layers for precise reasoning, delivering 4x better memory and compute efficiency
- Native NVFP4 Pretraining: Optimized for NVIDIA Blackwell architecture, reducing memory requirements by 4x and accelerating inference 4x compared to FP8 on H100 hardware
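The latent-MoE idea above can be sketched in a few lines. This is a minimal illustration, not NVIDIA's implementation: the dimensions, projection matrices, top-k routing, and `tanh` experts are all assumptions chosen for demonstration. The point is that once tokens are compressed to a latent width, expert compute scales with the latent size rather than the model width, so more experts fit the same budget.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, d_latent = 64, 16   # assumed sizes: experts operate on a compressed latent
n_experts, top_k = 8, 2      # assumed expert count and per-token routing fan-out

# Down-project before the experts, up-project after (the "latent" trick):
# each expert's FLOPs scale with d_latent, not d_model.
W_down = rng.normal(size=(d_model, d_latent)) / np.sqrt(d_model)
W_up = rng.normal(size=(d_latent, d_model)) / np.sqrt(d_latent)
W_router = rng.normal(size=(d_model, n_experts))
experts = [rng.normal(size=(d_latent, d_latent)) / np.sqrt(d_latent)
           for _ in range(n_experts)]

def latent_moe(x):
    """x: (n_tokens, d_model) -> (n_tokens, d_model)."""
    logits = x @ W_router                          # route on the full representation
    top = np.argsort(logits, axis=-1)[:, -top_k:]  # top-k experts per token
    z = x @ W_down                                 # compress tokens into the latent space
    out = np.zeros_like(z)
    for t in range(x.shape[0]):
        w = np.exp(logits[t, top[t]])
        w /= w.sum()                               # softmax over the selected experts
        for e, wt in zip(top[t], w):
            out[t] += wt * np.tanh(z[t] @ experts[e])
    return out @ W_up                              # expand back to model width

x = rng.normal(size=(5, d_model))
y = latent_moe(x)
print(y.shape)  # (5, 64)
```

A dense layer of the same quality budget would run every token through a full-width FFN; here each token touches only `top_k` small experts, which is why the expert count can grow without growing inference cost.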
Solving Multi-Agent System Challenges
The model directly addresses two critical problems in agentic AI: the "thinking tax" (expensive inference on every sub-task) and "context explosion" (agents accumulating 15x more tokens through repeated history, tool outputs, and reasoning steps).
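The "context explosion" problem comes down to simple arithmetic: when an agent replays its full history on every call, total tokens processed grow quadratically with the number of turns. The sketch below uses made-up numbers (20 turns, 500 new tokens per step) purely to illustrate the effect; they are not figures from the model card.

```python
# Illustrative arithmetic for "context explosion": an agent that re-sends its
# entire history each turn processes far more tokens than a single-pass model.
# All numbers here are assumptions chosen for demonstration.

def tokens_processed(turns, tokens_per_step):
    """Total prompt tokens processed when each turn replays the whole history."""
    history = 0
    total = 0
    for _ in range(turns):
        history += tokens_per_step   # new reasoning + tool output this turn
        total += history             # the entire history is re-read on this call
    return total

single_pass = 20 * 500               # a lone model reads each step exactly once
agentic = tokens_processed(20, 500)  # an agent replays history every turn
print(agentic / single_pass)         # 10.5x more tokens over 20 turns
```

Even at modest step sizes the multiplier climbs quickly with turn count, which is why long-context capacity and cheap inference matter together for agent workloads.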
With a native 1M-token context window, Nemotron 3 Super gives agents long-term memory and helps them stay aligned with their original objectives across extended tasks. Post-training uses multi-environment reinforcement learning across 21 environment configurations and more than 1.2 million rollouts to optimize performance in autonomous reasoning scenarios.
Performance and Availability
On PinchBench, a new benchmark that evaluates LLMs as the reasoning core of agents, Nemotron 3 Super scores 85.6%, the best result among open models in its class. The model is fully open: weights, datasets, and training recipes are available on Hugging Face, so developers can customize, optimize, and deploy it on their own infrastructure.