Nemotron 3 Super: Open Model for Agentic AI
NVIDIA has released Nemotron 3 Super, a 120B-total-parameter model with 12B active parameters, designed specifically to power multi-agent AI systems. The model is fully open: weights, datasets, and training recipes are all released, so developers can customize it and deploy it on their own infrastructure.
Key Technical Innovations
The model introduces several architectural advances to balance efficiency and reasoning capability:
- Hybrid Mamba-Transformer backbone: Interleaves Mamba-2 layers for efficient sequence processing with Transformer attention layers for precise reasoning, delivering 4x improved memory and compute efficiency
- Latent Mixture-of-Experts (MoE): Activates 4x more expert specialists at the same inference cost by compressing tokens into a smaller latent space before they reach the experts
- Multi-Token Prediction (MTP): Predicts multiple future tokens in a single forward pass, reducing generation time and enabling built-in speculative decoding
- Native NVFP4 pretraining: Optimized for NVIDIA Blackwell hardware, cutting memory requirements and delivering 4x faster inference on B200 GPUs compared to FP8 on H100
- Multi-environment RL training: Post-trained across 21 environment configurations using NVIDIA NeMo tools with over 1.2 million rollouts
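Of the ideas above, the latent MoE is the easiest to illustrate in code: tokens are projected down into a compact latent space before routing, so each expert operates on compressed representations and more experts can be activated for the same compute. The NumPy sketch below is a toy illustration only; all dimensions, the router, and the expert weights are made-up stand-ins, not Nemotron's actual architecture or shapes.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_latent, n_experts, top_k = 16, 4, 8, 2  # illustrative sizes

# Compress/decompress projections: experts see d_latent, not d_model.
W_down = rng.normal(size=(d_model, d_latent)) / np.sqrt(d_model)
W_up = rng.normal(size=(d_latent, d_model)) / np.sqrt(d_latent)
router = rng.normal(size=(d_latent, n_experts))
experts = rng.normal(size=(n_experts, d_latent, d_latent)) / np.sqrt(d_latent)

def latent_moe(x):
    """x: (tokens, d_model) -> (tokens, d_model)."""
    z = x @ W_down                                  # compress to latent space
    logits = z @ router                             # route in latent space
    top = np.argsort(logits, axis=-1)[:, -top_k:]   # top-k experts per token
    gates = np.take_along_axis(logits, top, axis=-1)
    gates = np.exp(gates - gates.max(-1, keepdims=True))
    gates /= gates.sum(-1, keepdims=True)           # softmax over chosen experts
    out = np.zeros_like(z)
    for t in range(z.shape[0]):                     # mix expert outputs per token
        for k in range(top_k):
            out[t] += gates[t, k] * (z[t] @ experts[top[t, k]])
    return out @ W_up                               # decompress to model dim

x = rng.normal(size=(5, d_model))
y = latent_moe(x)
print(y.shape)  # (5, 16)
```

Because routing and expert matmuls happen at `d_latent` rather than `d_model`, the per-token expert cost shrinks, which is the budget that lets more experts fire per token.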
Solving Multi-Agent Challenges
Nemotron 3 Super directly addresses two critical challenges in autonomous AI systems. The "thinking tax"—where an expensive reasoning model is invoked for every sub-task—is mitigated by the model's hybrid MoE architecture, which delivers over 5x throughput improvement. The "context explosion" problem—multi-agent systems can generate up to 15x more tokens than standard chat sessions—is tackled with the model's native 1M-token context window, which gives agents the long-term memory needed for consistent, high-accuracy reasoning.
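The multi-token prediction feature feeds directly into these throughput gains: a cheap head drafts several tokens ahead, and the full model only verifies them. A minimal draft-and-verify loop is sketched below, with both "models" replaced by toy stand-in functions (this is an illustration of speculative decoding in general, not Nemotron's actual decoding code).

```python
def base_next(ctx):
    # Stand-in "base model": deterministic toy rule, not a real LLM.
    return sum(ctx) % 10

def draft_next(ctx):
    # Stand-in "draft head": agrees with the base except when sum(ctx) % 4 == 0.
    return (sum(ctx) + 1) % 10 if sum(ctx) % 4 == 0 else sum(ctx) % 10

def speculative_decode(ctx, n_new, k=3):
    ctx = list(ctx)
    produced = 0
    while produced < n_new:
        # 1) Draft k tokens autoregressively with the cheap head.
        draft, tmp = [], ctx[:]
        for _ in range(k):
            t = draft_next(tmp)
            draft.append(t)
            tmp.append(t)
        # 2) Verify: the base model checks each drafted token in turn.
        for t in draft:
            expected = base_next(ctx)
            ctx.append(expected)        # base model's token is always kept
            produced += 1
            if expected != t or produced >= n_new:
                break                   # mismatch: discard the rest of the draft
    return ctx

out = speculative_decode([1, 2], n_new=6)
print(out)  # identical to decoding with base_next alone
```

The key property, preserved even in this toy version, is that the output matches what the base model would have produced token by token; the draft head only changes how many verification steps are needed, not the result.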
Performance and Availability
On PinchBench, a benchmark that measures how well LLMs perform as the "brains" of agents, Nemotron 3 Super scores 85.6% across the full test suite, the best result among open models in its class. The model is available now on Hugging Face, with full documentation and tutorial resources for integrating it with OpenCode and other development platforms.