Nemotron 3 Super: Open Model for Agentic AI
NVIDIA has released Nemotron 3 Super, a 120B-total-parameter model with 12B active parameters, designed specifically to power multi-agent AI systems. The model is fully open: weights, datasets, and training recipes are all released, so developers can customize it and deploy it on their own infrastructure.
Key Technical Innovations
The model introduces several architectural advances to balance efficiency and reasoning capability:
- Hybrid Mamba-Transformer backbone: Interleaves Mamba-2 layers for efficient sequence processing with Transformer attention layers for precise reasoning, delivering 4x improved memory and compute efficiency
- Latent Mixture-of-Experts (MoE): Activates 4x more expert specialists at the same inference cost by compressing tokens into a smaller latent space before they reach the experts
- Multi-Token Prediction (MTP): Predicts multiple future tokens in a single forward pass, reducing generation time and enabling built-in speculative decoding
- Native NVFP4 pretraining: Optimized for NVIDIA Blackwell hardware, cutting memory requirements and delivering 4x faster inference on B200 GPUs compared to FP8 on H100
- Multi-environment RL training: Post-trained across 21 environment configurations using NVIDIA NeMo tools with over 1.2 million rollouts
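Of the ideas above, the latent MoE is the easiest to illustrate in code: tokens are projected down into a compact latent space before routing, so each expert operates on compressed representations and more experts can be activated for the same compute. The NumPy sketch below is a toy illustration only; all dimensions, the router, and the expert weights are made-up stand-ins, not Nemotron's actual architecture or shapes.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_latent, n_experts, top_k = 16, 4, 8, 2  # illustrative sizes

# Compress/decompress projections: experts see d_latent, not d_model.
W_down = rng.normal(size=(d_model, d_latent)) / np.sqrt(d_model)
W_up = rng.normal(size=(d_latent, d_model)) / np.sqrt(d_latent)
router = rng.normal(size=(d_latent, n_experts))
experts = rng.normal(size=(n_experts, d_latent, d_latent)) / np.sqrt(d_latent)

def latent_moe(x):
    """x: (tokens, d_model) -> (tokens, d_model)."""
    z = x @ W_down                                  # compress to latent space
    logits = z @ router                             # route in latent space
    top = np.argsort(logits, axis=-1)[:, -top_k:]   # top-k experts per token
    gates = np.take_along_axis(logits, top, axis=-1)
    gates = np.exp(gates - gates.max(-1, keepdims=True))
    gates /= gates.sum(-1, keepdims=True)           # softmax over chosen experts
    out = np.zeros_like(z)
    for t in range(z.shape[0]):                     # mix expert outputs per token
        for k in range(top_k):
            out[t] += gates[t, k] * (z[t] @ experts[top[t, k]])
    return out @ W_up                               # decompress to model dim

x = rng.normal(size=(5, d_model))
y = latent_moe(x)
print(y.shape)  # (5, 16)
```

Because routing and expert matmuls happen at `d_latent` rather than `d_model`, the per-token expert cost shrinks, which is the budget that lets more experts fire per token.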
Solving Multi-Agent Challenges
Nemotron 3 Super directly addresses two critical challenges in autonomous AI systems. The "thinking tax"—where an expensive reasoning model is invoked for every sub-task—is mitigated by the model's hybrid MoE architecture, which delivers over 5x throughput improvement. The "context explosion" problem—multi-agent systems can generate up to 15x more tokens than standard chat sessions—is tackled with the model's native 1M-token context window, which gives agents the long-term memory needed for consistent, high-accuracy reasoning.
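The multi-token prediction feature feeds directly into these throughput gains: a cheap head drafts several tokens ahead, and the full model only verifies them. A minimal draft-and-verify loop is sketched below, with both "models" replaced by toy stand-in functions (this is an illustration of speculative decoding in general, not Nemotron's actual decoding code).

```python
def base_next(ctx):
    # Stand-in "base model": deterministic toy rule, not a real LLM.
    return sum(ctx) % 10

def draft_next(ctx):
    # Stand-in "draft head": agrees with the base except when sum(ctx) % 4 == 0.
    return (sum(ctx) + 1) % 10 if sum(ctx) % 4 == 0 else sum(ctx) % 10

def speculative_decode(ctx, n_new, k=3):
    ctx = list(ctx)
    produced = 0
    while produced < n_new:
        # 1) Draft k tokens autoregressively with the cheap head.
        draft, tmp = [], ctx[:]
        for _ in range(k):
            t = draft_next(tmp)
            draft.append(t)
            tmp.append(t)
        # 2) Verify: the base model checks each drafted token in turn.
        for t in draft:
            expected = base_next(ctx)
            ctx.append(expected)        # base model's token is always kept
            produced += 1
            if expected != t or produced >= n_new:
                break                   # mismatch: discard the rest of the draft
    return ctx

out = speculative_decode([1, 2], n_new=6)
print(out)  # identical to decoding with base_next alone
```

The key property, preserved even in this toy version, is that the output matches what the base model would have produced token by token; the draft head only changes how many verification steps are needed, not the result.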
Performance and Availability
On PinchBench, a benchmark that measures how well LLMs perform as the "brains" of agents, Nemotron 3 Super scores 85.6% across the full test suite, the best result among open models in its class. The model is available now on Hugging Face, with full documentation and tutorial resources for integrating it with OpenCode and other development platforms.