Addressing Multi-Agent AI Challenges
NVIDIA released Nemotron 3 Super to solve critical bottlenecks in agentic AI systems. Multi-agent applications face two major problems: "context explosion," where repeated history and tool outputs inflate token usage by up to 15x and cause goal drift over long tasks, and the "thinking tax," the need to run expensive reasoning models for every sub-task, which makes applications too slow and costly for practical deployment.
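The "context explosion" figure follows from simple arithmetic: if every agent turn resends the full prior history plus tool outputs, total tokens processed grow quadratically with the number of turns. A toy calculation (the 30-turn/1,000-token numbers are illustrative assumptions, not NVIDIA measurements) shows how the multiplier reaches the ~15x range:

```python
def tokens_processed(turns: int, tokens_per_turn: int) -> int:
    """Total tokens an agent processes when every turn replays all prior turns."""
    total = 0
    history = 0
    for _ in range(turns):
        history += tokens_per_turn  # new message + tool output appended
        total += history            # the whole history is re-read each turn
    return total

# Baseline: 30 turns processed once, with no history replay.
baseline = 30 * 1_000
with_replay = tokens_processed(30, 1_000)
print(with_replay / baseline)  # 15.5
```

With these assumed numbers, replaying history costs 465,000 tokens versus 30,000 without replay, a 15.5x blow-up that also dilutes the original goal amid stale context.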
Architecture and Key Features
Nemotron 3 Super combines several architectural innovations:
- Hybrid Mamba-Transformer backbone: Integrates Mamba state-space model layers for efficient sequence processing with Transformer attention layers for precise reasoning, delivering 4x improved memory and compute efficiency
- Latent MoE (Mixture of Experts): Compresses tokens before routing to experts, enabling 4x as many specialist experts for the same inference cost
- Multi-Token Prediction (MTP): Predicts multiple tokens per forward pass, reducing generation time and enabling built-in speculative decoding
- Native NVFP4 pretraining: Optimized for NVIDIA Blackwell architecture, cutting memory requirements and achieving 4x faster inference on B200 vs. H100
- 1M-token context window: Enables agents to retain long-term memory and maintain task alignment across extended operations
- Multi-environment RL post-training: Fine-tuned across 21 environment configurations using NeMo Gym and NeMo RL with 1.2+ million environment rollouts
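The hybrid backbone described above interleaves a small number of attention layers into a stack dominated by Mamba layers, since state-space layers keep per-token state constant while attention layers pay for a growing KV cache. The layer ratio below is an assumption for illustration; the summary does not specify the exact pattern:

```python
def hybrid_layout(n_layers: int, attn_every: int = 8) -> list:
    """Sketch of a Mamba-dominant stack with periodic attention layers.

    Mamba (state-space) layers process sequences with fixed-size recurrent
    state; the occasional attention layer provides precise token-to-token
    retrieval. `attn_every=8` is a hypothetical ratio.
    """
    return [
        "attention" if (i + 1) % attn_every == 0 else "mamba"
        for i in range(n_layers)
    ]

layout = hybrid_layout(16)
print(layout.count("mamba"), layout.count("attention"))  # 14 2
```

The design choice is a memory/precision trade: most of the depth runs at Mamba's constant memory cost, which is where the claimed efficiency gains over an all-attention stack come from.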
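Multi-token prediction enables speculative-style decoding: the model drafts several tokens per forward pass, and a verification step accepts the longest prefix the full model agrees with. The toy loop below uses stub functions in place of real MTP heads and verification logits, which are assumptions here:

```python
def draft(prefix, k):
    # Hypothetical MTP head: propose the next k tokens in one pass.
    return [len(prefix) + i for i in range(k)]

def verify(prefix, proposed):
    # Hypothetical full-model check: accept tokens until the first
    # disagreement with the stub "ground truth" continuation.
    accepted = []
    for tok in proposed:
        if tok == len(prefix) + len(accepted):
            accepted.append(tok)
        else:
            break
    return accepted

def generate(n_tokens, k=4):
    """Generate n_tokens, counting verification passes instead of per-token steps."""
    out, passes = [], 0
    while len(out) < n_tokens:
        out += verify(out, draft(out, k))
        passes += 1
    return out[:n_tokens], passes

tokens, passes = generate(12, k=4)
print(passes)  # 3 passes instead of 12 single-token steps
```

When drafts are usually accepted, generation latency drops roughly by the draft length, which is the "built-in speculative decoding" benefit the feature list refers to.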
Performance and Availability
The model achieves 85.6% on PinchBench, a new benchmark for autonomous agent performance, ranking it as the best open model in its class. Importantly, Nemotron 3 Super is fully open-source with open weights, datasets, and recipes, allowing developers to customize, optimize, and deploy on their own infrastructure.
The 120B-parameter model activates only 12B parameters per token, balancing capability with efficiency and making it well suited for continuous deployment in software development, cybersecurity, and autonomous reasoning applications. Weights are available on Hugging Face, with comprehensive tutorials and integration guides for platforms such as OpenCode and Build.nvidia.com.
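A back-of-the-envelope view of the 120B-total/12B-active split: in a sparse MoE, each token is routed through only a few experts, so per-token compute scales with the active parameters, not the total. The expert count, expert size, shared-parameter size, and top-k value below are assumptions chosen so the arithmetic lands on the published figures, not disclosed configuration:

```python
SHARED = 4e9              # assumed always-active (attention/Mamba) parameters
N_EXPERTS = 29            # assumed number of experts
EXPERT = 4e9              # assumed parameters per expert (29 * 4B = 116B)
TOP_K = 2                 # assumed experts activated per token

total = SHARED + N_EXPERTS * EXPERT   # parameters stored
active = SHARED + TOP_K * EXPERT      # parameters touched per token
print(total / 1e9, active / 1e9)      # 120.0 12.0
```

The practical consequence is that the model's inference cost behaves like a ~12B dense model while its capacity behaves like a 120B one, which is what makes continuous agentic deployment economical.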