NVIDIA releases Nemotron 3 Super, 120B open model optimized for agentic AI with 1M-token context

Nemotron 3 Super: Open Foundation Model for Agentic AI

NVIDIA has released Nemotron 3 Super, a 120B-parameter open-source model aimed at the main obstacles to building scalable multi-agent AI systems. It targets two problems: the "thinking tax" (the cost of running expensive reasoning for every sub-task) and "context explosion" (a 15x increase in generated tokens during multi-turn agent interactions).

Key Technical Innovations

The model introduces several architectural advances:

Hybrid Mamba-Transformer MoE backbone: Interleaves Mamba-2 layers for linear-time sequence processing with Transformer attention layers for precise fact retrieval, combined with mixture-of-experts for parameter efficiency
Latent MoE: Compresses tokens before reaching experts, enabling 4x more expert specialists at identical inference cost
Multi-token prediction (MTP): Predicts multiple tokens per forward pass, reducing generation time and enabling built-in speculative decoding
Native NVFP4 pretraining: Optimized for NVIDIA Blackwell GPUs, cutting memory requirements and achieving 4x faster inference on B200 vs. FP8 on H100
1M-token context window: Enables long-term agent memory for sustained reasoning without goal drift
Multi-environment RL training: Post-trained across 21 environment configurations with 1.2M environment rollouts

Performance and Availability

On PinchBench, a benchmark for LLM-driven autonomous agents, Nemotron 3 Super scores 85.6%, the best result among open models in its class. The model also delivers more than 5x the throughput of the previous Nemotron Super variant.

Availability: The model is fully open, with weights, datasets, and training recipes all published. Developers can download it from Hugging Face and run it on their own infrastructure. NVIDIA provides tutorial resources and integration support for platforms such as OpenCode and Perplexity.
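
As a rough sketch of the Hugging Face route, the snippet below wraps `snapshot_download` from the `huggingface_hub` package. The repository id is deliberately left as a placeholder because it is not given here; take the real one from the model card. The `fetch_weights` helper and its `downloader` parameter are illustrative names, not part of any NVIDIA tooling.

```python
from typing import Callable, Optional

# Placeholder only: replace with the repository id shown on the model card.
REPO_ID = "<org>/<model-id>"


def fetch_weights(repo_id: str, local_dir: str,
                  downloader: Optional[Callable[..., str]] = None) -> str:
    """Download a model snapshot and return the local path it was written to."""
    if "<" in repo_id:
        raise ValueError("Set repo_id to the real repository id first.")
    if downloader is None:
        # Imported lazily so this module loads even without huggingface_hub installed.
        from huggingface_hub import snapshot_download
        downloader = snapshot_download
    return downloader(repo_id=repo_id, local_dir=local_dir)


# Example call (not executed here; a 120B-parameter checkpoint is a very large download):
# path = fetch_weights("the-real/repo-id", local_dir="./nemotron-3-super")
```

Passing the downloader in as an argument keeps the helper easy to dry-run with a stub before committing to the full download.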

Action Items for Developers

Download the model from Hugging Face
Review the technical blog and tutorial videos for deployment guidance
Test on your multi-agent workflows to evaluate throughput and accuracy improvements
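
For the last item, a small harness along these lines can put numbers on throughput and accuracy. It is a sketch under assumptions: `run_task` is a hypothetical callable that you would wire to your own agent workflow and that returns the final answer plus the number of tokens generated, and exact string match is only a stand-in for whatever scoring your tasks need.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple


@dataclass
class RunStats:
    tasks: int
    correct: int
    output_tokens: int
    seconds: float

    @property
    def accuracy(self) -> float:
        return self.correct / self.tasks if self.tasks else 0.0

    @property
    def tokens_per_second(self) -> float:
        return self.output_tokens / self.seconds if self.seconds > 0 else 0.0


def evaluate(run_task: Callable[[str], Tuple[str, int]],
             cases: Iterable[Tuple[str, str]],
             clock: Callable[[], float] = time.perf_counter) -> RunStats:
    """Run every (prompt, expected_answer) case and collect aggregate statistics."""
    tasks = correct = tokens = 0
    start = clock()
    for prompt, expected in cases:
        answer, n_tokens = run_task(prompt)
        tasks += 1
        tokens += n_tokens
        # Exact match after trimming whitespace; swap in a task-specific scorer as needed.
        correct += int(answer.strip() == expected.strip())
    return RunStats(tasks, correct, tokens, clock() - start)


# Stub agent for a dry run: upper-cases the prompt and reports 10 generated tokens.
def stub_agent(prompt: str) -> Tuple[str, int]:
    return prompt.upper(), 10


stats = evaluate(stub_agent, [("a", "A"), ("b", "X")])
print(f"accuracy={stats.accuracy:.2f} tokens/s={stats.tokens_per_second:.1f}")
```

Running the same cases against the current model and against Nemotron 3 Super gives a like-for-like comparison on your own workload rather than on a published benchmark.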
