NVIDIA
NVIDIA releases Nemotron 3 Super, a 120B open model with 1M-token context for agentic AI
release · model · open-source · feature · developer.nvidia.com

Nemotron 3 Super: A Purpose-Built Model for Agentic AI

NVIDIA has open-sourced Nemotron 3 Super, a 120B-parameter model (12B active parameters) designed for the operational challenges of multi-agent AI systems. It targets two problems in agentic reasoning: the "thinking tax" of running an expensive reasoning model for every sub-task, and "context explosion," where agents lose alignment over long task sequences as history accumulates.
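
To make the 120B total / 12B active figures concrete, here is a small back-of-the-envelope sketch. The parameter counts come from the summary above; the "about 2 FLOPs per active weight" figure is a common rule of thumb rather than an NVIDIA number, and the function names are illustrative only.

```python
# Parameter counts as stated in the article.
TOTAL_PARAMS = 120_000_000_000
ACTIVE_PARAMS = 12_000_000_000


def active_fraction(total: int, active: int) -> float:
    """Share of the weights that take part in one token's forward pass."""
    return active / total


def approx_forward_flops_per_token(active: int) -> int:
    """Rule-of-thumb estimate (an assumption, not a published figure):
    roughly one multiply and one add per active weight. It ignores
    costs that grow with sequence length, such as attention."""
    return 2 * active


if __name__ == "__main__":
    share = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    flops = approx_forward_flops_per_token(ACTIVE_PARAMS)
    print(f"active share of weights: {share:.0%}")
    print(f"approx. forward FLOPs per token: {flops:.1e}")
```

Under these assumptions, each token touches about a tenth of the weights, which is the sense in which a sparse model of this size can be cheaper per sub-task than a dense model with the same total parameter count.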

Key Architectural Innovations

The model combines several techniques to balance efficiency and accuracy:

  • Latent MoE: Compresses tokens before routing to experts, enabling 4x more expert specialists for the same inference cost
  • Multi-Token Prediction (MTP): Predicts multiple future tokens in a single forward pass, reducing generation time for long sequences and enabling built-in speculative decoding
  • Hybrid Mamba-Transformer Backbone: Combines Mamba layers for linear-time sequence processing with Transformer layers for precision reasoning, delivering 4x better memory and compute efficiency
  • Native NVFP4 Pretraining: Optimized for NVIDIA Blackwell, achieving 4x faster inference on B200 vs. FP8 on H100 while maintaining accuracy
  • Multi-Environment Reinforcement Learning: Post-trained using NVIDIA NeMo tools across 21 environment configurations with 1.2+ million environment rollouts
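
The speculative decoding mentioned in the Multi-Token Prediction item above can be illustrated with a toy sketch. This is a generic greedy variant of the technique, not NVIDIA's implementation: the cheap "draft" stands in for the extra tokens an MTP head might propose, the "target" stands in for the full model, and every name here (`speculative_step`, `toy_draft`, `toy_target`, `generate`) is hypothetical. A real system would verify all drafted positions in one batched forward pass; the loop below calls the target one position at a time purely for readability.

```python
from typing import Callable, List

Token = int
# Greedy "model": maps a token prefix to the single next token.
NextTokenFn = Callable[[List[Token]], Token]
# Draft: maps a prefix and a count k to k proposed tokens.
DraftFn = Callable[[List[Token], int], List[Token]]


def speculative_step(prefix: List[Token], draft: DraftFn,
                     target: NextTokenFn, k: int) -> List[Token]:
    """One round of greedy speculative decoding.

    The draft proposes k tokens. Each is kept only if it matches what
    the target would have produced at that position. The first mismatch
    is replaced by the target's token and the round ends. At least one
    token is always returned.
    """
    accepted: List[Token] = []
    for tok in draft(prefix, k):
        expected = target(prefix + accepted)
        if tok != expected:
            accepted.append(expected)  # correct the draft and stop
            return accepted
        accepted.append(tok)
    # Every drafted token matched; the target adds one more token.
    accepted.append(target(prefix + accepted))
    return accepted


def generate(prefix: List[Token], draft: DraftFn, target: NextTokenFn,
             k: int, n_new: int) -> List[Token]:
    """Repeat speculative rounds until n_new tokens have been added."""
    out = list(prefix)
    while len(out) - len(prefix) < n_new:
        out.extend(speculative_step(out, draft, target, k))
    return out[:len(prefix) + n_new]


def toy_target(prefix: List[Token]) -> Token:
    """Toy target model: counts upward modulo 10."""
    return (prefix[-1] + 1) % 10


def toy_draft(prefix: List[Token], k: int) -> List[Token]:
    """Toy draft: same rule as the target, but wrongly emits 0 instead of 5."""
    cur, out = list(prefix), []
    for _ in range(k):
        nxt = (cur[-1] + 1) % 10
        if nxt == 5:
            nxt = 0  # deliberate draft error
        out.append(nxt)
        cur.append(nxt)
    return out


if __name__ == "__main__":
    print(speculative_step([3], toy_draft, toy_target, k=4))
    print(generate([0], toy_draft, toy_target, k=4, n_new=8))
```

The property worth noticing is that the output is identical to what the target would produce by decoding greedily on its own; the draft only changes how many tokens are confirmed per round, which is where the generation-time savings come from.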

Performance and Availability

The model scores 85.6% on PinchBench, a new benchmark for evaluating LLMs as agentic brains, making it the best-performing open model in its class. It delivers 5x higher throughput than the previous Nemotron Super while maintaining a native 1M-token context window for long-term agentic memory.

Nemotron 3 Super is fully open: weights, datasets, and recipes are available on Hugging Face, so developers can customize, optimize, and deploy it on their own infrastructure. This positions it as a practical choice for autonomous agents in software development, cybersecurity triage, and other reasoning-heavy applications.