NVIDIA
NVIDIA releases Nemotron 3 Super, open 120B model optimized for agentic AI with 1M-token context
Tags: release, model, open-source, feature · Source: developer.nvidia.com

Key Features

NVIDIA's Nemotron 3 Super introduces a specialized architecture for agentic AI systems that need to solve dense technical problems autonomously. The model addresses two critical challenges in multi-agent applications: context explosion, where agents resend the full history at each turn and produce up to 15x the tokens of a standard chat, and the thinking tax, the high inference cost of running a large reasoning model for every sub-task.
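As a rough illustration of why resending history is costly, the sketch below compares the total tokens processed when every turn includes the full history against a baseline where each message is processed once. The function names and the example figures (a fixed 500 tokens per turn) are assumptions chosen for illustration; they do not come from NVIDIA and are not meant to reproduce the 15x figure.

```python
def tokens_with_full_history(turns: int, tokens_per_turn: int) -> int:
    """Total tokens processed when turn i resends all i messages so far."""
    return sum(i * tokens_per_turn for i in range(1, turns + 1))


def tokens_without_resend(turns: int, tokens_per_turn: int) -> int:
    """Total tokens if each message were processed exactly once."""
    return turns * tokens_per_turn


def overhead_ratio(turns: int, tokens_per_turn: int) -> float:
    """How many times more tokens the full-history pattern processes."""
    return (tokens_with_full_history(turns, tokens_per_turn)
            / tokens_without_resend(turns, tokens_per_turn))


if __name__ == "__main__":
    # Hypothetical workload: 500 tokens added per turn.
    for turns in (5, 10, 20):
        print(f"{turns} turns: {overhead_ratio(turns, 500):.1f}x overhead")
```

Under these simplified assumptions the overhead grows linearly with the number of turns, (n + 1) / 2 for n turns, so total cost grows quadratically. That is why long-running agents benefit from both a large context window and cheaper per-token processing.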

Architectural Innovations

The model combines several technical advances:

  • Latent MoE: Compresses tokens before routing to experts, enabling 4x more specialist experts for the same inference cost
  • Multi-Token Prediction (MTP): Predicts multiple future tokens in a single forward pass, reducing generation time and enabling built-in speculative decoding
  • Hybrid Mamba-Transformer Backbone: Integrates Mamba-2 layers for linear-time sequence processing with Transformer attention layers for precise reasoning, delivering 4x better memory and compute efficiency
  • Native NVFP4 Pretraining: Optimized for NVIDIA Blackwell, enabling 4x faster inference on B200 vs. FP8 on H100 while maintaining model accuracy
  • Multi-Environment RL Post-Training: Trained with more than 1.2 million environment rollouts across 21 configurations using the NVIDIA NeMo Gym and NeMo RL frameworks
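
The speculative decoding that Multi-Token Prediction enables can be sketched as a draft-and-verify loop. The code below is a minimal toy version under stated assumptions, not NVIDIA's implementation: `toy_target` and `toy_draft` are hypothetical stand-ins for the full model and for its cheap multi-token draft, tokens are plain integers, and decoding is greedy. In a real system the verification step scores all drafted positions in one batched forward pass; here it is written sequentially for clarity.

```python
from typing import Callable, List, Tuple

Token = int
NextTokenFn = Callable[[List[Token]], Token]


def greedy_decode(target_next: NextTokenFn, prompt: List[Token],
                  max_new_tokens: int) -> List[Token]:
    """Reference: one target call per generated token."""
    out: List[Token] = []
    for _ in range(max_new_tokens):
        out.append(target_next(prompt + out))
    return out


def speculative_decode(target_next: NextTokenFn, draft_next: NextTokenFn,
                       prompt: List[Token], max_new_tokens: int,
                       draft_len: int = 3) -> Tuple[List[Token], int]:
    """Greedy draft-and-verify decoding.

    Returns the generated tokens and the number of verification rounds.
    """
    out: List[Token] = []
    rounds = 0
    while len(out) < max_new_tokens:
        rounds += 1
        context = prompt + out
        # 1. Draft: cheaply propose draft_len future tokens.
        draft: List[Token] = []
        for _ in range(draft_len):
            draft.append(draft_next(context + draft))
        # 2. Verify: keep drafted tokens while they match the target's choice.
        accepted: List[Token] = []
        for tok in draft:
            expected = target_next(context + accepted)
            if tok != expected:
                accepted.append(expected)  # replace the first mismatch
                break
            accepted.append(tok)
        else:
            # Every drafted token matched, so take one bonus target token.
            accepted.append(target_next(context + accepted))
        out.extend(accepted)
    return out[:max_new_tokens], rounds


def toy_target(ctx: List[Token]) -> Token:
    """Hypothetical 'full model': counts upward modulo 10."""
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    """Hypothetical draft: agrees with the target except after token 4."""
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


if __name__ == "__main__":
    tokens, rounds = speculative_decode(toy_target, toy_draft, [0], 12)
    print(tokens, "in", rounds, "rounds")
```

Because every accepted token is checked against the target, the output matches plain greedy decoding exactly. The speed-up comes from needing fewer verification rounds than generated tokens whenever the draft is usually right.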

Performance & Availability

The native 1M-token context window allows agents to maintain long-term memory and alignment throughout extended reasoning tasks. On PinchBench, a new benchmark that measures how well LLMs perform as the brain of an OpenClaw agent, Nemotron 3 Super achieves 85.6% accuracy, the highest score among open models in its class.

The model is fully open, with weights, datasets, and recipes available on Hugging Face. Developers can download, customize, optimize, and deploy it on their own infrastructure without restrictions. Tutorial videos and integration guides for Perplexity and OpenCode are available on NVIDIA Developer resources.