NVIDIA
NVIDIA releases Nemotron 3 Super, 120B open model optimized for agentic AI with 1M-token context
· release, model, open-source, feature, performance · developer.nvidia.com ↗

Nemotron 3 Super: Open Foundation Model for Agentic AI

NVIDIA has released Nemotron 3 Super, a 120B-parameter open-source model that targets two problems in building scalable multi-agent AI systems: the "thinking tax" (expensive reasoning overhead on every sub-task) and "context explosion" (a 15x increase in generated tokens in multi-turn agent interactions).

Key Technical Innovations

The model introduces several architectural advances:

  • Hybrid Mamba-Transformer MoE backbone: Interleaves Mamba-2 layers for linear-time sequence processing with Transformer attention layers for precise fact retrieval, combined with mixture-of-experts for parameter efficiency
  • Latent MoE: Compresses tokens before they reach the experts, enabling 4x as many expert specialists at the same inference cost
  • Multi-token prediction (MTP): Predicts multiple tokens per forward pass, reducing generation time and enabling built-in speculative decoding
  • Native NVFP4 pretraining: Optimized for NVIDIA Blackwell GPUs, cutting memory requirements and delivering 4x faster inference on B200 than FP8 on H100
  • 1M-token context window: Enables long-term agent memory for sustained reasoning without goal drift
  • Multi-environment RL training: Post-trained across 21 environment configurations with 1.2M environment rollouts
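
To make the multi-token prediction point concrete, here is a minimal sketch of greedy draft-and-verify speculative decoding, the general technique that MTP heads can feed. It is not NVIDIA's implementation: target_next and draft_next are hypothetical toy functions standing in for the full model and a cheap draft head, and the sketch omits the bonus token that production implementations emit when every draft token is accepted.

```python
from typing import Callable, List, Tuple

Token = int
NextTokenFn = Callable[[List[Token]], Token]


def target_next(prefix: List[Token]) -> Token:
    """Toy stand-in for one greedy step of the full model (hypothetical)."""
    return (sum(prefix) * 31 + len(prefix)) % 50


def draft_next(prefix: List[Token]) -> Token:
    """Toy draft head: disagrees with the target when len(prefix) % 4 == 0."""
    guess = target_next(prefix)
    return guess if len(prefix) % 4 else (guess + 1) % 50


def greedy_decode(prompt: List[Token], n_new: int,
                  target: NextTokenFn = target_next) -> List[Token]:
    """Baseline: one full-model pass per generated token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(target(seq))
    return seq


def speculative_decode(prompt: List[Token], n_new: int, k: int = 4,
                       target: NextTokenFn = target_next,
                       draft: NextTokenFn = draft_next
                       ) -> Tuple[List[Token], int]:
    """Draft k tokens cheaply, then keep the prefix the target agrees with.

    Returns the generated sequence and the number of verification passes.
    """
    seq = list(prompt)
    goal = len(prompt) + n_new
    verify_passes = 0
    while len(seq) < goal:
        # 1. Draft k candidate tokens with the cheap predictor.
        drafted: List[Token] = []
        for _ in range(k):
            drafted.append(draft(seq + drafted))

        # 2. Verify. A real model scores all k positions in one batched
        #    forward pass; the per-token calls below only emulate that.
        verify_passes += 1
        accepted: List[Token] = []
        for tok in drafted:
            expected = target(seq + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                # First mismatch: take the target's token and stop.
                accepted.append(expected)
                break
        seq.extend(accepted)
    # The last pass may overshoot the goal, so trim to the requested length.
    return seq[:goal], verify_passes


if __name__ == "__main__":
    prompt = [1, 2, 3]
    spec, passes = speculative_decode(prompt, n_new=20, k=4)
    print("matches greedy:", spec == greedy_decode(prompt, 20))
    print("verification passes:", passes, "vs. 20 greedy steps")
```

Because every emitted token is either confirmed or supplied by the target, the output is identical to plain greedy decoding; the saving comes from needing fewer full-model passes whenever the draft is right.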

Performance and Availability

On PinchBench (a benchmark for LLM-driven autonomous agents), Nemotron 3 Super scores 85.6%, the best result among open models in its class. The model also delivers more than 5x the throughput of the previous Nemotron Super variant.

Availability: The model is fully open with open weights, datasets, and training recipes. Developers can access it via Hugging Face and integrate it into their own infrastructure. NVIDIA provides tutorial resources and integration support for platforms like OpenCode and Perplexity.

Action Items for Developers

  • Download the model from Hugging Face
  • Review the technical blog and tutorial videos for deployment guidance
  • Test on your multi-agent workflows to evaluate throughput and accuracy improvements
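
For the last item, a small harness along these lines can put numbers on throughput and accuracy. It is only a sketch under assumptions: generate is a hypothetical callable that you wire to your own serving stack and that returns the response text plus the number of completion tokens, and "accuracy" here is a simple substring check that a real agent evaluation would replace with task-specific scoring.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical interface: prompt in, (response text, completion tokens) out.
GenerateFn = Callable[[str], Tuple[str, int]]


@dataclass
class EvalResult:
    completion_tokens: int
    tokens_per_second: float
    accuracy: float


def evaluate(generate: GenerateFn,
             cases: List[Tuple[str, str]]) -> EvalResult:
    """Run each (prompt, expected_substring) case and aggregate the results."""
    if not cases:
        raise ValueError("cases must not be empty")

    total_tokens = 0
    correct = 0
    start = time.perf_counter()
    for prompt, expected in cases:
        text, n_tokens = generate(prompt)
        total_tokens += n_tokens
        if expected in text:
            correct += 1
    elapsed = time.perf_counter() - start

    # Guard against a zero-length interval on very fast stubs.
    rate = total_tokens / elapsed if elapsed > 0 else float("inf")
    return EvalResult(total_tokens, rate, correct / len(cases))


if __name__ == "__main__":
    # Stub model for illustration only; replace with a call to your endpoint.
    def fake_generate(prompt: str) -> Tuple[str, int]:
        return ("answer: 4" if "2+2" in prompt else "unknown", 3)

    result = evaluate(fake_generate, [("2+2?", "4"), ("capital?", "Paris")])
    print(result)
```

Running the same cases against your current model and against Nemotron 3 Super gives a like-for-like comparison, as long as both runs use the same hardware and the same prompts.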