NVIDIA releases Nemotron 3 Super 120B model with 12B active parameters and 1M-token context for agentic AI

Overview

NVIDIA has released Nemotron 3 Super, a 120B total parameter model with only 12B active parameters per token, designed specifically for multi-agent AI applications. The open-weight model addresses two critical challenges in agentic AI: the "thinking tax" of processing massive amounts of context and the "context explosion" that causes alignment drift in long-running autonomous systems.

Key Architectural Innovations

The model introduces several technical advances:

Latent Mixture-of-Experts (MoE): Compresses tokens before expert routing, allowing 4x more specialist experts to activate without increasing inference cost
Multi-Token Prediction (MTP): Predicts multiple future tokens in a single forward pass, reducing generation latency and enabling built-in speculative decoding
Hybrid Mamba-Transformer backbone: Combines Mamba layers for linear-time sequence processing with Transformer attention layers for precise retrieval, with the Mamba layers delivering 4x better memory and compute efficiency
Native NVFP4 pretraining: Optimized for NVIDIA Blackwell hardware, cutting memory requirements and speeding up inference by 4x on B200 compared with FP8 on H100
Multi-environment reinforcement learning: Post-trained using NVIDIA NeMo tools across 21 environment configurations with over 1.2 million environment rollouts
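The Latent MoE item describes the order of operations as compress first, then route. The sketch below shows that idea in pure Python. It is not NVIDIA's implementation. The layer shapes, the ReLU feed-forward experts, softmax gating over the top-k experts, and the names `latent_moe` and `expert_macs` are all illustrative assumptions. The arithmetic helper shows why compression can pay for more experts. If an expert's cost scales with its input width, shrinking that width by 4x lets roughly 4x as many experts run for the same expert compute, before counting the shared down/up projections.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(W: Matrix, x: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(xs: Vector) -> Vector:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def latent_moe(x: Vector, down: Matrix, up: Matrix, router: Matrix,
               experts: List[Tuple[Matrix, Matrix]], k: int) -> Vector:
    """Hypothetical latent MoE layer: compress, route, run top-k experts, expand.

    down:    d_latent x d_model   (compression before routing)
    router:  n_experts x d_latent (routing happens in the latent space)
    experts: (W1: d_ff x d_latent, W2: d_latent x d_ff) pairs
    up:      d_model x d_latent   (project the expert mixture back)
    """
    z = matvec(down, x)                       # compress the token
    scores = matvec(router, z)                # score experts on the compressed token
    chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    gates = softmax([scores[i] for i in chosen])
    mixed = [0.0] * len(z)
    for gate, i in zip(gates, chosen):
        W1, W2 = experts[i]
        hidden = [max(0.0, h) for h in matvec(W1, z)]  # ReLU feed-forward expert
        out = matvec(W2, hidden)
        mixed = [m + gate * o for m, o in zip(mixed, out)]
    return matvec(up, mixed)                  # expand back to d_model


def expert_macs(d_in: int, d_ff: int, k: int) -> int:
    """Multiply-accumulates for k active two-layer experts (router and projections ignored)."""
    return k * 2 * d_in * d_ff
```

For example, `expert_macs(4096, 2048, 2)` equals `expert_macs(1024, 2048, 8)`: with these made-up widths, a 4x narrower latent space buys four times as many active experts at the same expert cost.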
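The Multi-Token Prediction item mentions built-in speculative decoding. The general technique is to let a cheap head draft several tokens, then have the full model verify them in one pass and keep the agreeing prefix. The sketch below shows greedy speculative decoding with toy stand-ins for both models. `toy_target`, `toy_draft`, and the draft length `k` are hypothetical and not part of any NVIDIA API. The property to check is that the output matches plain greedy decoding exactly, while the number of target passes drops.

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[int]], int]           # target model: greedy next token
Drafter = Callable[[List[int], int], List[int]]  # MTP-style head: k drafted tokens


def greedy_decode(target: NextToken, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one target forward pass per generated token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(target(seq))
    return seq


def speculative_decode(target: NextToken, draft: Drafter, prompt: List[int],
                       n_new: int, k: int) -> Tuple[List[int], int]:
    """Draft k tokens, then keep the prefix the target agrees with.

    Each loop iteration stands in for ONE batched target pass that scores every
    drafted position at once; the inner loop only simulates that pass.
    """
    seq = list(prompt)
    goal = len(prompt) + n_new
    passes = 0
    while len(seq) < goal:
        drafted = draft(seq, k)
        passes += 1
        ctx = list(seq)
        for tok in drafted:
            expected = target(ctx)
            if tok != expected:
                ctx.append(expected)  # take the target's token at the first mismatch
                break
            ctx.append(tok)           # drafted token accepted
        else:
            ctx.append(target(ctx))   # all drafts accepted: the pass yields one extra token
        seq = ctx
    return seq[:goal], passes


def toy_target(seq: List[int]) -> int:
    """Hypothetical deterministic 'model' over a 10-token vocabulary."""
    return (seq[-1] * 3 + 1) % 10


def toy_draft(seq: List[int], k: int) -> List[int]:
    """Hypothetical drafting head: right for two positions, then guesses."""
    out, last = [], seq[-1]
    for i in range(k):
        last = (last * 3 + 1) % 10 if i < 2 else (last + 1) % 10
        out.append(last)
    return out
```

With these toys, `speculative_decode(toy_target, toy_draft, [2], 12, k=4)` reproduces the greedy sequence in 4 target passes instead of 12, because each pass accepts two drafts and adds the target's correction.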
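The hybrid-backbone item contrasts linear-time Mamba layers with attention layers. The practical difference at long context is what each layer must keep around during generation. A state-space layer carries a fixed-size state. An attention layer caches keys and values for every past token. The sketch below uses a simplified, non-selective diagonal recurrence to illustrate the fixed-state idea. Real Mamba layers make the recurrence input-dependent. The layer pattern `"MMMA"`, the state size, and the key/value width are hypothetical, not Nemotron 3 Super's actual configuration.

```python
from typing import List, Tuple


def ssm_scan(xs: List[float], a: List[float], b: List[float],
             c: List[float]) -> Tuple[List[float], int]:
    """Toy diagonal linear state-space recurrence, run one token at a time.

    h_t = a * h_{t-1} + b * x_t ;  y_t = c . h_t
    Real Mamba layers make a, b, c input-dependent ("selective"); this sketch
    keeps them fixed to show only the constant-size state.
    """
    h = [0.0] * len(a)
    ys = []
    for x in xs:
        h = [ai * hi + bi * x for ai, hi, bi in zip(a, h, b)]
        ys.append(sum(ci * hi for ci, hi in zip(c, h)))
    return ys, len(h)  # state size does not depend on len(xs)


def cache_floats(pattern: str, seq_len: int, state_size: int, d_kv: int) -> int:
    """Floats held during generation for a layer pattern such as "MMMA".

    'M' = state-space layer (fixed state), 'A' = attention layer
    (one key and one value vector cached per past token).
    """
    total = 0
    for kind in pattern:
        if kind == "M":
            total += state_size
        elif kind == "A":
            total += 2 * d_kv * seq_len
        else:
            raise ValueError(f"unknown layer type {kind!r}")
    return total
```

Comparing `cache_floats("MMMA" * 2, 10**6, 4096, 128)` with `cache_floats("A" * 8, 10**6, 4096, 128)` shows the hybrid stack holding far less at a 1M-token context. The attention layers that remain are what keep precise retrieval available.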
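The NVFP4 item refers to a 4-bit floating-point format. The sketch below shows block-scaled FP4 quantization as commonly described for NVFP4. Each value is rounded to the nearest 4-bit E2M1 magnitude, and each 16-element block shares one scale factor. For simplicity it keeps the scale as a Python float, whereas the real format stores it in FP8 and adds a per-tensor scale. It also keeps codes as floats rather than packed 4-bit integers. The function names are hypothetical, and this is an illustration of the number format, not of NVIDIA's training recipe.

```python
from typing import List, Tuple

# Magnitudes representable by a 4-bit E2M1 value (the sign bit is stored separately).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK = 16  # one shared scale per 16-element micro-block


def quantize_block(values: List[float]) -> Tuple[float, List[float]]:
    """Scale the block so its largest magnitude maps to 6.0, then round each value."""
    amax = max(abs(v) for v in values)
    scale = amax / 6.0 if amax > 0 else 1.0  # avoid dividing by zero on all-zero blocks
    codes = []
    for v in values:
        mag = min(E2M1_GRID, key=lambda g: abs(abs(v) / scale - g))  # nearest grid point
        codes.append(mag if v >= 0 else -mag)
    return scale, codes


def dequantize_block(scale: float, codes: List[float]) -> List[float]:
    return [scale * c for c in codes]


def quantize_tensor(values: List[float]) -> List[Tuple[float, List[float]]]:
    """Quantize a flat list block by block."""
    return [quantize_block(values[i:i + BLOCK]) for i in range(0, len(values), BLOCK)]
```

Because the widest gap on the E2M1 grid is 2 (between 4 and 6), the rounding error per element is at most one scale unit. If each block's scale is stored in 8 bits, as assumed here, storage works out to about 4.5 bits per element, compared with 8 for FP8.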

Performance & Availability

Nemotron 3 Super delivers 5x higher throughput than the previous Nemotron Super and supports a native 1M-token context window, enabling agents to maintain long-term memory without goal drift. On the PinchBench benchmark for agentic reasoning, it scores 85.6%, the highest among open-source models in its class.

The model is fully open, with weights, datasets, and recipes available, so developers can customize, optimize, and deploy it on their own infrastructure. NVIDIA provides tutorial videos and integration examples through Build.NVIDIA.com and OpenCode for immediate hands-on access.