Overview
NVIDIA introduced the Nemotron 3 family of models at GTC 2026 as a unified stack for building production-grade agentic AI systems. The family targets key challenges in multi-agent architectures: context explosion, latency, and safety guardrailing across modalities and languages.
Key Models in the Nemotron 3 Family
The Nemotron 3 lineup includes:
- Nemotron 3 Super: A 120B-parameter open hybrid Mamba-Transformer mixture-of-experts (MoE) model optimized for long-context reasoning and agentic tasks. It activates only 12B parameters per token and achieves up to 5x higher throughput than previous generations.
- Nemotron 3 Ultra (coming soon): Positioned by NVIDIA as the open frontier model with the highest reasoning accuracy.
- Nemotron 3 Content Safety: Multimodal, multilingual moderation model for safety guardrailing.
- Nemotron 3 VoiceChat (early access): Low-latency, full-duplex voice interaction model.
- Nemotron 3 Nano Omni (coming soon): Enterprise-grade multimodal understanding model.
- Nemotron RAG Models: Embedding and reranking models optimized for multimodal retrieval-augmented generation.
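The sparsity of Nemotron 3 Super can be checked with simple arithmetic. This minimal sketch assumes the common rule of thumb of roughly 2 FLOPs per active parameter per token for a forward pass; that approximation ignores attention over the context and memory bandwidth, so it illustrates compute per token only and does not reproduce the 5x throughput figure, which is a measured comparison against earlier models. The function names are hypothetical.

```python
def active_fraction(total_params, active_params):
    """Fraction of the model's weights used for each token."""
    return active_params / total_params

def approx_forward_flops_per_token(active_params):
    """Rule-of-thumb estimate (an assumption): about 2 FLOPs, one multiply and
    one add, per active parameter per token. Ignores attention over the context."""
    return 2 * active_params

TOTAL = 120e9   # 120B total parameters (from the model description)
ACTIVE = 12e9   # 12B active parameters per token (from the model description)

print(f"active fraction: {active_fraction(TOTAL, ACTIVE):.0%}")
print(f"sparse MoE: {approx_forward_flops_per_token(ACTIVE):.1e} FLOPs/token")
print(f"dense 120B: {approx_forward_flops_per_token(TOTAL):.1e} FLOPs/token")
```

Under this estimate, per-token compute is roughly one tenth of what a dense 120B model would need.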
Technical Innovations
Nemotron 3 Super employs several optimizations for agentic AI workloads:
- Hybrid Architecture: Combines Mamba and Transformer layers and uses latent MoE to activate four expert specialists at the inference cost of one.
- Precision Support: NVFP4 (4-bit floating-point) quantization on NVIDIA Blackwell GPUs improves efficiency and reduces memory footprint.
- Context Window: A 1M-token context window accommodates the long token histories that multi-agent systems accumulate.
- Configurable Thinking Budget: Developers can bound chain-of-thought reasoning to maintain predictable latency and cost.
- Multi-Token Prediction: Predicting several future tokens per step improves throughput and reasoning capabilities.
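To illustrate the "four experts at the cost of one" idea, here is a minimal sketch of top-k expert routing plus a FLOP count. It assumes, for illustration only, that latent MoE runs each expert FFN in a latent space half the model width, so each expert costs a quarter as much. The widths, the FFN multiplier, the 2x compression ratio, and all function names are hypothetical, not Nemotron 3 Super's published configuration, and the cost of projecting into and out of the latent space is ignored.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and softmax over just those logits
    (Mixtral-style renormalization; other routers normalize differently)."""
    chosen = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)[:k]
    gates = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, gates))

def ffn_flops(width, ffn_mult=4):
    """Approximate FLOPs per token for one FFN expert: an up-projection and a
    down-projection, counting each multiply-add as 2 FLOPs."""
    hidden = ffn_mult * width
    return 2 * (width * hidden) + 2 * (hidden * width)

# Hypothetical sizes for illustration only.
d_model = 4096
d_latent = d_model // 2  # assumed 2x compression before expert computation

full_width_cost = ffn_flops(d_model)       # one expert at full model width
latent_cost_4 = 4 * ffn_flops(d_latent)    # four experts in the latent space
print(latent_cost_4 / full_width_cost)     # prints 1.0

routes = top_k_route([0.1, 2.3, -0.5, 1.7, 0.9, 3.0, -1.2, 0.4], k=4)
print(routes)  # (expert index, gate weight) pairs for the 4 selected experts
```

Because FFN cost scales with the square of the width here, halving the width cuts each expert's cost by four, which is one way four experts can fit in the budget of one.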
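NVFP4 stores each value as a 4-bit E2M1 float and shares one FP8 (E4M3) scale factor across each block of 16 values, with an additional per-tensor scale. This sketch simulates that block scaling in plain Python to show where the memory savings come from. It is a simplified illustration, not NVIDIA's implementation: the block scale stays a Python float instead of being rounded to FP8, the per-tensor scale is omitted, and the function names are hypothetical.

```python
import math

# Magnitudes representable by a 4-bit E2M1 float (the sign is stored separately).
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 16  # NVFP4 shares one scale per 16 elements

def quantize_block(values):
    """Map a block of floats to (scale, E2M1 codes).
    Simplification: the scale is not rounded to FP8 E4M3."""
    amax = max(abs(v) for v in values)
    scale = amax / 6.0 if amax > 0 else 1.0  # 6.0 is the largest E2M1 magnitude
    codes = []
    for v in values:
        target = abs(v) / scale
        nearest = min(E2M1_MAGNITUDES, key=lambda m: abs(m - target))
        codes.append(math.copysign(nearest, v))
    return scale, codes

def dequantize_block(scale, codes):
    """Recover approximate values from a scale and its E2M1 codes."""
    return [c * scale for c in codes]

def quantize_tensor(values, block_size=BLOCK_SIZE):
    """Quantize a flat list block by block."""
    return [quantize_block(values[i:i + block_size])
            for i in range(0, len(values), block_size)]

def bits_per_value(element_bits=4, scale_bits=8, block_size=BLOCK_SIZE):
    """Storage cost per value, amortizing the shared block scale."""
    return element_bits + scale_bits / block_size

weights = [0.12, -0.55, 0.33, 0.9, -0.07, 0.41, -0.26, 0.68,
           0.05, -0.81, 0.19, 0.27, -0.44, 0.6, -0.15, 0.38]
scale, codes = quantize_block(weights)
restored = dequantize_block(scale, codes)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(f"scale={scale:.4f}, max abs error={max_err:.4f}")
print(f"{bits_per_value()} bits/value vs 16 for BF16 "
      f"(~{16 / bits_per_value():.2f}x smaller)")
```

Sharing one scale per small block keeps rounding error proportional to the local maximum, which is why a fine-grained block size matters at 4 bits.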
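One reason a hybrid design helps with a 1M-token window is that attention layers keep a key-value (KV) cache that grows linearly with context length, while Mamba layers carry a fixed-size recurrent state. This sketch estimates per-sequence KV-cache memory for a hypothetical configuration; the layer counts, KV heads, head dimension, and function name are illustrative assumptions, not Nemotron 3 Super's actual architecture.

```python
def kv_cache_bytes(attention_layers, kv_heads, head_dim, tokens, bytes_per_elem=2):
    """Keys plus values (factor 2) for every attention layer, KV head, and token.
    bytes_per_elem=2 assumes a BF16/FP16 cache."""
    return 2 * attention_layers * kv_heads * head_dim * tokens * bytes_per_elem

TOKENS = 1_000_000

# Hypothetical 64-layer model with grouped-query attention (8 KV heads of dim 128).
all_attention = kv_cache_bytes(attention_layers=64, kv_heads=8, head_dim=128, tokens=TOKENS)

# Hypothetical hybrid: same depth, but only 8 of the 64 layers use attention.
# The Mamba layers' constant-size state does not grow with TOKENS and is not counted.
hybrid = kv_cache_bytes(attention_layers=8, kv_heads=8, head_dim=128, tokens=TOKENS)

GB = 1e9
print(f"all-attention KV cache: {all_attention / GB:.1f} GB")  # 262.1 GB
print(f"hybrid KV cache:        {hybrid / GB:.1f} GB")         # 32.8 GB
```

In this toy configuration the cache shrinks in proportion to the number of attention layers removed, which is the memory pressure a long context window otherwise creates.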
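A thinking budget caps how many reasoning tokens the model may emit before it must answer. This sketch shows one client-side way to enforce such a cap over a token stream. The `</think>` end-of-reasoning marker and the function name are assumptions for illustration, not the documented Nemotron 3 or NVIDIA API, which may expose the budget as a server-side parameter instead.

```python
from typing import Iterable, List, Tuple

END_OF_THINKING = "</think>"  # assumed marker that closes the reasoning phase

def enforce_thinking_budget(tokens: Iterable[str], budget: int) -> Tuple[List[str], bool]:
    """Collect reasoning tokens until the model closes its reasoning or the
    budget is exhausted. Returns (reasoning_tokens, hit_budget)."""
    reasoning: List[str] = []
    for tok in tokens:
        if tok == END_OF_THINKING:
            return reasoning, False   # model finished thinking within budget
        if len(reasoning) >= budget:
            return reasoning, True    # cap reached: stop reading, discard this token
        reasoning.append(tok)
    return reasoning, False           # stream ended without a marker

# Hypothetical streams standing in for model output.
short_stream = iter(["step", "1", END_OF_THINKING, "answer"])
long_stream = iter(f"t{i}" for i in range(1000))

print(enforce_thinking_budget(short_stream, budget=8))  # (['step', '1'], False)
thought, truncated = enforce_thinking_budget(long_stream, budget=8)
print(len(thought), truncated)                          # 8 True
```

When the budget is hit, one option is for the caller to append the end marker and resume generation so the model produces its final answer; bounding this loop is what keeps worst-case latency and token cost predictable.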
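Multi-token prediction can raise throughput when the extra prediction heads act as a built-in draft model for speculative decoding: the model proposes several tokens, verifies them in one forward pass, and keeps the longest correct prefix. This sketch shows only the greedy acceptance rule under that assumption. It is a generic speculative-decoding illustration, not NVIDIA's serving code; the function name is hypothetical and token IDs are plain integers.

```python
from typing import List

def accept_draft(draft_tokens: List[int], verifier_tokens: List[int]) -> List[int]:
    """Greedy speculative-decoding acceptance.

    draft_tokens: k tokens proposed by the multi-token prediction heads.
    verifier_tokens: the main model's greedy choice at each of the k + 1
        positions, computed in a single forward pass over the draft.
    Returns the tokens committed in this decoding step.
    """
    assert len(verifier_tokens) == len(draft_tokens) + 1
    accepted: List[int] = []
    for drafted, verified in zip(draft_tokens, verifier_tokens):
        if drafted != verified:
            break
        accepted.append(drafted)
    # The verifier's token at the first mismatch (or the extra position after
    # a fully accepted draft) is always valid, so every step commits >= 1 token.
    accepted.append(verifier_tokens[len(accepted)])
    return accepted

print(accept_draft([11, 12, 13], [11, 12, 99, 14]))  # [11, 12, 99]
print(accept_draft([11, 12, 13], [11, 12, 13, 14]))  # [11, 12, 13, 14]
```

Each verification pass commits between one and k + 1 tokens, so throughput rises with how often the drafted tokens match what the main model would have produced.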
Performance & Benchmarks
External evaluations rank the NVFP4 version of Nemotron 3 Super among the top open-weight models under 250B parameters on the Artificial Analysis Intelligence Index, where it matches leading alternatives while delivering significantly higher throughput per GPU.
Developer Tools & Resources
NVIDIA provides complementary tools through NVIDIA NeMo:
- NeMo Evaluator for robust benchmarking
- Agent Toolkit for end-to-end agentic AI optimization
- Open data and training recipes for building custom agentic systems
Getting Started
Released models are available on Hugging Face and NVIDIA's Build platform. VoiceChat is available in early access through NVIDIA Build, while the Ultra and Nano Omni models are coming soon.