New Nemotron 3 Model Family
NVIDIA introduced the Nemotron 3 family of models at GTC 2026, designed as a unified agentic stack for building production-grade AI systems. The family includes:
- Nemotron 3 Super: An open hybrid mixture-of-experts model optimized for long-context reasoning and multi-agent tasks
- Nemotron 3 Ultra: Coming soon, targeting highest reasoning accuracy among open frontier models
- Nemotron 3 Content Safety: Multimodal, multilingual content moderation
- Nemotron 3 VoiceChat: Early access for low-latency, full-duplex voice interactions
- Nemotron 3 Nano Omni: Coming soon, enterprise-grade multimodal understanding
- Nemotron RAG models: Embedding and reranking models for multimodal retrieval
Nemotron 3 Super: Architecture and Performance
Nemotron 3 Super addresses key challenges in multi-agent systems: "context explosion" from massive token histories and the "thinking tax" from chain-of-thought reasoning. The model features:
- Hybrid Mamba-Transformer MoE architecture with latent MoE, activating only 12B of its 120B total parameters per token
- 5x higher throughput than the previous generation when running in NVFP4 precision on NVIDIA Blackwell GPUs
- 1M-token context window enabling long-context reasoning and planning
- Configurable thinking budget to keep latency and costs predictable during continuous agent workloads
- Multi-token prediction and NVFP4 precision for improved efficiency
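The source does not describe how latent MoE works internally. The sketch below therefore uses ordinary top-k expert routing, a generic and different technique, only to show why a sparse MoE runs far fewer parameters per token than it stores. The expert functions, router logits, and k are hypothetical toy values. The last helper performs the arithmetic from the feature list: 12B active out of 120B total means about 10% of the weights participate for each token.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def top_k_route(router_logits, k):
    """Generic top-k routing (NOT Nemotron's latent MoE): pick the k
    highest-scoring experts and renormalize weights over just those k."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_forward(x, experts, router_logits, k):
    """Weighted sum of the k routed experts' outputs.
    The remaining experts are never called, which is the source of the savings."""
    out = 0.0
    for idx, w in top_k_route(router_logits, k):
        out += w * experts[idx](x)
    return out


def active_fraction(active_params, total_params):
    """Share of parameters used per token."""
    return active_params / total_params


if __name__ == "__main__":
    # Hypothetical toy setup: expert i simply scales its input by i.
    experts = [lambda x, s=s: s * x for s in range(8)]
    logits = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]
    print(top_k_route(logits, 2))           # experts 4 and 1 are selected
    print(moe_forward(1.0, experts, logits, 2))
    print(f"{active_fraction(12e9, 120e9):.0%}")  # prints 10%
```

In this toy run only 2 of the 8 experts execute for the token. A real MoE layer applies the same principle to large feed-forward blocks rather than scalar functions.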
On the Artificial Analysis Intelligence Index for open-weight models under 250B parameters, Nemotron 3 Super in NVFP4 ranks among the top models, matching the intelligence scores of leading alternatives while delivering higher throughput.
Developer Tools and Optimization
NVIDIA provides end-to-end tools and resources to build, evaluate, and optimize agentic systems:
- NVIDIA NeMo: Open-source tools, including NeMo Evaluator for benchmarking and the NeMo Agent Toolkit for building scalable agentic systems
- Open training recipes and data for fine-tuning and customization
- Hugging Face integration: Models are available on the Hugging Face Hub for easy access and deployment
Developers can configure Nemotron 3 Super's thinking budget to balance reasoning depth with latency constraints, making it suitable for real-time multi-agent applications while maintaining cost predictability.
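The source does not specify how the thinking budget is exposed, so the following is only a conceptual sketch, not NVIDIA's API. It assumes that reasoning is emitted as a token stream ending in an end-of-reasoning marker (the `</think>` string here is a hypothetical choice). It also assumes that a budget means capping that stream at N tokens and forcing the marker so the answer phase can begin. The function names `apply_thinking_budget` and `worst_case_reasoning_tokens` are hypothetical.

```python
from typing import Iterable, List

END_THINK = "</think>"  # hypothetical end-of-reasoning marker


def apply_thinking_budget(reasoning_tokens: Iterable[str], budget: int) -> List[str]:
    """Keep at most `budget` reasoning tokens.

    If the model closes its reasoning early, stop there. Otherwise truncate
    at the budget and append END_THINK so answer generation can start.
    """
    kept: List[str] = []
    for tok in reasoning_tokens:
        if tok == END_THINK:
            kept.append(tok)
            return kept
        if len(kept) >= budget:
            break  # budget exhausted: cut reasoning off here
        kept.append(tok)
    kept.append(END_THINK)
    return kept


def worst_case_reasoning_tokens(num_calls: int, budget: int) -> int:
    """Upper bound on reasoning tokens across many agent calls, which is
    what makes per-workload cost predictable."""
    return num_calls * budget


if __name__ == "__main__":
    stream = ["step1", "step2", "step3", "step4", "step5"]
    print(apply_thinking_budget(stream, 3))  # ['step1', 'step2', 'step3', '</think>']
    print(worst_case_reasoning_tokens(10, 512))  # 5120
```

The design choice is to apply a hard cap rather than a soft penalty: because each call's reasoning length is bounded, the total reasoning cost of a continuous agent loop is bounded by `num_calls * budget`.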
Available Now and Coming Soon
Nemotron 3 Super, Content Safety, VoiceChat (early access), and RAG models are available now on Hugging Face and NVIDIA platforms. Nemotron 3 Ultra and Nano Omni are coming soon, completing the unified agentic stack for specialized reasoning, safety, voice, and multimodal understanding tasks.