NVIDIA unveils Nemotron 3 family with specialized agents for reasoning, voice, and safety

New Nemotron 3 Model Family

NVIDIA introduced the Nemotron 3 family of models at GTC 2026, designed as a unified agentic stack for building production-grade AI systems. The family includes:

Nemotron 3 Super: An open hybrid mixture-of-experts model optimized for long-context reasoning and multi-agent tasks
Nemotron 3 Ultra: Coming soon, targeting the highest reasoning accuracy among open frontier models
Nemotron 3 Content Safety: Multimodal, multilingual content moderation
Nemotron 3 VoiceChat: Early access for low-latency, full-duplex voice interactions
Nemotron 3 Nano Omni: Coming soon, enterprise-grade multimodal understanding
Nemotron RAG models: Embedding and reranking models for multimodal retrieval

Nemotron 3 Super: Architecture and Performance

Nemotron 3 Super addresses key challenges in multi-agent systems: "context explosion" from massive token histories and the "thinking tax" from chain-of-thought reasoning. The model features:

Hybrid Mamba-Transformer MoE architecture with latent MoE that activates only 12B of 120B parameters per inference pass
5x higher throughput than previous generation when running in NVFP4 precision on NVIDIA Blackwell GPUs
1M-token context window enabling long-context reasoning and planning
Configurable thinking budget to keep latency and costs predictable during continuous agent workloads
Multi-token prediction, which together with NVFP4 precision improves inference efficiency
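
To put the parameter counts in perspective, here is a rough back-of-the-envelope sketch. It uses only the 12B and 120B figures quoted above. The memory estimate assumes about 4 bits per weight for NVFP4 and ignores scaling-factor overhead, activations, and the KV cache, so treat it as an illustration rather than a published specification.

```python
# Figures quoted in the article.
TOTAL_PARAMS = 120e9   # total parameters
ACTIVE_PARAMS = 12e9   # parameters activated per inference pass


def active_fraction(active: float, total: float) -> float:
    """Share of the model's parameters used on each pass."""
    return active / total


def weight_gigabytes(params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


fraction = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
fp4_gb = weight_gigabytes(TOTAL_PARAMS, 4)    # assumption: ~4 bits per weight
bf16_gb = weight_gigabytes(TOTAL_PARAMS, 16)  # 16-bit baseline for comparison

print(f"active share: {fraction:.0%}")
print(f"weights at 4 bits: ~{fp4_gb:.0f} GB, at 16 bits: ~{bf16_gb:.0f} GB")
```

Under these assumptions, each pass touches about a tenth of the parameters, and 4-bit weights take roughly a quarter of the storage of a 16-bit copy. All 120B parameters still have to be stored, since sparse activation reduces per-token compute, not the size of the model.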

On the Artificial Analysis Intelligence Index for open-weight models under 250B parameters, Nemotron 3 Super NVFP4 ranks among the top models, matching the intelligence scores of leading alternatives while delivering higher throughput.

Developer Tools and Optimization

NVIDIA provides end-to-end tools and resources to build, evaluate, and optimize agentic systems:

NVIDIA NeMo: Open-source tools including the NeMo Evaluator for benchmarking and Agent Toolkit for building scalable systems
Open training recipes and data for fine-tuning and customization
Hugging Face integration: Models available on Hugging Face Hub for easy access and deployment

Developers can configure Nemotron 3 Super's thinking budget to trade reasoning depth against latency. Capping the number of reasoning tokens keeps per-call cost predictable, which makes the model suitable for real-time multi-agent applications.
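
The sketch below illustrates why a cap on reasoning tokens makes latency and cost predictable. It is not NVIDIA's API: the `BudgetConfig` class, its field names, and every number in the example are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BudgetConfig:
    """Hypothetical per-call limits; names and values are illustrative."""
    thinking_budget: int                   # max reasoning tokens per call
    max_answer_tokens: int                 # max tokens in the final answer
    tokens_per_second: float               # assumed decode throughput
    usd_per_million_output_tokens: float   # assumed output-token price


def worst_case(cfg: BudgetConfig) -> tuple[int, float, float]:
    """Return upper bounds on (output tokens, latency in seconds, cost in USD)."""
    total_tokens = cfg.thinking_budget + cfg.max_answer_tokens
    latency_s = total_tokens / cfg.tokens_per_second
    cost_usd = total_tokens * cfg.usd_per_million_output_tokens / 1_000_000
    return total_tokens, latency_s, cost_usd


# Example with made-up numbers: a 2,048-token thinking budget.
cfg = BudgetConfig(
    thinking_budget=2048,
    max_answer_tokens=512,
    tokens_per_second=256.0,
    usd_per_million_output_tokens=1.0,
)
tokens, latency_s, cost_usd = worst_case(cfg)
print(f"at most {tokens} tokens, {latency_s:.1f} s, ${cost_usd:.5f} per call")
```

Because both terms in the sum are fixed in advance, the bound holds regardless of how hard a given request is. Without a budget, the reasoning portion has no such ceiling, so neither does the latency or the cost of a call.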

Available Now and Coming Soon

Nemotron 3 Super, Content Safety, VoiceChat (early access), and RAG models are available now on Hugging Face and NVIDIA platforms. Nemotron 3 Ultra and Nano Omni are coming soon, completing the unified agentic stack for specialized reasoning, safety, voice, and multimodal understanding tasks.
