NVIDIA
NVIDIA unveils Nemotron 3 family with specialized agents for reasoning, voice, and safety
release · model · feature · open-source · developer.nvidia.com

New Nemotron 3 Model Family

At GTC 2026, NVIDIA introduced the Nemotron 3 family of models, designed as a unified agentic stack for building production-grade AI systems. The family includes:

  • Nemotron 3 Super: An open hybrid mixture-of-experts model optimized for long-context reasoning and multi-agent tasks
  • Nemotron 3 Ultra: Coming soon, targeting the highest reasoning accuracy among open frontier models
  • Nemotron 3 Content Safety: Multimodal, multilingual content moderation
  • Nemotron 3 VoiceChat: Early access for low-latency, full-duplex voice interactions
  • Nemotron 3 Nano Omni: Coming soon, enterprise-grade multimodal understanding
  • Nemotron RAG models: Embedding and reranking models for multimodal retrieval

Nemotron 3 Super: Architecture and Performance

Nemotron 3 Super addresses key challenges in multi-agent systems: "context explosion" from massive token histories and the "thinking tax" from chain-of-thought reasoning. The model features:

  • Hybrid Mamba-Transformer MoE architecture with latent MoE that activates only 12B of 120B parameters per inference pass
  • 5x higher throughput than the previous generation when running in NVFP4 precision on NVIDIA Blackwell GPUs
  • 1M-token context window enabling long-context reasoning and planning
  • Configurable thinking budget to keep latency and costs predictable during continuous agent workloads
  • Multi-token prediction and NVFP4 precision for improved efficiency
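
To make the parameter figures above concrete, the following sketch works out the active-parameter fraction and a rough weight footprint. It is a back-of-the-envelope illustration, not a measurement: the 4-bits-per-weight figure ignores any scaling-factor overhead of the 4-bit format, the 16-bit baseline is included only for comparison, and the function names are hypothetical rather than part of any NVIDIA tooling.

```python
# Figures taken from the article.
TOTAL_PARAMS = 120e9   # total parameters
ACTIVE_PARAMS = 12e9   # parameters activated per inference pass


def active_fraction(active: float, total: float) -> float:
    """Share of the model's parameters used on a single inference pass."""
    return active / total


def weight_footprint_gb(params: float, bits_per_param: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes).

    Assumption: every parameter is stored at `bits_per_param` bits,
    with no extra space for quantization scales or metadata.
    """
    return params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    frac = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
    print(f"active fraction per pass: {frac:.0%}")
    # Assumed precisions: 4-bit weights vs. a 16-bit baseline.
    print(f"weights at 4 bits:  ~{weight_footprint_gb(TOTAL_PARAMS, 4):.0f} GB")
    print(f"weights at 16 bits: ~{weight_footprint_gb(TOTAL_PARAMS, 16):.0f} GB")
```

In a typical mixture-of-experts deployment, all expert weights still need to be resident in memory, so the sparse activation mainly reduces compute per token, while the lower precision reduces the storage estimate.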

On the Artificial Analysis Intelligence Index for open-weight models under 250B parameters, Nemotron 3 Super NVFP4 ranks among the top models, matching the intelligence scores of leading alternatives while delivering superior throughput efficiency.

Developer Tools and Optimization

NVIDIA provides end-to-end tools and resources to build, evaluate, and optimize agentic systems:

  • NVIDIA NeMo: Open-source tools including the NeMo Evaluator for benchmarking and Agent Toolkit for building scalable systems
  • Open training recipes and data for fine-tuning and customization
  • Hugging Face integration: Models available on Hugging Face Hub for easy access and deployment

Developers can configure Nemotron 3 Super's thinking budget to balance reasoning depth with latency constraints, making it suitable for real-time multi-agent applications while maintaining cost predictability.
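
One way to reason about that trade-off is to derive a token budget from a latency target. The sketch below is a hypothetical helper, not NVIDIA's API: the function name, its parameters, and the example throughput numbers are all assumptions for illustration. How the resulting budget is actually passed to the model depends on the serving stack and is not shown here.

```python
def thinking_budget_tokens(
    latency_target_s: float,
    decode_tokens_per_s: float,
    answer_tokens: int,
    fixed_overhead_s: float = 0.0,
) -> int:
    """Estimate how many reasoning tokens fit inside a latency target.

    All inputs are assumptions supplied by the caller:
      latency_target_s     - end-to-end time allowed for one response
      decode_tokens_per_s  - measured decode throughput for this deployment
      answer_tokens        - tokens reserved for the final answer
      fixed_overhead_s     - prefill, network, and tool-call time
    """
    if decode_tokens_per_s <= 0:
        raise ValueError("decode_tokens_per_s must be positive")
    usable_s = latency_target_s - fixed_overhead_s
    total_tokens = int(usable_s * decode_tokens_per_s)
    # Never return a negative budget; 0 means "answer without reasoning".
    return max(0, total_tokens - answer_tokens)


if __name__ == "__main__":
    # Hypothetical numbers: 10 s target, 100 tokens/s, 200-token answer.
    print(thinking_budget_tokens(10.0, 100.0, 200, fixed_overhead_s=0.5))
```

With these assumed numbers, 9.5 usable seconds at 100 tokens per second leaves room for 950 tokens, of which 750 remain for reasoning after reserving the answer. A tighter target of 1 second would yield a budget of 0.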

Available Now and Coming Soon

Nemotron 3 Super, Content Safety, VoiceChat (early access), and RAG models are available now on Hugging Face and NVIDIA platforms. Nemotron 3 Ultra and Nano Omni are coming soon, completing the unified agentic stack for specialized reasoning, safety, voice, and multimodal understanding tasks.