NVIDIA Nemotron 3 Super Now Available on Workers AI
Cloudflare has partnered with NVIDIA to integrate the Nemotron 3 Super model into Workers AI. This is a Mixture-of-Experts (MoE) model with a hybrid Mamba-transformer architecture, featuring 120B total parameters with 12B active parameters per forward pass for efficient inference.
Key Capabilities
- Over 50% higher token generation throughput compared to leading open-source models, significantly reducing latency in production applications
- Tool calling support for building AI agents that invoke external tools across multiple conversation turns
- Multi-Token Prediction (MTP) for accelerated long-form text generation by predicting multiple future tokens in a single forward pass
- 32,000 token context window to maintain conversation history and plan states across complex multi-step workflows
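The tool-calling capability above can be sketched as follows. This is a hedged illustration, not the exact Workers AI request shape: the tool schema uses the common OpenAI-style function-calling format, and getWeather plus the helper function are hypothetical names introduced for this example.

```typescript
// Sketch of state for a multi-turn tool-calling conversation.
// The schema format and all names here are illustrative; consult the
// Workers AI documentation for the exact shape the model expects.

type Message = { role: "system" | "user" | "assistant" | "tool"; content: string };

// Hypothetical tool definition in the common function-calling format.
const tools = [
  {
    name: "getWeather",
    description: "Look up the current weather for a city",
    parameters: {
      type: "object",
      properties: { city: { type: "string" } },
      required: ["city"],
    },
  },
];

// Append a tool's result to the conversation so the model can use it
// on the next turn, without mutating the existing history.
function appendToolResult(messages: Message[], result: string): Message[] {
  return [...messages, { role: "tool", content: result }];
}

const history: Message[] = [{ role: "user", content: "Weather in Lisbon?" }];
const next = appendToolResult(history, '{"tempC": 21}');
```

Keeping the history immutable makes it easy to replay or branch agent conversations across the model's 32,000 token context window.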
Accessing Nemotron 3 Super
The model is available through three interfaces:
- Workers AI binding (env.AI.run()) for Workers serverless functions
- REST API for direct HTTP integration
- OpenAI-compatible endpoint for drop-in compatibility with existing OpenAI client libraries
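The first of these, the Workers AI binding, looks roughly like the sketch below. The Env interface is a local stand-in for the types Wrangler normally generates, and the input fields (messages, max_tokens) are illustrative rather than an exhaustive parameter list.

```typescript
// Minimal Worker sketch calling the model through the AI binding.
// Env is a hand-written stand-in for Wrangler-generated binding types.
interface Env {
  AI: { run(model: string, input: unknown): Promise<unknown> };
}

const MODEL = "@cf/nvidia/nemotron-3-120b-a12b";

// Build a chat-style input; max_tokens is an illustrative parameter.
function buildChatInput(prompt: string) {
  return {
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: prompt },
    ],
    max_tokens: 256,
  };
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const result = await env.AI.run(MODEL, buildChatInput("Hello!"));
    return Response.json(result);
  },
};
```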
The model identifier is @cf/nvidia/nemotron-3-120b-a12b. Developers building multi-agent systems, reasoning-heavy applications, or complex task-orchestration workflows will benefit from an architecture optimized to serve many collaborating agents per application.
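For the other two interfaces, a rough sketch of the URL conventions is below. The URL patterns follow Cloudflare's documented shapes for the Workers AI REST API and its OpenAI-compatible endpoints, but verify them against the current docs; the account ID and API token are placeholders you supply yourself.

```typescript
// Sketch of calling the model over HTTP rather than through a binding.
const MODEL = "@cf/nvidia/nemotron-3-120b-a12b";

// Direct REST endpoint for a single model run.
function restRunUrl(accountId: string): string {
  return `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/run/${MODEL}`;
}

// Base URL for the OpenAI-compatible API; usable as the baseURL of an
// existing OpenAI client library for drop-in compatibility.
function openAiCompatibleBase(accountId: string): string {
  return `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/v1`;
}

// Chat completion against the OpenAI-compatible endpoint.
async function chat(accountId: string, apiToken: string, prompt: string) {
  const res = await fetch(`${openAiCompatibleBase(accountId)}/chat/completions`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: MODEL,
      messages: [{ role: "user", content: prompt }],
    }),
  });
  return res.json();
}
```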