Cloudflare adds NVIDIA Nemotron 3 Super to Workers AI with 50% faster token generation
NVIDIA Nemotron 3 Super Now Available on Workers AI
Cloudflare has partnered with NVIDIA to integrate Nemotron 3 Super into Workers AI, enabling developers to run high-performance AI agents at the edge. The model combines a hybrid Mamba-transformer architecture with a Mixture-of-Experts (MoE) design: it has 120B total parameters, but only 12B of them are active in each forward pass.
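The toy top-k MoE layer below makes the total-versus-active distinction concrete. It is a generic illustration of MoE routing, not Nemotron 3 Super's actual router. The expert count, the value of k, and the choice to pass precomputed gate logits are all assumptions. For each token, only the k selected experts are evaluated, so only their weights are active. This is what lets the active parameter count be a fraction of the total (here 12B of 120B, or 10%). In a real model, the non-expert layers also run on every token.

```typescript
// Toy top-k Mixture-of-Experts layer. Illustrative only: this is NOT
// Nemotron 3 Super's routing code. Expert count, k, and gating are assumptions.

type Vector = number[];
type Expert = (x: Vector) => Vector;

// Indices of the k largest gate logits, highest first.
function topK(logits: Vector, k: number): number[] {
  return logits
    .map((value, index) => ({ value, index }))
    .sort((a, b) => b.value - a.value)
    .slice(0, k)
    .map((entry) => entry.index);
}

// Numerically stable softmax over the selected logits only,
// so the chosen experts' weights sum to 1.
function softmax(values: Vector): Vector {
  const max = Math.max(...values);
  const exps = values.map((v) => Math.exp(v - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Route one token through k experts. In a trained model the gate logits
// come from a learned router applied to x; here they are passed in.
// Experts that are not selected are never called, so their parameters
// stay inactive for this token.
function moeForward(
  x: Vector,
  gateLogits: Vector,
  experts: Expert[],
  k: number,
): { output: Vector; used: number[] } {
  const used = topK(gateLogits, k);
  const weights = softmax(used.map((i) => gateLogits[i]));
  const output = new Array<number>(x.length).fill(0);
  used.forEach((expertIndex, j) => {
    const y = experts[expertIndex](x);
    for (let d = 0; d < output.length; d++) {
      output[d] += weights[j] * y[d];
    }
  });
  return { output, used };
}
```

With 8 experts and k = 2, each token triggers exactly two expert calls, whatever the total number of experts.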
Key Features
- Faster Token Generation: The hybrid architecture delivers over 50% higher throughput compared to leading open models, significantly reducing latency for production applications
- Tool Calling: Native support for building agentic systems that can invoke tools across multiple conversation turns, essential for multi-step workflows
- Multi-Token Prediction (MTP): Predicts multiple future tokens in a single forward pass, accelerating long-form text generation
- Extended Context: 32,000-token context window for maintaining conversation history and agent state across complex workflows
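A multi-step tool-calling workflow generally reduces to a loop. Call the model, run any tools it requests, and append the results to the conversation. Then call the model again, and repeat until it answers in plain text. The sketch below assumes a hypothetical `callModel` adapter and a reply shape (`response` / `tool_calls`) loosely modeled on Workers AI function calling. Confirm the exact schema for this model in the Workers AI documentation before wiring the adapter to `env.AI.run()`.

```typescript
// Minimal multi-turn tool-calling loop. Message and reply shapes are
// ASSUMPTIONS modeled loosely on Workers AI function calling.

type Message = {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
  name?: string;
};
type ToolCall = { name: string; arguments: Record<string, unknown> };
type ModelReply = { response?: string; tool_calls?: ToolCall[] };
// Hypothetical adapter: in a Worker this would wrap env.AI.run(model, { messages, tools }).
type ModelFn = (messages: Message[]) => Promise<ModelReply>;
type ToolImpl = (args: Record<string, unknown>) => Promise<string>;

async function runAgent(
  callModel: ModelFn,
  tools: Record<string, ToolImpl>,
  messages: Message[],
  maxTurns = 5,
): Promise<string> {
  const history = [...messages];
  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = await callModel(history);
    const calls = reply.tool_calls ?? [];
    // No tool requests means the model has produced its final answer.
    if (calls.length === 0) {
      return reply.response ?? "";
    }
    // Record what the model asked for, then run each tool and feed the result back.
    history.push({ role: "assistant", content: JSON.stringify(calls) });
    for (const call of calls) {
      const impl = tools[call.name];
      const result = impl
        ? await impl(call.arguments)
        : `error: unknown tool ${call.name}`;
      history.push({ role: "tool", name: call.name, content: result });
    }
  }
  throw new Error(`no final answer after ${maxTurns} turns`);
}
```

The `maxTurns` cap guards against a model that keeps requesting tools indefinitely. Each extra turn also consumes part of the 32,000-token context window.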
Getting Started
Developers can access Nemotron 3 Super through three interfaces:
- Workers AI binding: Use `env.AI.run()` directly in Workers code
- REST API: Call the `/run` or `/v1/chat/completions` endpoints
- OpenAI-compatible endpoint: Integrate with existing OpenAI-based tools and frameworks
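A minimal Worker using the binding route might look like the following sketch. Three details are assumptions. The model ID is a placeholder, so check the Workers AI model catalog for the exact name. The binding is assumed to be named `AI` in the Wrangler configuration. The `Env` type is declared inline rather than imported from `@cloudflare/workers-types`.

```typescript
// Minimal Worker calling Nemotron 3 Super through the Workers AI binding.
// MODEL_ID is a PLACEHOLDER; look up the exact model name in the catalog.
const MODEL_ID = "@cf/nvidia/nemotron-3-super"; // hypothetical ID

// Inline stand-in for the binding type; assumes the binding is named "AI".
interface Env {
  AI: { run(model: string, inputs: Record<string, unknown>): Promise<unknown> };
}

const worker = {
  async fetch(request: Request, env: Env): Promise<Response> {
    if (request.method !== "POST") {
      return new Response("Send a POST with a JSON { prompt } body", { status: 405 });
    }
    let prompt: string | undefined;
    try {
      ({ prompt } = (await request.json()) as { prompt?: string });
    } catch {
      return new Response("Body must be JSON", { status: 400 });
    }
    if (!prompt) {
      return new Response("Missing prompt", { status: 400 });
    }
    // Chat-style input: a system instruction plus the user's prompt.
    const result = await env.AI.run(MODEL_ID, {
      messages: [
        { role: "system", content: "You are a concise assistant." },
        { role: "user", content: prompt },
      ],
    });
    return new Response(JSON.stringify(result), {
      headers: { "content-type": "application/json" },
    });
  },
};

export default worker;
```

From outside a Worker, the `/run` REST endpoint and the OpenAI-compatible `/v1/chat/completions` endpoint take a similar `messages` array over HTTPS. For those routes, an API token replaces the binding.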
The model is optimized for multi-agent architectures, making it ideal for applications requiring coordinated AI reasoning and tool orchestration at scale.