NVIDIA Nemotron 3 Super now available on Cloudflare Workers AI with 50% faster token generation
Cloudflare has partnered with NVIDIA to bring the Nemotron 3 Super model to Workers AI. The model is now available for immediate use through the @cf/nvidia/nemotron-3-120b-a12b identifier.
Model Architecture and Specifications
Nemotron 3 Super is a Mixture-of-Experts (MoE) model with a hybrid Mamba-transformer architecture. While the model contains 120B total parameters, only 12B parameters are active per forward pass, enabling efficient inference. Its 32,000-token context window supports long conversation histories and complex multi-step agent workflows.
Key Performance Features
- Token Generation Throughput: The hybrid architecture delivers over 50% higher token generation throughput compared to leading open models, significantly reducing latency for real-world applications
- Tool Calling: Native support for building AI agents that invoke external tools across multiple conversation turns
- Multi-Token Prediction (MTP): Simultaneously predicts several future tokens in a single forward pass, accelerating long-form text generation
- Reasoning & Instruction Following: Optimized for high accuracy on complex reasoning tasks and multi-step instructions
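To make the tool-calling flow concrete, the sketch below shows the shape of a multi-turn exchange using an OpenAI-style tools schema. The get_weather tool, its parameters, and the message shapes are illustrative assumptions, not part of the model's documented API:

```typescript
// Sketch of a multi-turn tool-calling exchange (OpenAI-style schema).
// The get_weather tool is a hypothetical example; only the overall
// request shape is meaningful here.
const tools = [
  {
    name: "get_weather",
    description: "Look up the current weather for a city",
    parameters: {
      type: "object",
      properties: { city: { type: "string" } },
      required: ["city"],
    },
  },
];

// Turn 1: the user asks a question; the model may answer with a tool call.
const firstTurn = {
  messages: [{ role: "user", content: "What's the weather in Lisbon?" }],
  tools,
};

// Turn 2: append the tool's result so the model can compose a final answer.
const secondTurn = {
  messages: [
    { role: "user", content: "What's the weather in Lisbon?" },
    {
      role: "assistant",
      content: "",
      tool_calls: [{ name: "get_weather", arguments: { city: "Lisbon" } }],
    },
    { role: "tool", name: "get_weather", content: JSON.stringify({ temp_c: 21 }) },
  ],
  tools,
};
```

Each turn's object would be passed as the input to the model invocation, and the loop continues until the model returns a plain-text answer instead of another tool call.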
How to Use
Developers can access Nemotron 3 Super through three interfaces:
- Workers AI Binding: Use env.AI.run() directly in Workers code
- REST API: Standard HTTP endpoints for model invocation
- OpenAI-Compatible Endpoint: Drop-in replacement for existing OpenAI integrations
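As a minimal sketch of the binding route, a Worker might look like the following. The Env interface is declared inline for self-containment (in a real project it comes from Wrangler-generated types), and the prompt is illustrative:

```typescript
// Minimal Worker calling Nemotron 3 Super through the AI binding.
// The Env interface is declared inline here; real projects typically
// use types generated by Wrangler instead.
interface Env {
  AI: { run(model: string, input: unknown): Promise<unknown> };
}

const MODEL = "@cf/nvidia/nemotron-3-120b-a12b";

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const result = await env.AI.run(MODEL, {
      messages: [
        { role: "system", content: "You are a concise assistant." },
        { role: "user", content: "Summarize what a Mixture-of-Experts model is." },
      ],
    });
    return Response.json(result);
  },
};
```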
For detailed documentation and usage examples, refer to the Nemotron 3 Super model page.
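For existing OpenAI-based integrations, only the base URL and credentials need to change. The sketch below assumes Cloudflare's OpenAI-compatible route under /ai/v1; the account ID and API token are placeholders you must supply:

```typescript
// Calling the OpenAI-compatible endpoint directly with fetch.
// ACCOUNT_ID and API_TOKEN are placeholders, not real values.
const ACCOUNT_ID = "your-account-id"; // placeholder
const API_TOKEN = "your-api-token";   // placeholder

const endpoint =
  `https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}/ai/v1/chat/completions`;

const body = {
  model: "@cf/nvidia/nemotron-3-120b-a12b",
  messages: [{ role: "user", content: "Hello!" }],
};

// Shown for completeness; running this requires real credentials.
async function chat(): Promise<unknown> {
  const res = await fetch(endpoint, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${API_TOKEN}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  });
  return res.json();
}
```

Because the request and response shapes follow the Chat Completions convention, existing OpenAI SDK clients can also be pointed at this base URL instead of hand-rolling fetch calls.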