Cloudflare
Cloudflare adds NVIDIA Nemotron 3 Super to Workers AI with 50% faster token generation
Cloudflare Workers · model, feature, integration, release, platform · developers.cloudflare.com

NVIDIA Nemotron 3 Super Now Available on Workers AI

Cloudflare has partnered with NVIDIA to bring Nemotron 3 Super to Workers AI, enabling developers to run high-performance AI agents at the edge. The model combines a hybrid Mamba-transformer architecture with a Mixture-of-Experts (MoE) design: it has 120B total parameters but activates only 12B of them per forward pass, roughly 10% of its weights.

Key Features

  • Faster Token Generation: The hybrid architecture delivers over 50% higher token generation throughput than leading open models, which shortens response times in production applications
  • Tool Calling: Native support for building agentic systems that can invoke tools across multiple conversation turns, essential for multi-step workflows
  • Multi-Token Prediction (MTP): Predicts multiple future tokens in a single forward pass, which accelerates long-form text generation
  • Extended Context: 32,000 token context window for maintaining conversation history and agent state across complex workflows
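
The multi-turn tool calling described above can be sketched as a small agent loop. This is a minimal illustration, not Cloudflare's or NVIDIA's implementation: the `Message`, `ToolCall`, and `ModelReply` shapes are assumptions loosely modeled on OpenAI-style chat APIs, and `callModel` is a hypothetical stand-in for whichever interface actually reaches the model.

```typescript
// Assumed message and tool-call shapes; the real Workers AI response
// format may differ.
type Message =
  | { role: "system" | "user" | "assistant"; content: string }
  | { role: "tool"; name: string; content: string };

interface ToolCall {
  name: string;
  arguments: Record<string, unknown>;
}

interface ModelReply {
  content: string;
  toolCalls: ToolCall[];
}

// Hypothetical stand-in for a model call; injected so the loop can be
// exercised without a network connection.
type CallModel = (messages: Message[]) => Promise<ModelReply>;
type ToolHandler = (args: Record<string, unknown>) => string;

// Run each requested tool and return the results as "tool" messages.
function executeToolCalls(
  calls: ToolCall[],
  tools: Map<string, ToolHandler>,
): Message[] {
  return calls.map((call): Message => {
    const handler = tools.get(call.name);
    const content = handler
      ? handler(call.arguments)
      : `error: unknown tool "${call.name}"`;
    return { role: "tool", name: call.name, content };
  });
}

// Keep calling the model until it answers without requesting a tool,
// or until maxTurns is reached.
async function runAgent(
  callModel: CallModel,
  tools: Map<string, ToolHandler>,
  userPrompt: string,
  maxTurns = 5,
): Promise<string> {
  const messages: Message[] = [{ role: "user", content: userPrompt }];
  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = await callModel(messages);
    messages.push({ role: "assistant", content: reply.content });
    if (reply.toolCalls.length === 0) return reply.content;
    messages.push(...executeToolCalls(reply.toolCalls, tools));
  }
  throw new Error(`no final answer after ${maxTurns} turns`);
}
```

The whole conversation, including tool results, is resent on every turn, so the message history grows with each tool call and counts against the context window. The `maxTurns` cap is a design choice that keeps a misbehaving model from looping indefinitely.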

Getting Started

Developers can access Nemotron 3 Super through three interfaces:

  • Workers AI binding: Use env.AI.run() directly in Workers code
  • REST API: Call the /run or /v1/chat/completions endpoints
  • OpenAI-compatible endpoint: Integrate with existing OpenAI-based tools and frameworks
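
The three access paths above might look roughly like the following. Treat this as a hedged sketch: the announcement does not give the model identifier, so `model` is passed in by the caller, and the base URL and request bodies follow the general Workers AI REST conventions rather than anything stated here. `AiBinding`, `runWithBinding`, and `buildRestRequest` are hypothetical names introduced for illustration.

```typescript
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Minimal assumed shape of the Workers AI binding (only the run() method).
interface AiBinding {
  run(model: string, inputs: { messages: ChatMessage[] }): Promise<unknown>;
}

// 1. Workers AI binding: inside a Worker, pass env.AI as `ai`.
async function runWithBinding(
  ai: AiBinding,
  model: string,
  prompt: string,
): Promise<unknown> {
  return ai.run(model, { messages: [{ role: "user", content: prompt }] });
}

interface RestRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// 2 and 3. REST: build a fetch() request for /run or /v1/chat/completions.
// The base URL is an assumption taken from the general Workers AI REST API.
function buildRestRequest(
  kind: "run" | "openai",
  accountId: string,
  apiToken: string,
  model: string,
  messages: ChatMessage[],
): RestRequest {
  const base = `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai`;
  const headers: Record<string, string> = {
    Authorization: `Bearer ${apiToken}`,
    "Content-Type": "application/json",
  };
  if (kind === "run") {
    // Native endpoint: the model name is part of the path.
    return {
      url: `${base}/run/${model}`,
      init: { method: "POST", headers, body: JSON.stringify({ messages }) },
    };
  }
  // OpenAI-compatible endpoint: the model name goes in the JSON body.
  return {
    url: `${base}/v1/chat/completions`,
    init: { method: "POST", headers, body: JSON.stringify({ model, messages }) },
  };
}
```

A caller would pass the result to `fetch(request.url, request.init)`. Because the OpenAI-compatible variant differs only in the path and in where the model name goes, existing OpenAI client libraries can typically be pointed at that base URL without other code changes.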

The model is optimized for multi-agent architectures, making it ideal for applications requiring coordinated AI reasoning and tool orchestration at scale.