Cloudflare adds NVIDIA Nemotron 3 Super to Workers AI with 50% faster token generation

NVIDIA Nemotron 3 Super Now Available on Workers AI

Cloudflare has partnered with NVIDIA to bring Nemotron 3 Super to Workers AI, enabling developers to run high-performance AI agents at the edge. The model combines a hybrid Mamba-transformer architecture with a Mixture-of-Experts (MoE) design: it has 120B total parameters, of which only 12B are active per forward pass.

Key Features

Faster Token Generation: The hybrid architecture delivers over 50% higher token throughput than leading open models, which shortens response times in production applications.
Tool Calling: Native support for building agentic systems that invoke tools across multiple conversation turns, which is essential for multi-step workflows.
Multi-Token Prediction (MTP): The model predicts multiple future tokens in a single forward pass, accelerating long-form text generation.
Extended Context: A 32,000-token context window for maintaining conversation history and agent state across complex workflows.
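
To make the multi-turn tool-calling pattern concrete, here is a minimal sketch of an agent loop. It is not taken from the announcement: the `Model` function type, the `ModelTurn` shape, and `runAgent` are hypothetical names chosen for illustration, and the model is abstracted behind a plain function so the loop can be read (and tested) without a Workers runtime.

```typescript
// Hypothetical shapes for illustration; a real model response will differ.
type ToolCall = { name: string; arguments: Record<string, unknown> };
type ModelTurn = { text?: string; toolCalls?: ToolCall[] };
type Message = { role: "user" | "assistant" | "tool"; content: string };

// The model is any function that maps the conversation so far to a turn.
type Model = (history: Message[]) => Promise<ModelTurn>;
type ToolFn = (args: Record<string, unknown>) => Promise<string> | string;

// Alternate between the model and the tools until the model answers in text.
async function runAgent(
  model: Model,
  tools: Record<string, ToolFn>,
  userInput: string,
  maxTurns = 5,
): Promise<string> {
  const history: Message[] = [{ role: "user", content: userInput }];

  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = await model(history);

    // No tool requests means the model has produced its final answer.
    if (!reply.toolCalls || reply.toolCalls.length === 0) {
      return reply.text ?? "";
    }

    // Run each requested tool and feed the result back into the history.
    for (const call of reply.toolCalls) {
      const tool = tools[call.name];
      const result = tool
        ? await tool(call.arguments)
        : `unknown tool: ${call.name}`;
      history.push({ role: "assistant", content: JSON.stringify(call) });
      history.push({ role: "tool", content: result });
    }
  }
  throw new Error("agent did not finish within maxTurns");
}
```

The `maxTurns` cap is a design choice in this sketch: it keeps a misbehaving model from looping on tool calls indefinitely.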

Getting Started

Developers can access Nemotron 3 Super through three interfaces:

Workers AI binding: Use env.AI.run() directly in Workers code
REST API: Call /run or /v1/chat/completions endpoints
OpenAI-compatible endpoint: Integrate with existing OpenAI-based tools and frameworks
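
The access paths listed above can be sketched roughly as follows. Several things here are assumptions rather than facts from the announcement: `MODEL_ID` is a placeholder (the announcement does not give the model identifier), `AiBinding` is a minimal hand-written type covering only the `run()` call used here, and the URL follows the Workers AI REST path shape as generally documented, so check it against the current Cloudflare documentation before relying on it.

```typescript
// Hypothetical placeholder: the announcement does not state the model ID.
const MODEL_ID = "<nemotron-3-super-model-id>";

type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Minimal structural type for the part of the Workers AI binding used here.
interface AiBinding {
  run(model: string, inputs: { messages: ChatMessage[] }): Promise<unknown>;
}

// Binding path: inside a Worker, pass env.AI as the `ai` argument.
async function askViaBinding(ai: AiBinding, prompt: string): Promise<unknown> {
  return ai.run(MODEL_ID, { messages: [{ role: "user", content: prompt }] });
}

// HTTP path: a plain description of the request, independent of any runtime.
interface HttpRequestSpec {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build a request for the OpenAI-compatible chat completions endpoint.
// The URL shape is an assumption; verify it against Cloudflare's docs.
function buildChatCompletionsRequest(
  accountId: string,
  apiToken: string,
  prompt: string,
): HttpRequestSpec {
  return {
    url: `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/v1/chat/completions`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: MODEL_ID,
      messages: [{ role: "user", content: prompt }],
    }),
  };
}
```

The request description can be passed to `fetch` by splitting off the URL, for example `const { url, ...init } = buildChatCompletionsRequest(id, token, "Hello"); await fetch(url, init);`. Because the endpoint follows the OpenAI request format, existing OpenAI client libraries can typically be pointed at the same base URL instead.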

The model is optimized for multi-agent architectures, making it ideal for applications requiring coordinated AI reasoning and tool orchestration at scale.
