NVIDIA Nemotron 3 Super Now Available on Workers AI
Cloudflare has partnered with NVIDIA to integrate the Nemotron 3 Super model into Workers AI. This is a Mixture-of-Experts (MoE) model with a hybrid Mamba-transformer architecture, featuring 120B total parameters with 12B active parameters per forward pass for efficient inference.
Key Capabilities
- Over 50% higher token generation throughput compared to leading open-source models, significantly reducing latency in production applications
- Tool calling support for building AI agents that invoke external tools across multiple conversation turns
- Multi-Token Prediction (MTP) for accelerated long-form text generation by predicting multiple future tokens in a single forward pass
- 32,000 token context window to maintain conversation history and plan states across complex multi-step workflows
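The tool-calling capability above can be sketched as follows. This is a hedged illustration, not the exact Workers AI request shape: the tool schema uses the common OpenAI-style function-calling format, and getWeather plus the helper function are hypothetical names introduced for this example.

```typescript
// Sketch of state for a multi-turn tool-calling conversation.
// The schema format and all names here are illustrative; consult the
// Workers AI documentation for the exact shape the model expects.

type Message = { role: "system" | "user" | "assistant" | "tool"; content: string };

// Hypothetical tool definition in the common function-calling format.
const tools = [
  {
    name: "getWeather",
    description: "Look up the current weather for a city",
    parameters: {
      type: "object",
      properties: { city: { type: "string" } },
      required: ["city"],
    },
  },
];

// Append a tool's result to the conversation so the model can use it
// on the next turn, without mutating the existing history.
function appendToolResult(messages: Message[], result: string): Message[] {
  return [...messages, { role: "tool", content: result }];
}

const history: Message[] = [{ role: "user", content: "Weather in Lisbon?" }];
const next = appendToolResult(history, '{"tempC": 21}');
```

Keeping the history immutable makes it easy to replay or branch agent conversations across the model's 32,000 token context window.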
Accessing Nemotron 3 Super
The model is available through three interfaces:
- Workers AI binding (env.AI.run()) for Workers serverless functions
- REST API for direct HTTP integration
- OpenAI-compatible endpoint for drop-in compatibility with existing OpenAI client libraries
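The first of these, the Workers AI binding, looks roughly like the sketch below. The Env interface is a local stand-in for the types Wrangler normally generates, and the input fields (messages, max_tokens) are illustrative rather than an exhaustive parameter list.

```typescript
// Minimal Worker sketch calling the model through the AI binding.
// Env is a hand-written stand-in for Wrangler-generated binding types.
interface Env {
  AI: { run(model: string, input: unknown): Promise<unknown> };
}

const MODEL = "@cf/nvidia/nemotron-3-120b-a12b";

// Build a chat-style input; max_tokens is an illustrative parameter.
function buildChatInput(prompt: string) {
  return {
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: prompt },
    ],
    max_tokens: 256,
  };
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const result = await env.AI.run(MODEL, buildChatInput("Hello!"));
    return Response.json(result);
  },
};
```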
The model identifier is @cf/nvidia/nemotron-3-120b-a12b. Developers building multi-agent systems, reasoning-heavy applications, or complex task-orchestration workflows will benefit from an architecture optimized to serve many collaborating agents per application.
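For the other two interfaces, a rough sketch of the URL conventions is below. The URL patterns follow Cloudflare's documented shapes for the Workers AI REST API and its OpenAI-compatible endpoints, but verify them against the current docs; the account ID and API token are placeholders you supply yourself.

```typescript
// Sketch of calling the model over HTTP rather than through a binding.
const MODEL = "@cf/nvidia/nemotron-3-120b-a12b";

// Direct REST endpoint for a single model run.
function restRunUrl(accountId: string): string {
  return `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/run/${MODEL}`;
}

// Base URL for the OpenAI-compatible API; usable as the baseURL of an
// existing OpenAI client library for drop-in compatibility.
function openAiCompatibleBase(accountId: string): string {
  return `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/v1`;
}

// Chat completion against the OpenAI-compatible endpoint.
async function chat(accountId: string, apiToken: string, prompt: string) {
  const res = await fetch(`${openAiCompatibleBase(accountId)}/chat/completions`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: MODEL,
      messages: [{ role: "user", content: prompt }],
    }),
  });
  return res.json();
}
```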