NVIDIA Nemotron 3 Super now available on Cloudflare Workers AI with 50% faster token generation
Cloudflare has partnered with NVIDIA to bring the Nemotron 3 Super model to Workers AI. The model is now available for immediate use through the @cf/nvidia/nemotron-3-120b-a12b identifier.
Model Architecture and Specifications
Nemotron 3 Super is a Mixture-of-Experts (MoE) model with a hybrid Mamba-transformer architecture. While the model contains 120B total parameters, only 12B parameters are active per forward pass, enabling efficient inference. Its 32,000-token context window supports long conversation histories and complex multi-step agent workflows.
Key Performance Features
- Token Generation Throughput: The hybrid architecture delivers over 50% higher token generation throughput compared to leading open models, significantly reducing latency for real-world applications
- Tool Calling: Native support for building AI agents that invoke external tools across multiple conversation turns
- Multi-Token Prediction (MTP): Simultaneously predicts several future tokens in a single forward pass, accelerating long-form text generation
- Reasoning & Instruction Following: Optimized for high accuracy on complex reasoning tasks and multi-step instructions
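To make the tool-calling flow concrete, the sketch below shows the shape of a multi-turn exchange using an OpenAI-style tools schema. The get_weather tool, its parameters, and the message shapes are illustrative assumptions, not part of the model's documented API:

```typescript
// Sketch of a multi-turn tool-calling exchange (OpenAI-style schema).
// The get_weather tool is a hypothetical example; only the overall
// request shape is meaningful here.
const tools = [
  {
    name: "get_weather",
    description: "Look up the current weather for a city",
    parameters: {
      type: "object",
      properties: { city: { type: "string" } },
      required: ["city"],
    },
  },
];

// Turn 1: the user asks a question; the model may answer with a tool call.
const firstTurn = {
  messages: [{ role: "user", content: "What's the weather in Lisbon?" }],
  tools,
};

// Turn 2: append the tool's result so the model can compose a final answer.
const secondTurn = {
  messages: [
    { role: "user", content: "What's the weather in Lisbon?" },
    {
      role: "assistant",
      content: "",
      tool_calls: [{ name: "get_weather", arguments: { city: "Lisbon" } }],
    },
    { role: "tool", name: "get_weather", content: JSON.stringify({ temp_c: 21 }) },
  ],
  tools,
};
```

Each turn's object would be passed as the input to the model invocation, and the loop continues until the model returns a plain-text answer instead of another tool call.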
How to Use
Developers can access Nemotron 3 Super through three interfaces:
- Workers AI Binding: Use env.AI.run() directly in Workers code
- REST API: Standard HTTP endpoints for model invocation
- OpenAI-Compatible Endpoint: Drop-in replacement for existing OpenAI integrations
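As a minimal sketch of the binding route, a Worker might look like the following. The Env interface is declared inline for self-containment (in a real project it comes from Wrangler-generated types), and the prompt is illustrative:

```typescript
// Minimal Worker calling Nemotron 3 Super through the AI binding.
// The Env interface is declared inline here; real projects typically
// use types generated by Wrangler instead.
interface Env {
  AI: { run(model: string, input: unknown): Promise<unknown> };
}

const MODEL = "@cf/nvidia/nemotron-3-120b-a12b";

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const result = await env.AI.run(MODEL, {
      messages: [
        { role: "system", content: "You are a concise assistant." },
        { role: "user", content: "Summarize what a Mixture-of-Experts model is." },
      ],
    });
    return Response.json(result);
  },
};
```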
For detailed documentation and usage examples, refer to the Nemotron 3 Super model page.
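For existing OpenAI-based integrations, only the base URL and credentials need to change. The sketch below assumes Cloudflare's OpenAI-compatible route under /ai/v1; the account ID and API token are placeholders you must supply:

```typescript
// Calling the OpenAI-compatible endpoint directly with fetch.
// ACCOUNT_ID and API_TOKEN are placeholders, not real values.
const ACCOUNT_ID = "your-account-id"; // placeholder
const API_TOKEN = "your-api-token";   // placeholder

const endpoint =
  `https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}/ai/v1/chat/completions`;

const body = {
  model: "@cf/nvidia/nemotron-3-120b-a12b",
  messages: [{ role: "user", content: "Hello!" }],
};

// Shown for completeness; running this requires real credentials.
async function chat(): Promise<unknown> {
  const res = await fetch(endpoint, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${API_TOKEN}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  });
  return res.json();
}
```

Because the request and response shapes follow the Chat Completions convention, existing OpenAI SDK clients can also be pointed at this base URL instead of hand-rolling fetch calls.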