NVIDIA launches Vera Rubin POD, an AI supercomputer combining seven chips and 1,152 GPUs for agentic AI workloads
release · platform · feature · developer.nvidia.com

NVIDIA Vera Rubin POD: A Purpose-Built AI Supercomputer

NVIDIA has announced the Vera Rubin POD, a comprehensive AI supercomputer platform designed specifically for the era of agentic AI systems. Built through extreme co-design of seven chips spanning compute, networking, and storage, the platform features 40 racks, 1,152 NVIDIA Rubin GPUs, 1.2 quadrillion transistors, nearly 20,000 NVIDIA dies, 60 exaflops of peak performance, and 10 PB/s of total scale-up bandwidth.
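The pod-level numbers above can be broken down for intuition. The divisions below are simple back-of-envelope arithmetic from the announced figures, not official per-GPU specifications from NVIDIA:

```python
# Back-of-envelope per-GPU figures derived from the announced pod-level specs.
GPUS = 1_152
RACKS = 40
PEAK_EXAFLOPS = 60        # pod peak performance
SCALE_UP_PB_S = 10        # total scale-up bandwidth, PB/s

per_gpu_pflops = PEAK_EXAFLOPS * 1_000 / GPUS   # 1 EF = 1,000 PF
per_gpu_tb_s = SCALE_UP_PB_S * 1_000 / GPUS     # 1 PB/s = 1,000 TB/s
gpus_per_rack = GPUS / RACKS

print(f"{per_gpu_pflops:.1f} PFLOPS peak per GPU")      # ~52.1
print(f"{per_gpu_tb_s:.2f} TB/s scale-up per GPU")      # ~8.68
print(f"{gpus_per_rack:.1f} GPUs per rack on average")  # 28.8
```

The average of 28.8 GPUs per rack also reflects that not every rack holds GPUs: the pod mixes five rack types, including CPU, storage, and networking racks.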

Five Specialized Rack-Scale Systems

The Vera Rubin POD integrates five purpose-built rack-scale systems optimized for different aspects of agentic AI workloads:

  • NVL72: The core compute engine with 72 NVIDIA Rubin GPUs and 36 NVIDIA Vera CPUs, optimized for pretraining, post-training, test-time scaling, and agentic scaling. Delivers 4x better training performance and 10x better inference performance per watt compared to NVIDIA Blackwell.
  • Groq 3 LPX: Provides 256 LPUs per rack for low-latency inference, critical for agent decision-making in real-time systems.
  • Vera CPU: Offers 256 CPUs per rack for large-scale reinforcement learning and sandboxed environments to validate AI-generated results.
  • BlueField-4 STX: AI-native storage system with CMX technology for KV cache management, addressing the memory demands of large-context agentic systems.
  • Spectrum-6 SPX: Silicon photonics-based networking fabric ensuring low-latency, resilient connectivity across the entire system.

Built on Third-Generation MGX Architecture

All five systems share the third-generation NVIDIA MGX rack architecture, whose innovations include a modular cable-free design, dynamic power steering, rack-level energy storage, intelligent power smoothing, and 45°C liquid cooling. The platform is supported by an ecosystem of more than 80 partners with global supply chain expertise.

Key Capabilities for Agentic AI

The Vera Rubin POD is architected to support modern agentic systems that plan tasks, invoke tools, execute code, retrieve data, and coordinate complex multistep workflows. The platform addresses emerging demands including extreme low-latency inference for agent decision-making, massive KV cache requirements from extended context windows, and CPU-based sandboxing for validating AI-generated results. NVIDIA expects token consumption to exceed 10 quadrillion tokens per year as workloads shift from human-AI interactions to AI-AI interactions.
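To see why extended context windows create the "massive KV cache requirements" mentioned above, consider the cache a transformer accumulates per request. The model shape below (80 layers, 8 KV heads, head dimension 128, fp16) is a hypothetical dense-transformer example for illustration, not tied to any NVIDIA product:

```python
# Sketch: KV cache size for one request of a hypothetical dense transformer.
def kv_cache_bytes(context_len, layers=80, kv_heads=8, head_dim=128,
                   bytes_per_elem=2):
    # 2x for the key and value tensors cached per token, per layer
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * context_len

for tokens in (8_192, 131_072, 1_048_576):
    gib = kv_cache_bytes(tokens) / 2**30
    print(f"{tokens:>9,} tokens -> {gib:,.1f} GiB of KV cache")
# ->   8,192 tokens -> 2.5 GiB
# -> 131,072 tokens -> 40.0 GiB
# -> 1,048,576 tokens -> 320.0 GiB
```

At million-token contexts, a single request's cache can exceed a GPU's local memory, which is the kind of pressure a dedicated KV-cache storage tier such as the BlueField-4 STX system with CMX is positioned to absorb.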