NVIDIA Vera Rubin POD: A Purpose-Built Supercomputer for Agentic AI
NVIDIA has introduced the Vera Rubin POD, a new AI supercomputer platform designed specifically for the emerging era of agentic AI systems. Built on the third-generation NVIDIA MGX rack architecture, the platform represents an extreme co-design effort across seven chip types spanning compute, networking, and storage.
Key Specifications and Capabilities
The Vera Rubin POD delivers remarkable scale:
- 40 racks with 1.2 quadrillion transistors and nearly 20,000 NVIDIA dies
- 1,152 NVIDIA Rubin GPUs providing 60 exaflops of total compute power
- 10 PB/s total scale-up and scale-out bandwidth
- 4x better training performance and 10x better inference performance per watt compared to NVIDIA Blackwell
- Purpose-built for modern AI workloads including mixture-of-experts, reinforcement learning, and large context memory requirements
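The headline totals above imply some useful per-device figures. A quick back-of-the-envelope check, assuming the published totals divide evenly across the POD (the announcement does not state the numeric precision behind the exaflops figure):

```python
# Derived figures from the published Vera Rubin POD totals.
# Assumes compute and transistor counts are spread evenly across
# devices and racks; precision/format of the FLOPS figure is unstated.

TOTAL_EXAFLOPS = 60
NUM_GPUS = 1_152
NUM_RACKS = 40
TOTAL_TRANSISTORS = 1.2e15  # 1.2 quadrillion

flops_per_gpu = TOTAL_EXAFLOPS * 1e18 / NUM_GPUS      # FLOPS per Rubin GPU
gpus_per_rack = NUM_GPUS / NUM_RACKS                  # average GPU density
transistors_per_rack = TOTAL_TRANSISTORS / NUM_RACKS

print(f"{flops_per_gpu / 1e15:.1f} PFLOPS per GPU")   # ~52.1 PFLOPS
print(f"{gpus_per_rack:.1f} GPUs per rack on average")
print(f"{transistors_per_rack / 1e12:.0f} trillion transistors per rack")
```

These averages are illustrative only; actual per-rack composition varies, since the POD mixes five different rack types rather than spreading GPUs uniformly.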
Five Specialized Rack-Scale Systems
The POD integrates five distinct, purpose-built rack-scale systems:
NVL72: Core compute engine with 72 Rubin GPUs and 36 Vera CPUs connected via a massive NVLink copper spine. Optimized for the four scaling laws of AI (pretraining, post-training, test-time scaling, and agentic scaling), with special support for complex MoE routing and inference workloads.
Groq 3 LPX: Dedicated to extreme low-latency inference with 256 LPUs per rack, delivering unprecedented inference performance for real-time agentic interactions.
Vera CPU: Provides dense CPU sandboxing with 256 CPUs per rack for large-scale reinforcement learning and safe code execution validation.
BlueField-4 STX: AI-native storage system featuring Coherent Memory Exchange (CMX) for efficient KV cache management and massive context memory.
Spectrum-6 SPX: Networking backbone using silicon photonics technology for low-latency, resilient connectivity across the entire POD.
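To make the KV-cache role of the storage system concrete: during long-context inference, attention keys and values accumulate for every token, and a storage tier is positioned as a place to offload older cache entries rather than discard them. A toy Python sketch of that two-tier offload pattern (the class and tier names here are illustrative assumptions, not an NVIDIA or CMX API):

```python
from collections import OrderedDict

class TieredKVCache:
    """Toy model of a two-tier KV cache: a small fast tier (think GPU
    HBM) that spills least-recently-used entries to a large slow tier
    (think a storage rack). Purely illustrative, not an NVIDIA API."""

    def __init__(self, fast_capacity):
        self.fast = OrderedDict()   # token_id -> (key, value) entry
        self.slow = {}              # overflow tier
        self.fast_capacity = fast_capacity

    def put(self, token_id, kv):
        self.fast[token_id] = kv
        self.fast.move_to_end(token_id)
        while len(self.fast) > self.fast_capacity:
            # Offload the least-recently-used entry instead of dropping it.
            evicted_id, evicted_kv = self.fast.popitem(last=False)
            self.slow[evicted_id] = evicted_kv

    def get(self, token_id):
        if token_id in self.fast:
            self.fast.move_to_end(token_id)
            return self.fast[token_id]
        kv = self.slow.pop(token_id)  # fetch back into the fast tier
        self.put(token_id, kv)
        return kv

cache = TieredKVCache(fast_capacity=2)
for t in range(4):
    cache.put(t, (f"k{t}", f"v{t}"))
# Tokens 0 and 1 have spilled to the slow tier; 2 and 3 remain fast.
```

The point of hardware like CMX, as described here, is to make the "slow" tier fast and coherent enough that this offloading is practical at datacenter scale.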
Architecture and Innovation
The third-generation MGX rack features significant innovations:
- Modular, cable-free design for easier deployment and serviceability
- Dynamic power steering and rack-level energy storage for efficiency
- 45°C liquid cooling for maximum thermal efficiency
- Open MGX standard with an ecosystem of 80+ global partners
- Support for both NVLink-connected (NVL) and Ethernet/LPU-connected (ETL) rack configurations
Agentic AI Focus
The Vera Rubin POD is specifically architected for agentic AI systems, which require:
- High throughput for processing multiple concurrent agent interactions
- Extreme low-latency inference for real-time decision-making
- Large KV cache capacity for extended reasoning and context
- CPU-based sandboxing for safe code execution and validation
- Seamless coordination across multiple specialized subsystems
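Of the requirements above, the sandboxing one is easiest to illustrate in software terms: in reinforcement-learning pipelines, model-generated code must run somewhere it cannot harm the host, with hard time limits. A minimal host-side sketch of that pattern using a subprocess with a timeout (real deployments layer on OS-level isolation such as containers or VMs; nothing in this sketch is NVIDIA-specific):

```python
import subprocess
import sys

def run_untrusted(code: str, timeout_s: float = 2.0) -> dict:
    """Execute model-generated Python in a separate process with a hard
    timeout. Illustrative only: production sandboxes add namespaces,
    containers, and resource limits on top of this."""
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode
            capture_output=True, text=True, timeout=timeout_s,
        )
        return {"ok": proc.returncode == 0,
                "stdout": proc.stdout, "stderr": proc.stderr}
    except subprocess.TimeoutExpired:
        return {"ok": False, "stdout": "", "stderr": "timeout"}

result = run_untrusted("print(2 + 2)")
# result["ok"] is True and result["stdout"] is "4\n"
bad = run_untrusted("while True: pass", timeout_s=0.5)
# bad["ok"] is False with stderr "timeout"
```

Running many such sandboxes concurrently is CPU-bound rather than GPU-bound, which is why a dense CPU rack like the Vera CPU system is described as serving this role.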
This represents NVIDIA's recognition that agentic AI workloads differ fundamentally from traditional model training and inference, requiring purpose-built infrastructure optimized for token generation, reasoning steps, and multi-step workflow coordination.