NVIDIA announces Vera Rubin POD, five-system AI supercomputer delivering 60 exaflops
Tags: release, platform, performance · Source: developer.nvidia.com

Overview

NVIDIA has announced the Vera Rubin POD, a purpose-built AI supercomputer platform designed for the emerging era of agentic AI systems. The platform represents extreme co-design across seven different chip types spanning compute, networking, and storage, integrated within the third-generation NVIDIA MGX rack architecture.

Architecture and Scale

The Vera Rubin POD consists of:

  • 40 racks forming a unified supercomputer
  • 1,152 NVIDIA Rubin GPUs with 60 exaflops of peak performance
  • 1.2 quadrillion transistors across nearly 20,000 NVIDIA dies
  • 10 PB/s of aggregate scale-up interconnect bandwidth

The platform is built on five distinct rack-scale systems, each optimized for different aspects of agentic AI workloads.
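
As a rough sanity check on these figures, the back-of-envelope arithmetic below derives per-rack and per-GPU numbers from the stated totals. The split of the 40 racks into NVL72 compute racks versus specialized racks is an inference from the published figures, not an announced breakdown.

```python
# Back-of-envelope arithmetic from the headline figures above.
# The compute/specialized rack split is inferred, not an official spec.

TOTAL_GPUS = 1_152           # Rubin GPUs per Vera Rubin POD
TOTAL_RACKS = 40             # racks per POD
GPUS_PER_NVL72 = 72          # Rubin GPUs per Vera Rubin NVL72 rack
PEAK_EXAFLOPS = 60           # peak AI performance of the POD

compute_racks = TOTAL_GPUS // GPUS_PER_NVL72         # -> 16 NVL72 racks
specialized_racks = TOTAL_RACKS - compute_racks      # -> 24 racks for LPU/CPU/storage/networking
pflops_per_gpu = PEAK_EXAFLOPS * 1_000 / TOTAL_GPUS  # -> ~52 PFLOPS per GPU at peak

print(compute_racks, specialized_racks, round(pflops_per_gpu, 1))
```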

Key Components

NVIDIA Vera Rubin NVL72 serves as the core compute engine, integrating 72 Rubin GPUs and 36 Vera CPUs via an NVLink copper spine. It delivers up to 4x better training performance and 10x better inference performance per watt than Blackwell, and is optimized for mixture-of-experts routing and the compute-bound context phase of inference.
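
The mixture-of-experts emphasis is largely about traffic: each token is dispatched to a small set of experts that may sit on different GPUs, so routing turns into cross-GPU transfers that the NVLink scale-up fabric must absorb. The sketch below is a toy top-k router, purely illustrative and unrelated to NVIDIA's actual software.

```python
# Toy top-k mixture-of-experts router, illustrating the all-to-all traffic
# pattern that scale-up bandwidth has to absorb. Schematic only; this is
# not NVIDIA code and does not reflect the Rubin routing implementation.
import numpy as np

rng = np.random.default_rng(0)
tokens, hidden, num_experts, top_k = 8, 16, 4, 2

x = rng.normal(size=(tokens, hidden))             # token activations
router = rng.normal(size=(hidden, num_experts))   # router (gating) weights

logits = x @ router
chosen = np.argsort(logits, axis=1)[:, -top_k:]   # top-k experts per token

# If each expert lives on a different GPU, every (token, expert) pair below
# is a potential cross-GPU transfer -- hence the focus on scale-up bandwidth.
for t, experts in enumerate(chosen):
    print(f"token {t} -> experts {sorted(experts.tolist())}")
```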

Additional specialized systems include:

  • Groq 3 LPX for extreme low-latency inference (256 LPUs per rack)
  • Vera CPU system for CPU-based sandboxing and reinforcement learning (256 CPUs per rack)
  • BlueField-4 STX for AI-native storage with KV cache management (see the sketch after this list)
  • Spectrum-6 SPX for silicon photonics-based networking
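
To make the BlueField-4 STX role concrete, here is a minimal sketch of KV-cache tiering: hot attention-cache blocks stay in fast memory while colder blocks spill to a storage tier. The class and eviction policy are invented for illustration and do not represent NVIDIA's cache-management software.

```python
# Schematic KV-cache tiering: hot attention-cache blocks stay in GPU memory,
# colder ones are evicted to a storage tier -- the kind of management the
# BlueField-4 STX rack is described as offloading. Illustrative only; this
# is not NVIDIA's actual cache-management API.
from collections import OrderedDict

class KVCacheTier:
    def __init__(self, gpu_capacity_blocks: int):
        self.gpu = OrderedDict()      # block_id -> block (hot tier)
        self.storage = {}             # block_id -> block (cold tier)
        self.capacity = gpu_capacity_blocks

    def put(self, block_id, block):
        self.gpu[block_id] = block
        self.gpu.move_to_end(block_id)
        while len(self.gpu) > self.capacity:       # evict least-recently used
            old_id, old_block = self.gpu.popitem(last=False)
            self.storage[old_id] = old_block       # spill to storage tier

    def get(self, block_id):
        if block_id in self.gpu:                   # hit in GPU memory
            self.gpu.move_to_end(block_id)
            return self.gpu[block_id]
        block = self.storage.pop(block_id)         # fetch back from storage
        self.put(block_id, block)
        return block

cache = KVCacheTier(gpu_capacity_blocks=2)
for i in range(4):
    cache.put(i, f"kv-block-{i}")
print(cache.get(0))   # block 0 was spilled and comes back from the cold tier
```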

Design Innovation

The MGX rack architecture features a modular, cable-free design, dynamic power steering, rack-level energy storage, and 45 °C liquid cooling to maximize reliability and energy efficiency. Both NVLink-based (NVL) and Ethernet/LPU-based (ETL) rack variants share identical mechanical and power envelopes for deployment flexibility.
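
Dynamic power steering can be pictured as shifting a fixed power budget toward whichever racks currently need it. The sketch below uses a simple water-filling allocation, assumed purely for illustration; the actual MGX power-management algorithm is not public.

```python
# Illustrative "power steering" policy: unused headroom from lightly loaded
# racks is steered toward racks that are power-constrained. The water-filling
# scheme here is assumed for illustration, not NVIDIA's MGX algorithm.

def steer_power(demands_kw: list[float], budget_kw: float) -> list[float]:
    """Grant each rack up to its demand, redistributing leftover headroom."""
    alloc = [0.0] * len(demands_kw)
    remaining = budget_kw
    needy = list(range(len(demands_kw)))
    while needy and remaining > 1e-9:
        share = remaining / len(needy)
        still_needy = []
        for i in needy:
            grant = min(demands_kw[i] - alloc[i], share)
            alloc[i] += grant
            remaining -= grant
            if alloc[i] < demands_kw[i] - 1e-9:
                still_needy.append(i)
        needy = still_needy
    return alloc

# One rack spikes during a training burst while the others idle down.
print(steer_power([140.0, 90.0, 60.0, 30.0], budget_kw=280.0))
# -> [100.0, 90.0, 60.0, 30.0]: idle headroom flows to the busy rack
```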

Market Context

The Vera Rubin POD addresses the shift toward agentic AI systems, in which multiple AI agents collaborate through reasoning tokens, tool invocation, and continuous workflows. These workflows generate dramatically higher token volumes than traditional single-pass LLM inference, requiring infrastructure specialized for high throughput, low latency, and CPU-based sandboxing.
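
A rough, assumed workload shape makes the volume difference concrete: several agents each run many reasoning and tool-use steps, and every step emits tokens. All counts below are illustrative placeholders, not measurements.

```python
# Back-of-envelope token volume: one chat completion vs. a multi-agent
# workflow. Every number here is an assumed placeholder for illustration.

chat_tokens = 1_000                    # single prompt + response

agents = 5                             # collaborating agents
steps_per_agent = 8                    # reasoning / tool-use iterations each
reasoning_tokens_per_step = 2_000      # intermediate reasoning tokens
tool_io_tokens_per_step = 500          # tool-call arguments plus results

agentic_tokens = agents * steps_per_agent * (
    reasoning_tokens_per_step + tool_io_tokens_per_step
)

print(f"single completion: {chat_tokens:,} tokens")
print(f"agentic workflow:  {agentic_tokens:,} tokens "
      f"(~{agentic_tokens // chat_tokens}x more)")
```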