NVIDIA unveils Vera Rubin POD with 1,152 Rubin GPUs, 60 exaflops for agentic AI workloads
· release · platform · performance · developer.nvidia.com

NVIDIA Vera Rubin POD: Enterprise-Grade AI Infrastructure

NVIDIA announced the Vera Rubin POD, a comprehensive AI supercomputer platform purpose-built for the emerging era of agentic AI systems. The POD combines five specialized rack-scale systems built on the third-generation NVIDIA MGX architecture, each optimized for a distinct workload pattern, from training and inference to sandboxing and memory-intensive operations.

Key System Specifications

The Vera Rubin POD spans 40 racks with 1.2 quadrillion transistors and nearly 20,000 NVIDIA dies. The platform delivers:

  • 1,152 NVIDIA Rubin GPUs with 60 exaflops of compute performance (see the per-GPU breakdown after this list)
  • 10 PB/s total scale-up bandwidth for high-throughput inter-GPU communication
  • Seven specialized chip types spanning compute, networking, and storage functions
  • Up to 4x better training performance and 10x better inference performance per watt vs. Blackwell
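
For a sense of scale, dividing the POD-level figures evenly across GPUs gives rough per-GPU numbers. A minimal sketch of the arithmetic, noting that the announcement does not state which precision format the 60-exaflop figure assumes (NVFP4 would be a plausible guess):

```python
# Back-of-envelope figures derived from the POD-level numbers above.
# The precision format behind the 60 EF figure is not stated in the
# announcement, so treat these as rough, order-of-magnitude values.

POD_GPUS = 1_152               # Rubin GPUs per Vera Rubin POD
POD_COMPUTE_EF = 60            # exaflops, precision format unspecified
POD_SCALEUP_BW_PBS = 10        # PB/s total scale-up bandwidth

per_gpu_pf = POD_COMPUTE_EF * 1_000 / POD_GPUS        # 1 EF = 1,000 PF
per_gpu_tbs = POD_SCALEUP_BW_PBS * 1_000 / POD_GPUS   # 1 PB/s = 1,000 TB/s

print(f"Compute per GPU:       ~{per_gpu_pf:.0f} PFLOPS")   # ~52 PFLOPS
print(f"Scale-up BW per GPU:   ~{per_gpu_tbs:.1f} TB/s")    # ~8.7 TB/s
```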

Five Specialized Rack-Scale Systems

  1. NVL72: Core compute engine with 72 Rubin GPUs and 36 Vera CPUs connected via NVLink; supports mixture-of-experts routing, test-time scaling, and agentic scaling patterns
  2. Groq 3 LPX: Dedicated low-latency inference with 256 LPUs per rack
  3. Vera CPU: Dense CPU infrastructure with 256 CPUs per rack for reinforcement learning and sandboxed environments
  4. BlueField-4 STX: AI-native storage with CMX technology for KV cache optimization (a sizing sketch follows this list)
  5. Spectrum-6 SPX: Silicon photonics-based networking for low-latency, resilient inter-rack connectivity
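
The KV cache item above is worth unpacking: in transformer inference, KV state grows linearly with context length and with the number of concurrent sessions, which is why agentic workloads push it out of GPU memory toward dedicated storage tiers. A rough sizing sketch using the standard KV-cache formula; the model dimensions are hypothetical (loosely 70B-class), not from the announcement:

```python
# Rough KV-cache sizing, to show why long-context agentic inference pushes
# KV state toward dedicated storage. Model dimensions are hypothetical
# and not taken from NVIDIA's announcement.

def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Size of K+V tensors for one sequence (the factor 2 covers K and V)."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem

GiB, TiB = 1024**3, 1024**4
per_seq = kv_cache_bytes(layers=80, kv_heads=8, head_dim=128, seq_len=128_000)

print(f"KV cache, one 128k-token sequence: {per_seq / GiB:.1f} GiB")   # ~39 GiB
# A multi-agent workload holding 1,000 such sessions resident:
print(f"1,000 concurrent sessions: {1_000 * per_seq / TiB:.1f} TiB")   # ~38 TiB
```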

Architecture and Deployment

The MGX architecture features modular, cable-free design with dynamic power steering, rack-level energy storage, and intelligent power smoothing. All five rack types share identical power, cooling, and mechanical envelopes for seamless integration. The open MGX standard benefits from an ecosystem of 80+ partners with established supply chains for rapid deployment.
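
To illustrate what power smoothing buys at rack scale, the toy simulation below has an energy buffer charge during lulls and discharge during bursts so the facility-side draw stays near a target. All numbers and the control rule are invented for illustration; NVIDIA has not published the actual mechanism:

```python
# Conceptual illustration of power smoothing with rack-level energy storage.
# Everything here is hypothetical; it only demonstrates the buffering idea.
import itertools

def smooth_draw(load_kw, target_kw, store_kwh, max_store_kwh, dt_h=1/3600):
    """One 1-second control step: return (grid_draw_kw, new_store_kwh)."""
    surplus_kw = target_kw - load_kw                    # >0: room to charge
    # Clamp by what the store can absorb (headroom) or supply (charge level).
    delta_kwh = max(-store_kwh,
                    min(surplus_kw * dt_h, max_store_kwh - store_kwh))
    return load_kw + delta_kwh / dt_h, store_kwh + delta_kwh

# Synthetic bursty rack load alternating 600 kW spikes and 200 kW idle.
loads = itertools.cycle([600, 600, 200, 200])
store, cap, target = 5.0, 10.0, 400.0                   # kWh, kWh, kW
for step in range(8):
    draw, store = smooth_draw(next(loads), target, store, cap)
    print(f"step {step}: grid draw {draw:6.1f} kW, store {store:5.3f} kWh")
```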

The platform directly addresses modern agentic AI requirements, where autonomous systems generate massive token volumes through inter-agent reasoning, tool invocation, and continuous multi-step workflows that demand both high throughput and extremely low latency across the compute, storage, and networking layers.
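
As a concrete illustration of why token volumes balloon, consider a minimal agent loop: every reasoning or tool-use step re-processes the accumulated context, so total tokens grow roughly quadratically with step count. The `llm` and `run_tool` callables below are hypothetical stand-ins, not any real API:

```python
# Minimal sketch of why agentic workloads multiply token volume: each user
# request fans out into many model calls, each carrying the accumulated
# context. Token volume is approximated here by whitespace-split words.

def agent_loop(task, llm, run_tool, max_steps=8):
    context = [("user", task)]
    tokens_processed = 0
    for _ in range(max_steps):
        reply = llm(context)                     # one full-context model call
        tokens_processed += sum(len(msg.split()) for _, msg in context)
        if reply.startswith("TOOL:"):            # model asked to invoke a tool
            context.append(("assistant", reply))
            context.append(("tool", run_tool(reply.removeprefix("TOOL:"))))
        else:
            return reply, tokens_processed       # final answer
    return "gave up", tokens_processed

# Toy run: the stub "model" calls a tool twice, then answers.
replies = iter(["TOOL:search GPU specs", "TOOL:summarize findings", "done"])
answer, tokens = agent_loop("compare GPUs", lambda ctx: next(replies),
                            lambda call: f"result of {call}")
print(answer, tokens)
```

Note how `tokens_processed` counts the entire context on every step: that compounding re-read, multiplied across many concurrent agents, is what pairs the throughput demand with the latency demand described above.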