NVIDIA Vera Rubin POD delivers 60 exaflops for agentic AI with five specialized rack-scale systems
Tags: release, platform, performance · Source: developer.nvidia.com

NVIDIA Vera Rubin POD: A Purpose-Built Supercomputer for Agentic AI

NVIDIA has introduced the Vera Rubin POD, a new AI supercomputer platform designed specifically for the emerging era of agentic AI systems. Built on the third-generation NVIDIA MGX rack architecture, the platform represents an extreme co-design effort across seven different chip types spanning compute, networking, and storage domains.

Key Specifications and Capabilities

The Vera Rubin POD delivers remarkable scale:

  • 40 racks with 1.2 quadrillion transistors and nearly 20,000 NVIDIA dies
  • 1,152 NVIDIA Rubin GPUs providing 60 exaflops of total compute power
  • 10 PB/s total scale-up and scale-out bandwidth
  • 4x better training performance and 10x better inference performance per watt compared to NVIDIA Blackwell
  • Purpose-built for modern AI workloads including mixture-of-experts, reinforcement learning, and large context memory requirements
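The headline figures above can be reduced to rough per-unit numbers with back-of-the-envelope arithmetic. This is only a sanity-check sketch: the article does not state the precision format behind the 60-exaflop figure, and GPUs sit only in the NVL72 racks, so the per-rack average is a POD-wide average, not a real rack configuration.

```python
# Back-of-the-envelope breakdown of the POD-level specs quoted above.
# Inputs are the article's numbers; the per-unit values are derived.

TOTAL_FLOPS = 60e18   # 60 exaflops total compute (precision format unstated)
NUM_GPUS = 1_152      # Rubin GPUs in the POD
NUM_RACKS = 40        # racks in the POD

flops_per_gpu = TOTAL_FLOPS / NUM_GPUS    # ~5.2e16 FLOPS per GPU
gpus_per_rack_avg = NUM_GPUS / NUM_RACKS  # POD-wide average; GPUs live only in NVL72 racks

print(f"{flops_per_gpu / 1e15:.1f} PFLOPS per GPU")              # → 52.1 PFLOPS per GPU
print(f"{gpus_per_rack_avg:.1f} GPUs per rack (POD average)")    # → 28.8 GPUs per rack (POD average)
```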

Five Specialized Rack-Scale Systems

The POD integrates five distinct, purpose-built rack-scale systems:

  1. NVL72: Core compute engine with 72 Rubin GPUs and 36 Vera CPUs connected via a massive NVLink copper spine. Optimized for the four scaling laws of AI (pretraining, post-training, test-time scaling, and agentic scaling), with special support for complex MoE routing and inference workloads.

  2. Groq 3 LPX: Dedicated to extreme low-latency inference with 256 LPUs per rack, targeting real-time agentic interactions where response latency dominates.

  3. Vera CPU: Provides dense CPU sandboxing with 256 CPUs per rack for large-scale reinforcement learning and safe code execution validation.

  4. BlueField-4 STX: AI-native storage system featuring Coherent Memory Exchange (CMX) for efficient KV cache management and massive context memory.

  5. Spectrum-6 SPX: Networking backbone using silicon photonics technology for low-latency, resilient connectivity across the entire POD.
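The five-system composition can be tabulated in a small inventory model. This is purely an illustrative way to organize the article's own numbers, not an NVIDIA data structure; per-rack unit counts are included only where the article states them, and rack counts per system type are not given, so none are assumed.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class RackSystem:
    """One of the five purpose-built rack-scale systems in the Vera Rubin POD."""
    name: str
    role: str
    units_per_rack: dict = field(default_factory=dict)  # only counts the article states

POD_SYSTEMS = [
    RackSystem("NVL72",          "core compute",                     {"Rubin GPU": 72, "Vera CPU": 36}),
    RackSystem("Groq 3 LPX",     "extreme low-latency inference",    {"LPU": 256}),
    RackSystem("Vera CPU",       "CPU sandboxing / RL",              {"Vera CPU": 256}),
    RackSystem("BlueField-4 STX", "AI-native storage (CMX KV cache)"),
    RackSystem("Spectrum-6 SPX", "silicon-photonics networking"),
]

for system in POD_SYSTEMS:
    print(f"{system.name:16s} {system.role:32s} {system.units_per_rack}")
```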

Architecture and Innovation

The third-generation MGX rack features significant innovations:

  • Modular, cable-free design for easier deployment and serviceability
  • Dynamic power steering and rack-level energy storage for efficiency
  • 45°C liquid cooling for maximum thermal efficiency
  • Open MGX standard with ecosystem of 80+ global partners
  • Support for both NVLink-connected (NVL) and Ethernet/LPU-connected (ETL) rack configurations

Agentic AI Focus

The Vera Rubin POD is specifically architected for agentic AI systems, which require:

  • High throughput for processing multiple concurrent agent interactions
  • Extreme low-latency inference for real-time decision-making
  • Large KV cache capacity for extended reasoning and context
  • CPU-based sandboxing for safe code execution and validation
  • Seamless coordination across multiple specialized subsystems
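The KV cache requirement in the list above can be made concrete with the standard sizing formula for grouped-query attention: per-sequence cache memory grows linearly with context length. Every parameter in the example below is an illustrative assumption about a hypothetical large model, not a Rubin or CMX specification.

```python
# Standard per-sequence KV cache sizing for a transformer with
# grouped-query attention. All model parameters here are illustrative
# assumptions, not specs from the article.

def kv_cache_bytes(layers, kv_heads, head_dim, context_tokens, bytes_per_value=1):
    # Factor of 2 accounts for the separate key and value tensors per layer.
    return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_value

# Hypothetical model: 80 layers, 8 KV heads, head dim 128, FP8 (1-byte) cache
per_seq = kv_cache_bytes(layers=80, kv_heads=8, head_dim=128,
                         context_tokens=128 * 1024, bytes_per_value=1)
print(f"{per_seq / 2**30:.1f} GiB per 128k-token sequence")  # → 20.0 GiB per 128k-token sequence
```

Serving many concurrent agents multiplies this per-sequence footprint, which is the pressure the BlueField-4 STX context-memory tier is described as addressing.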

This represents NVIDIA's recognition that agentic AI workloads differ fundamentally from traditional model training and inference, requiring purpose-built infrastructure optimized for token generation, reasoning steps, and multi-step workflow coordination.