NVIDIA Vera Rubin POD: A Purpose-Built Supercomputer for Agentic AI
NVIDIA has introduced the Vera Rubin POD, a new AI supercomputer platform designed specifically for the emerging era of agentic AI systems. Built on the third-generation NVIDIA MGX rack architecture, the platform represents an extreme co-design effort across seven chip types spanning compute, networking, and storage.
Key Specifications and Capabilities
The Vera Rubin POD delivers remarkable scale:
- 40 racks with 1.2 quadrillion transistors and nearly 20,000 NVIDIA dies
- 1,152 NVIDIA Rubin GPUs providing 60 exaflops of total compute power
- 10 PB/s total scale-up and scale-out bandwidth
- 4x better training performance and 10x better inference performance per watt compared to NVIDIA Blackwell
- Purpose-built for modern AI workloads including mixture-of-experts, reinforcement learning, and large context memory requirements
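The headline totals above imply some useful per-device figures. A quick back-of-the-envelope check, assuming the published totals divide evenly across the POD (the announcement does not state the numeric precision behind the exaflops figure):

```python
# Derived figures from the published Vera Rubin POD totals.
# Assumes compute and transistor counts are spread evenly across
# devices and racks; precision/format of the FLOPS figure is unstated.

TOTAL_EXAFLOPS = 60
NUM_GPUS = 1_152
NUM_RACKS = 40
TOTAL_TRANSISTORS = 1.2e15  # 1.2 quadrillion

flops_per_gpu = TOTAL_EXAFLOPS * 1e18 / NUM_GPUS      # FLOPS per Rubin GPU
gpus_per_rack = NUM_GPUS / NUM_RACKS                  # average GPU density
transistors_per_rack = TOTAL_TRANSISTORS / NUM_RACKS

print(f"{flops_per_gpu / 1e15:.1f} PFLOPS per GPU")   # ~52.1 PFLOPS
print(f"{gpus_per_rack:.1f} GPUs per rack on average")
print(f"{transistors_per_rack / 1e12:.0f} trillion transistors per rack")
```

These averages are illustrative only; actual per-rack composition varies, since the POD mixes five different rack types rather than spreading GPUs uniformly.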
Five Specialized Rack-Scale Systems
The POD integrates five distinct, purpose-built rack-scale systems:
NVL72: Core compute engine with 72 Rubin GPUs and 36 Vera CPUs connected via a massive NVLink copper spine. Optimized for the four scaling laws of AI (pretraining, post-training, test-time scaling, and agentic scaling), with special support for complex MoE routing and inference workloads.
Groq 3 LPX: Dedicated to extreme low-latency inference with 256 LPUs per rack, delivering unprecedented inference performance for real-time agentic interactions.
Vera CPU: Provides dense CPU sandboxing with 256 CPUs per rack for large-scale reinforcement learning and safe code execution validation.
BlueField-4 STX: AI-native storage system featuring Coherent Memory Exchange (CMX) for efficient KV cache management and massive context memory.
Spectrum-6 SPX: Networking backbone using silicon photonics technology for low-latency, resilient connectivity across the entire POD.
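To make the KV-cache role of the storage system concrete: during long-context inference, attention keys and values accumulate for every token, and a storage tier is positioned as a place to offload older cache entries rather than discard them. A toy Python sketch of that two-tier offload pattern (the class and tier names here are illustrative assumptions, not an NVIDIA or CMX API):

```python
from collections import OrderedDict

class TieredKVCache:
    """Toy model of a two-tier KV cache: a small fast tier (think GPU
    HBM) that spills least-recently-used entries to a large slow tier
    (think a storage rack). Purely illustrative, not an NVIDIA API."""

    def __init__(self, fast_capacity):
        self.fast = OrderedDict()   # token_id -> (key, value) entry
        self.slow = {}              # overflow tier
        self.fast_capacity = fast_capacity

    def put(self, token_id, kv):
        self.fast[token_id] = kv
        self.fast.move_to_end(token_id)
        while len(self.fast) > self.fast_capacity:
            # Offload the least-recently-used entry instead of dropping it.
            evicted_id, evicted_kv = self.fast.popitem(last=False)
            self.slow[evicted_id] = evicted_kv

    def get(self, token_id):
        if token_id in self.fast:
            self.fast.move_to_end(token_id)
            return self.fast[token_id]
        kv = self.slow.pop(token_id)  # fetch back into the fast tier
        self.put(token_id, kv)
        return kv

cache = TieredKVCache(fast_capacity=2)
for t in range(4):
    cache.put(t, (f"k{t}", f"v{t}"))
# Tokens 0 and 1 have spilled to the slow tier; 2 and 3 remain fast.
```

The point of hardware like CMX, as described here, is to make the "slow" tier fast and coherent enough that this offloading is practical at datacenter scale.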
Architecture and Innovation
The third-generation MGX rack features significant innovations:
- Modular, cable-free design for easier deployment and serviceability
- Dynamic power steering and rack-level energy storage for efficiency
- 45°C liquid cooling for maximum thermal efficiency
- Open MGX standard with an ecosystem of 80+ global partners
- Support for both NVLink-connected (NVL) and Ethernet/LPU-connected (ETL) rack configurations
Agentic AI Focus
The Vera Rubin POD is specifically architected for agentic AI systems, which require:
- High throughput for processing multiple concurrent agent interactions
- Extreme low-latency inference for real-time decision-making
- Large KV cache capacity for extended reasoning and context
- CPU-based sandboxing for safe code execution and validation
- Seamless coordination across multiple specialized subsystems
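Of the requirements above, the sandboxing one is easiest to illustrate in software terms: in reinforcement-learning pipelines, model-generated code must run somewhere it cannot harm the host, with hard time limits. A minimal host-side sketch of that pattern using a subprocess with a timeout (real deployments layer on OS-level isolation such as containers or VMs; nothing in this sketch is NVIDIA-specific):

```python
import subprocess
import sys

def run_untrusted(code: str, timeout_s: float = 2.0) -> dict:
    """Execute model-generated Python in a separate process with a hard
    timeout. Illustrative only: production sandboxes add namespaces,
    containers, and resource limits on top of this."""
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode
            capture_output=True, text=True, timeout=timeout_s,
        )
        return {"ok": proc.returncode == 0,
                "stdout": proc.stdout, "stderr": proc.stderr}
    except subprocess.TimeoutExpired:
        return {"ok": False, "stdout": "", "stderr": "timeout"}

result = run_untrusted("print(2 + 2)")
# result["ok"] is True and result["stdout"] is "4\n"
bad = run_untrusted("while True: pass", timeout_s=0.5)
# bad["ok"] is False with stderr "timeout"
```

Running many such sandboxes concurrently is CPU-bound rather than GPU-bound, which is why a dense CPU rack like the Vera CPU system is described as serving this role.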
This represents NVIDIA's recognition that agentic AI workloads differ fundamentally from traditional model training and inference, requiring purpose-built infrastructure optimized for token generation, reasoning steps, and multi-step workflow coordination.