NVIDIA Vera Rubin POD: The Agentic AI Supercomputer
NVIDIA has introduced the Vera Rubin POD, a purpose-built, rack-scale AI supercomputer designed to address the computational demands of agentic AI systems. This platform represents a major architectural shift from traditional large language model training infrastructure toward systems optimized for multi-agent reasoning, tool invocation, and continuous workflows.
Key Specifications and Architecture
The Vera Rubin POD is a massive integrated system featuring:
- 40 racks with 1.2 quadrillion transistors and nearly 20,000 NVIDIA dies
- 1,152 NVIDIA Rubin GPUs delivering 60 exaflops of total compute
- 10 PB/s aggregate bandwidth for interconnect and storage
- Seven specialized chips across compute, networking, and storage domains
- Built on the third-generation NVIDIA MGX rack architecture
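A quick arithmetic check shows how these headline figures relate to one another, assuming the 60 exaflops is aggregate across all 1,152 GPUs and that the NVL72 racks account for all of the GPUs:

```python
# Back-of-envelope check of the headline Vera Rubin POD figures.
TOTAL_COMPUTE_FLOPS = 60e18   # 60 exaflops, aggregate across the POD
TOTAL_GPUS = 1152             # Rubin GPUs in the POD
GPUS_PER_NVL72_RACK = 72      # GPUs per NVL72 compute rack

per_gpu_pflops = TOTAL_COMPUTE_FLOPS / TOTAL_GPUS / 1e15
compute_racks = TOTAL_GPUS // GPUS_PER_NVL72_RACK

print(f"~{per_gpu_pflops:.0f} PFLOPS per GPU")          # ~52 PFLOPS
print(f"{compute_racks} NVL72 racks of 40 total racks")  # 16 of 40
```

This implies roughly 52 petaflops per GPU and 16 NVL72 compute racks, leaving 24 of the 40 racks for the inference, CPU, storage, and networking systems described below.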
Five Specialized Rack-Scale Systems
The POD integrates five purpose-built systems working as a cohesive unit:
NVL72: Core compute engine with 72 Rubin GPUs per rack, optimized for the "four scaling laws" (pretraining, post-training, test-time scaling, agentic scaling). Delivers 4x better training performance than Blackwell and 10x better inference performance per watt, including for mixture-of-experts workloads.
Groq 3 LPX: Low-latency inference engine with 256 LPUs per rack, designed for real-time agent decision-making and tool invocation.
Vera CPU: Dense CPU sandbox with 256 CPUs per rack for large-scale reinforcement learning, code execution environments, and agent validation.
BlueField-4 STX: AI-native storage system with CMX technology for KV cache management and massive context memory support.
Spectrum-6 SPX: Silicon photonics-based networking for low-latency, resilient connectivity between rack systems.
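The five rack types above can be summarized as a simple inventory. The sketch below is illustrative only; the structure and field names are my own, while the system names, roles, and per-rack unit counts come from the descriptions above (no per-rack count is given for the storage and networking racks):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RackSystem:
    name: str
    role: str
    units_per_rack: Optional[int]  # None where no per-rack count is stated
    unit: str

# Inventory of the five purpose-built rack-scale systems in the POD.
POD_SYSTEMS = [
    RackSystem("NVL72",           "core compute",          72,  "Rubin GPU"),
    RackSystem("Groq 3 LPX",      "low-latency inference", 256, "LPU"),
    RackSystem("Vera CPU",        "CPU sandbox / RL",      256, "CPU"),
    RackSystem("BlueField-4 STX", "AI-native storage",     None, ""),
    RackSystem("Spectrum-6 SPX",  "photonic networking",   None, ""),
]

for s in POD_SYSTEMS:
    count = f"{s.units_per_rack} {s.unit}s/rack" if s.units_per_rack else "n/a"
    print(f"{s.name:16} {s.role:24} {count}")
```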
Purpose-Built for Agentic AI
This platform addresses the emerging paradigm where AI agents interact with each other in continuous loops, requiring:
- Massive KV cache management for long-context reasoning
- Low-latency inference for real-time decision-making
- Dense CPU resources for sandboxed execution and validation
- Mixture-of-experts routing for specialized sub-models
- Reinforcement learning infrastructure for agent refinement
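The loop these requirements serve can be sketched in miniature: a long-running agent appends new observations and tool results to a persistent KV cache rather than re-prefilling its whole context each turn. Everything below is illustrative toy code, not an NVIDIA API; the cache, tools, and routing heuristic are all stand-ins:

```python
class KVCache:
    """Stands in for per-sequence attention state kept in fast storage
    (the role BlueField-4 STX's CMX cache management plays in the POD)."""
    def __init__(self):
        self.tokens = []  # tokens whose K/V projections are already cached

    def extend(self, new_tokens):
        # Only the new tokens need a prefill pass; cached context is reused.
        self.tokens.extend(new_tokens)

def run_agent_step(cache, observation, tools):
    # 1. Append only the new observation to the cached context.
    cache.extend(observation.split())
    # 2. A low-latency decode decides whether to invoke a tool
    #    (a toy keyword heuristic stands in for the model here).
    action = "search" if "question" in observation else "respond"
    # 3. The tool result is fed back into the loop for the next turn.
    result = tools[action](observation)
    cache.extend(result.split())
    return result

tools = {
    "search": lambda obs: "tool result for: " + obs,
    "respond": lambda obs: "final answer",
}

cache = KVCache()
out = run_agent_step(cache, "user question about GPUs", tools)
print(out)                # the search tool's output
print(len(cache.tokens))  # cache grows across turns; it is never reset
```

The design point this illustrates is that agent-to-agent workloads keep context alive across many turns, so the cost of re-prefilling grows with every loop iteration unless the KV cache persists in fast storage.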
Deployment and Ecosystem
The Vera Rubin POD leverages NVIDIA's open MGX standard with over 80 global partners in the supply chain. All rack systems share compatible power, cooling, and mechanical envelopes, enabling rapid deployment and integration. The platform targets energy-efficient data center operations for the next generation of AI workloads dominated by agent-to-agent interactions rather than human-to-AI interactions.