NVIDIA Groq 3 LPX: A New Inference Accelerator for the AI Factory
NVIDIA has announced Groq 3 LPX, a rack-scale inference accelerator designed to complement the Vera Rubin NVL72 GPU platform. Built around 256 interconnected Groq 3 LPU accelerators, the system is optimized for low-latency inference workloads, particularly for agentic AI systems that demand both high throughput and predictable per-token latency.
Key Technical Specifications
NVIDIA quotes the following rack-scale figures for the LPX system:
- 315 PFLOPS of FP8 compute performance at rack scale
- 40 PB/s on-chip SRAM bandwidth per accelerator
- 640 TB/s rack-scale interconnect bandwidth
- 128 GB total SRAM capacity
- Heterogeneous architecture pairing LPUs with Vera Rubin NVL72 GPUs
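Taking the rack-scale figures above at face value, the per-accelerator share is a quick sanity check. The even split across all 256 LPUs is an assumption for illustration, not a published per-chip specification:

```python
# Per-accelerator shares derived from the rack-scale figures above,
# assuming an even split across 256 LPUs (an assumption, not a
# published per-chip spec).
NUM_LPUS = 256
RACK_FP8_PFLOPS = 315         # FP8 compute at rack scale
RACK_SRAM_GB = 128            # total SRAM capacity
RACK_INTERCONNECT_TBS = 640   # rack-scale interconnect bandwidth, TB/s

fp8_per_lpu_tflops = RACK_FP8_PFLOPS / NUM_LPUS * 1000   # PFLOPS -> TFLOPS
sram_per_lpu_mb = RACK_SRAM_GB / NUM_LPUS * 1024         # GB -> MB
link_per_lpu_tbs = RACK_INTERCONNECT_TBS / NUM_LPUS      # TB/s

print(f"{fp8_per_lpu_tflops:.1f} TFLOPS FP8, "
      f"{sram_per_lpu_mb:.0f} MB SRAM, "
      f"{link_per_lpu_tbs:.1f} TB/s per LPU")
# -> 1230.5 TFLOPS FP8, 512 MB SRAM, 2.5 TB/s per LPU
```

The ~512 MB of SRAM per LPU under this assumption underlines why the design keeps only latency-critical weights on-chip rather than serving whole models from SRAM.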
Architecture & Heterogeneous Serving
Rather than replacing existing GPU infrastructure, LPX creates a heterogeneous inference path where the system intelligently routes workload components:
- Vera Rubin NVL72 handles prefill and decode attention, maintaining high throughput for long-context processing
- Groq 3 LPX accelerates latency-sensitive FFN and MoE expert execution during token generation
- NVIDIA Dynamo orchestrates request routing to optimize responsiveness without sacrificing overall AI factory throughput
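The split described above can be sketched as a toy dispatcher. The function and backend names here are purely illustrative; this is not the NVIDIA Dynamo API:

```python
# Toy sketch of the heterogeneous routing described above.
# route_layer and the "gpu"/"lpu" labels are illustrative only;
# this is NOT the NVIDIA Dynamo API.
def route_layer(layer_kind: str, phase: str) -> str:
    """Pick a backend for one transformer layer component.

    phase: "prefill" or "decode"
    layer_kind: "attention", "ffn", or "moe"
    """
    if phase == "prefill":
        return "gpu"          # Vera Rubin NVL72: long-context prefill throughput
    if layer_kind == "attention":
        return "gpu"          # decode attention also stays on the GPU
    if layer_kind in ("ffn", "moe"):
        return "lpu"          # latency-sensitive FFN / MoE experts on Groq 3 LPX
    return "gpu"              # default: keep unrecognized work on the GPU

# During decode, FFN and MoE expert layers are offloaded to the LPU:
print(route_layer("moe", "decode"))       # -> lpu
print(route_layer("attention", "decode")) # -> gpu
```

The point of the sketch is the decision boundary: phase first (all prefill on the GPU), then layer type within decode.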
Performance Claims & Use Cases
NVIDIA claims the combined Vera Rubin + LPX architecture delivers:
- 35x higher inference throughput per megawatt compared to alternatives
- 10x more revenue opportunity for trillion-parameter model serving
- Deterministic, low-jitter execution for stable tail latencies even at high concurrency
The system targets emerging agentic AI workloads where generation speeds reach 1,000+ tokens per second per user, enabling "speed of thought" computing for multi-agent systems and real-time AI collaboration experiences.
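That per-user rate implies a tight per-token latency budget, which simple arithmetic makes concrete. The 80-layer model depth below is an illustrative assumption, not a figure from the announcement:

```python
# Latency budget implied by the 1,000 tokens/s/user target above.
tokens_per_sec = 1_000
ms_per_token = 1_000 / tokens_per_sec        # milliseconds available per token

layers = 80                                  # illustrative model depth (assumption)
us_per_layer = ms_per_token * 1_000 / layers # microseconds per layer

print(f"{ms_per_token:.2f} ms/token, {us_per_layer:.1f} us/layer")
# -> 1.00 ms/token, 12.5 us/layer
```

A budget on the order of microseconds per layer is why the article emphasizes deterministic, low-jitter execution and SRAM-resident weights: there is no headroom for scheduling variance or off-chip memory stalls.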
Availability & Deployment
LPX is integrated with NVIDIA's MGX ETL rack architecture and can be deployed alongside Vera Rubin NVL72 within existing data center infrastructure. The system is designed for production inference serving in large-scale AI factories.