NVIDIA Groq 3 LPX

The inference accelerator for NVIDIA Vera Rubin.

Overview

Speed Meets Scale

In the past, AI inference architectures delivered either interactivity and intelligence at the cost of throughput, or throughput and intelligence at the cost of interactivity. You couldn’t have all three. Agentic systems demand more.

NVIDIA Groq 3 LPX is the inference accelerator for NVIDIA Vera Rubin, designed to meet the low-latency and large-context demands of agentic systems. Vera Rubin and LPX unite the extreme performance of NVIDIA Rubin GPUs and LPUs through a co-designed architecture.

NVIDIA Vera Rubin Opens Agentic AI Frontier

The NVIDIA Vera Rubin platform includes seven new chips in full production to scale the world’s largest AI factories.

Inside NVIDIA Groq 3 LPX: The Seventh Chip of the NVIDIA Vera Rubin Platform

NVIDIA Groq 3 LPX extends the AI factory with deterministic, low-latency token generation that complements NVIDIA Rubin GPUs for real-time inference workloads.

Inference Performance

Extreme Low Latency With Massive Throughput

By pairing Rubin GPUs, which contribute high-bandwidth memory (HBM) capacity, with LPUs, which contribute on-chip static random-access memory (SRAM) speed, NVIDIA Vera Rubin with LPX delivers a new class of inference performance for trillion-parameter models and million-token context windows. Deployed with Vera Rubin NVL72, Rubin GPUs and LPUs boost decode throughput by jointly computing every layer of the AI model for every output token.

35x Higher Throughput for Trillion-Parameter Models

Agentic systems consume up to 15x more tokens than traditional AI applications. AI factories must deliver on token volume and massive context windows with low latency and efficient economics. When paired with LPX, Vera Rubin delivers up to 35x higher throughput per megawatt for trillion-parameter models.

A New Category of Inference: 10x Revenue Opportunity

Agents are units of intelligence, and inference is their fuel. To deliver real-world impact, agentic systems need tokens that are fast and smart. When LPX is paired with Vera Rubin, the additional throughput per watt and token performance unlock a new tier of ultra-premium, trillion-parameter, million-context inference, letting AI factories produce premium tokens at scale and unlock up to 10x more revenue per watt.

NVIDIA Groq 3 LPU Inference Accelerator

The NVIDIA Groq 3 LPU is the next generation of Groq’s innovative language processing unit. Each LPX rack features 256 interconnected LPU accelerators that, together with the NVIDIA Vera Rubin platform, supercharge inference. Each LPU accelerator delivers 500 megabytes (MB) of SRAM, 150 terabytes per second (TB/s) of SRAM bandwidth, and 2.5 TB/s scale-up bandwidth.

Technology Breakthroughs

Extreme Co-Design. Extraordinary Results.

Built through extreme co-design, the NVIDIA Vera Rubin NVL72 unifies seven purpose-built chips into a single AI supercomputer.

Rack Scale

In one LPX rack, 256 LPU chips come together to deliver extreme performance.

Fusion Memory Architecture

In each rack, LPX delivers 128 GB of SRAM for low-latency processing and 12 TB of DDR5 memory for large models and workloads.

High-Velocity SRAM

40 petabytes per second (PB/s) of SRAM bandwidth per rack delivers low latency.

Massive Scale-Up Bandwidth

Direct chip-to-chip links deliver 640 TB/s of scale-up bandwidth across the LPX rack for low-latency chip communication.
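The rack-level figures above follow from the per-accelerator specs in the NVIDIA Groq 3 LPU section (500 MB SRAM, 150 TB/s SRAM bandwidth, and 2.5 TB/s scale-up bandwidth per LPU, 256 LPUs per rack). A quick back-of-envelope check, using decimal units; note the quoted 40 PB/s appears to be rounded up from the computed 38.4 PB/s:

```python
# Per-LPU specs and rack size, as quoted in this page.
LPUS_PER_RACK = 256
SRAM_PER_LPU_MB = 500            # 500 MB of SRAM per LPU
SRAM_BW_PER_LPU_TBPS = 150       # 150 TB/s SRAM bandwidth per LPU
SCALEUP_BW_PER_LPU_TBPS = 2.5    # 2.5 TB/s scale-up bandwidth per LPU

# Rack aggregates (decimal prefixes: 1 GB = 1000 MB, 1 PB/s = 1000 TB/s).
rack_sram_gb = LPUS_PER_RACK * SRAM_PER_LPU_MB / 1000          # 128 GB
rack_sram_bw_pbps = LPUS_PER_RACK * SRAM_BW_PER_LPU_TBPS / 1000  # 38.4 PB/s
rack_scaleup_tbps = LPUS_PER_RACK * SCALEUP_BW_PER_LPU_TBPS      # 640 TB/s

print(rack_sram_gb, rack_sram_bw_pbps, rack_scaleup_tbps)
```

The 128 GB of rack SRAM and 640 TB/s of scale-up bandwidth match the quoted figures exactly; the SRAM-bandwidth total comes out at 38.4 PB/s against the quoted 40 PB/s.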

High-Speed Connection With NVIDIA NVL72

LPX’s high-speed connections to NVL72 reduce latency to near zero.

NVIDIA MGX ETL Rack

LPX leverages the NVIDIA MGX™ extract, transform, and load (ETL) rack, enabling token factories to plan for a single universal rack in their NVIDIA Vera Rubin platform deployments.

Get Started

Stay Up to Date on NVIDIA News

Sign up for the latest news, updates, and more from NVIDIA.