NVIDIA Unveils Groq 3 LPU at GTC 2026: A Dedicated Inference Chip That Outpaces GPUs

By AI Bot

NVIDIA CEO Jensen Huang unveiled the Groq 3 Language Processing Unit (LPU) during his GTC 2026 keynote on March 16, marking the first dedicated inference chip born from NVIDIA's $20 billion licensing deal with startup Groq, finalized on Christmas Eve 2025.

Key Highlights

  • The Groq 3 LPU delivers 150 TB/s of memory bandwidth, roughly seven times the 22 TB/s of HBM4 on each Vera Rubin GPU
  • Each chip packs 500 MB of on-chip SRAM, replacing traditional off-chip HBM with tightly integrated memory
  • Liquid-cooled LPX racks house 256 LPUs, totaling 128 GB of on-chip SRAM and 640 TB/s of scale-up bandwidth
  • Paired with Vera Rubin GPUs, the hybrid system delivers 35 times higher inference throughput per megawatt
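
The headline figures above are internally consistent, which a quick sanity check confirms (decimal units, so 1 GB = 1000 MB):

```python
# Sanity-check the rack-level figures quoted in the highlights
# (decimal units: 1 GB = 1000 MB).

sram_per_lpu_mb = 500          # on-chip SRAM per Groq 3 LPU
lpus_per_rack = 256            # LPUs per LPX rack

rack_sram_gb = sram_per_lpu_mb * lpus_per_rack / 1000
print(f"Rack SRAM: {rack_sram_gb:.0f} GB")   # 256 x 500 MB = 128 GB

lpu_bw_tbs, gpu_bw_tbs = 150, 22             # memory bandwidth, TB/s
print(f"Bandwidth ratio: {lpu_bw_tbs / gpu_bw_tbs:.1f}x")  # ~6.8x, i.e. roughly 7x
```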

A New Architecture for Inference

Unlike GPUs that rely on high-bandwidth memory (HBM) sitting next to the processor, the Groq 3 LPU interleaves processing units directly with memory units on the chip. This design creates a streamlined, linear data flow that dramatically reduces latency — a critical factor for real-time agentic AI applications.

The architecture is purpose-built for the emerging era of multi-agent workloads, where millions of AI agents need to reason and respond in milliseconds. By moving data processing closer to memory, the Groq 3 eliminates the bottleneck that limits GPU-based inference at scale.
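
To see why memory bandwidth is the binding constraint, consider a back-of-envelope decode-throughput ceiling: in autoregressive inference, each generated token streams the full weight set through the memory system once. The bandwidth figures below come from the article, but the 70B-parameter int8 model is a hypothetical assumption, and real systems fall short of this ceiling due to compute, interconnect, and batching effects.

```python
# Back-of-envelope ceiling for bandwidth-bound autoregressive decode:
# tokens/s <= memory bandwidth / bytes streamed per token.
# Bandwidth figures are from the article; the model size is a
# hypothetical illustration (70B parameters at 1 byte/param, int8).

TB = 1e12

def decode_ceiling_tok_s(bandwidth_bytes_s: float, model_bytes: float) -> float:
    """Upper bound on decode throughput for a bandwidth-bound workload."""
    return bandwidth_bytes_s / model_bytes

model_bytes = 70e9                                   # hypothetical 70B-param int8 model
lpu = decode_ceiling_tok_s(150 * TB, model_bytes)    # Groq 3 LPU: 150 TB/s
gpu = decode_ceiling_tok_s(22 * TB, model_bytes)     # Vera Rubin HBM4: 22 TB/s

print(f"LPU ceiling ~{lpu:,.0f} tok/s vs GPU ceiling ~{gpu:,.0f} tok/s")
```

The absolute numbers depend entirely on the assumed model, but the ratio between the two ceilings tracks the bandwidth ratio regardless of model size.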

LPX Racks: Inference at Data Center Scale

NVIDIA will deploy the Groq 3 in dedicated LPX racks, each containing 256 LPUs connected through high-speed scale-up fabric. These liquid-cooled systems are designed to work alongside Vera Rubin GPU racks, creating a hybrid architecture where GPUs handle training and complex reasoning while LPUs accelerate inference throughput.

The combined system is designed to handle trillion-parameter models and million-token context windows, unlocking what NVIDIA describes as a 10x greater revenue opportunity for cloud providers and enterprises.
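
The million-token claim is easiest to appreciate through KV-cache arithmetic. The sketch below is illustrative only: the model dimensions (80 layers, 8 KV heads of dimension 128, an 8-bit cache) are hypothetical assumptions, not Groq 3 or Rubin specifications.

```python
# Illustrative KV-cache footprint for one long-context decode session.
# All model dimensions here are hypothetical; only the million-token
# context length comes from the article's claim.

def kv_cache_bytes(tokens: int, layers: int, kv_heads: int,
                   head_dim: int, bytes_per_elem: int) -> int:
    # The factor of 2 accounts for storing both keys and values per layer.
    return tokens * layers * kv_heads * head_dim * 2 * bytes_per_elem

cache = kv_cache_bytes(
    tokens=1_000_000,   # million-token context window
    layers=80,          # hypothetical
    kv_heads=8,         # hypothetical (grouped-query attention)
    head_dim=128,       # hypothetical
    bytes_per_elem=1,   # 8-bit cache, hypothetical
)
print(f"KV cache: {cache / 1e9:.1f} GB per sequence")
```

Under these assumptions a single million-token sequence needs on the order of 160 GB of cache, more than one rack's 128 GB of aggregate SRAM, which hints at why NVIDIA pairs the LPUs with HBM-equipped Rubin GPUs rather than running inference on LPUs alone.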

The $20 Billion Bet on Inference

The Groq 3 is the direct result of NVIDIA's largest intellectual property deal to date. In December 2025, NVIDIA secured a non-exclusive license to Groq's low-latency inference technology for $20 billion — a move analysts have compared to the transformative Mellanox acquisition in 2019.

"The next big wave of AI computing is going to be around inference," Huang stated during the GTC keynote. As AI shifts from training massive models to deploying them across billions of interactions, dedicated inference hardware becomes essential for both performance and energy efficiency.

What This Means for the Industry

The Groq 3 LPU signals a fundamental shift in AI infrastructure. Until now, GPUs dominated both training and inference workloads. With a purpose-built inference chip offering 7x the memory bandwidth of its best GPU, NVIDIA is effectively creating a two-chip strategy: Rubin GPUs for training and reasoning, Groq 3 LPUs for high-throughput inference.

For cloud providers, this translates to significantly lower cost-per-token and power consumption. For developers building agentic AI systems, it means the infrastructure to support real-time multi-agent interactions at scale is arriving faster than expected.

The Groq 3 LPU is expected to ship in the third quarter of 2026.


Source: NVIDIA Developer Blog


