
NVIDIA’s New Chip Makes Current AI Hardware Look Like a Calculator — Vera Rubin Explained

MegaOne AI · Apr 1, 2026 · Updated Apr 2, 2026 · 4 min read
Engine Score 7/10 — Important
  • NVIDIA’s Vera Rubin platform combines six co-designed chips, including the Rubin GPU with 50 petaflops of FP4 inference performance and 22 TB/s HBM4 memory bandwidth.
  • The NVL72 rack configuration packs 72 fully liquid-cooled Rubin GPUs, delivering 200 petaflops of FP4 performance per compute tray.
  • NVIDIA claims the platform reduces training GPU requirements by 75 percent and cuts inference cost per token by 10x compared to the prior generation.
  • The platform treats the entire data center as a single compute unit, integrating GPU, CPU, networking, and security into one co-designed system.

What Happened

On January 5, 2026, NVIDIA detailed the Vera Rubin platform, a next-generation AI supercomputer architecture built around six new chips designed to work as a unified system. Kyle Aubrey, NVIDIA’s Director of Technical Marketing, described the approach as “extreme co-design,” where the data center itself, rather than any single server, is the fundamental compute unit.

The platform is named after astronomer Vera Rubin, whose work on galaxy rotation curves provided evidence for dark matter. It represents NVIDIA’s most integrated hardware release to date, combining a new GPU, CPU, network switch, network interface card, data processing unit, and Ethernet switch into a single coordinated architecture. Each component is designed to complement the others, eliminating the bottlenecks that arise when chips from different vendors and design cycles are assembled together.

Why It Matters

AI model training and inference costs remain the primary bottleneck for organizations deploying large language models. NVIDIA’s claim that Vera Rubin requires one-quarter the GPUs for training compared to the previous generation, if validated in production, would substantially reduce the hardware footprint and energy consumption of large-scale AI workloads. For hyperscale data centers already struggling with power constraints, that reduction has direct implications for how many models can be trained simultaneously.
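The arithmetic behind that claim is worth making explicit. A minimal sketch, using a hypothetical cluster size (the 16,000-GPU baseline is illustrative; only the 75 percent reduction comes from NVIDIA's claim):

```python
# Illustrative only: cluster size is hypothetical, the reduction
# factor is NVIDIA's claimed 75 percent for training workloads.
baseline_gpus = 16_000              # hypothetical prior-generation cluster
reduction = 0.75                    # claimed reduction in training GPUs
rubin_gpus = baseline_gpus * (1 - reduction)

# A 75 percent reduction is the same as needing one-quarter the GPUs.
print(int(rubin_gpus))              # one-quarter of the baseline
```

For a power-constrained data center, the freed capacity is what matters: the same power budget could, in principle, host several such training jobs instead of one.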

The 10x reduction in inference cost per token is equally significant. As AI applications move from research to production, inference costs often exceed training costs over the lifetime of a deployed model. A tenfold improvement would change the economics for companies running models at scale, potentially making certain AI applications commercially viable that are currently too expensive to operate.
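To see how a tenfold improvement shifts the economics, consider a hedged sketch of a hypothetical deployment. The token volume and per-token price below are assumptions for illustration; only the 10x ratio is NVIDIA's claim:

```python
# Hypothetical production workload: traffic and pricing are assumed,
# only the 10x cost-per-token reduction comes from NVIDIA's claim.
tokens_per_day = 5e9                    # assumed daily inference traffic
cost_per_million_tokens = 2.00          # assumed prior-generation cost, USD

daily_cost_old = tokens_per_day / 1e6 * cost_per_million_tokens
daily_cost_new = daily_cost_old / 10    # claimed 10x reduction

print(f"${daily_cost_old:,.0f}/day -> ${daily_cost_new:,.0f}/day")
```

Under these assumptions a $10,000-per-day inference bill drops to $1,000 per day, which is the difference between an application that clears its margin threshold and one that does not.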

The platform also signals a shift in competitive dynamics. By designing every chip in the stack, NVIDIA makes it harder for customers to swap individual components from competitors like AMD or Intel. Organizations that adopt the full Vera Rubin platform become deeply integrated into NVIDIA’s ecosystem.

Technical Details

The Rubin GPU delivers 50 petaflops of FP4 inference performance and 35 petaflops for training. It uses HBM4 memory with 22 TB/s of bandwidth and connects to other GPUs via NVLink 6 at 3.6 TB/s per GPU. A full NVL72 rack contains 72 Rubin GPUs and achieves 260 TB/s of aggregate NVLink bandwidth, with 200 petaflops of FP4 performance per tray. The entire system uses liquid cooling.
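The per-GPU figures can be cross-checked against the rack-level totals. A quick consistency sketch (the 4-GPU-per-tray layout is an assumption inferred from the stated numbers, i.e. 18 trays × 4 GPUs = 72):

```python
# Cross-check the stated per-GPU specs against the rack aggregates.
gpus_per_rack = 72
nvlink_per_gpu_tbs = 3.6      # TB/s per GPU, NVLink 6
fp4_per_gpu_pf = 50           # petaflops FP4 inference per Rubin GPU
gpus_per_tray = 4             # assumed layout: 18 trays x 4 GPUs = 72

aggregate_nvlink_tbs = gpus_per_rack * nvlink_per_gpu_tbs  # ~260 TB/s
fp4_per_tray_pf = gpus_per_tray * fp4_per_gpu_pf           # 200 PF per tray

print(round(aggregate_nvlink_tbs, 1), fp4_per_tray_pf)
```

72 × 3.6 TB/s gives 259.2 TB/s, matching the quoted ~260 TB/s aggregate, and 4 × 50 petaflops matches the 200 petaflops per tray.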

The Vera CPU uses 88 custom Olympus cores based on Arm v9.2 architecture, with 1.5 TB of LPDDR5X memory and 1.8 TB/s of NVLink C2C bandwidth for direct communication with the GPU. This tight CPU-GPU coupling reduces the latency that occurs when data must travel through slower interconnects between separate chips.

The ConnectX-9 network interface card provides 1.6 Tb/s per GPU with 800 Gb/s per port and programmable RDMA. The BlueField-4 DPU pairs a 64-core Grace CPU with 800 Gb/s networking to offload security and data-processing tasks. The Spectrum-6 Ethernet switch handles 102.4 Tb/s per switch using co-packaged optics and 200G PAM4 SerDes. NVIDIA reported gains of up to 3.2x on HPC scientific computing codes and a 3x improvement in job completion time for variable all-to-all communication patterns using Spectrum-X Ethernet.
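The networking figures also imply some straightforward port and capacity math. A hedged sketch of what the stated numbers entail (the conclusion that a rack's NIC bandwidth exceeds a single switch is an inference from the quoted specs, not an NVIDIA statement about switch count per rack):

```python
# Implications of the stated ConnectX-9 and Spectrum-6 figures.
nic_per_gpu_tbs = 1.6         # ConnectX-9 bandwidth per GPU, Tb/s
port_gbs = 800                # Gb/s per ConnectX-9 port
gpus_per_rack = 72
switch_capacity_tbs = 102.4   # Spectrum-6 capacity per switch, Tb/s

ports_per_gpu = nic_per_gpu_tbs * 1000 / port_gbs   # 2 ports per GPU
rack_nic_tbs = gpus_per_rack * nic_per_gpu_tbs      # ~115.2 Tb/s per rack

print(int(ports_per_gpu), round(rack_nic_tbs, 1))
```

At 1.6 Tb/s per GPU over 800 Gb/s ports, each GPU uses two NIC ports, and a full rack's ~115 Tb/s of NIC bandwidth exceeds one Spectrum-6 switch's 102.4 Tb/s, suggesting multi-switch fabrics at rack scale.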

Who’s Affected

Hyperscale cloud providers, including AWS, Google Cloud, and Microsoft Azure, are the primary buyers of systems at this scale. AI research labs training frontier models will evaluate the platform against existing Hopper and Blackwell deployments. Enterprise customers running inference workloads at production scale stand to benefit from the claimed cost-per-token improvements, provided they can afford the upfront investment in new infrastructure.

Competitors AMD and Intel face pressure to match the level of system integration NVIDIA offers. While both companies produce competitive individual chips, neither currently ships a full-stack platform spanning GPU, CPU, networking, and security in a single co-designed package. Custom silicon efforts from Google (TPU) and Amazon (Trainium) represent the other competitive vector, though those chips are available only within their respective cloud platforms.

What’s Next

NVIDIA has not disclosed general availability dates or pricing for the Vera Rubin platform. A March 2026 update added a seventh chip, the Groq 3 LPX, to the platform specification. Independent benchmarks from customers will determine whether the claimed performance and efficiency gains hold outside NVIDIA’s own testing environments. Until third-party validation arrives, the performance figures remain manufacturer claims rather than verified production metrics.


MegaOne AI Editorial Team

MegaOne AI monitors 200+ sources daily to identify and score the most important AI developments. Every story is fact-checked, linked to primary sources, and rated using our six-factor Engine Score methodology.

About Us Editorial Policy