FUNDING

Why CUDA, Not Hardware, Is Nvidia’s Real Competitive Moat

Sarah Chen · May 12, 2026 · 3 min read
Engine Score 8/10 — Important

  • Wired argues that Nvidia’s competitive moat is software — specifically CUDA — rather than its GPU silicon.
  • CUDA originated from work by Stanford PhD student Ian Buck and Nvidia engineer John Nickolls in the early 2000s, repurposing graphics processors for general-purpose parallel computing.
  • DeepSeek engineers worked below CUDA in PTX, Nvidia’s GPU assembly language, to squeeze performance from constrained hardware.
  • CUDA’s value, the essay contends, lies in nested libraries that shave nanoseconds off individual matrix operations — a compounding edge at training-run scale.

What Happened

In a Wired essay published on Monday, the magazine’s AI column argued that CUDA, not Nvidia’s chips, constitutes the chipmaker’s real competitive moat. The piece notes that Nvidia CEO Jensen Huang has repeatedly called CUDA his “most precious treasure,” and traces the platform’s origin to early-2000s work by Stanford PhD student Ian Buck and Nvidia engineer John Nickolls. Buck, who first encountered GPUs as a gamer, developed a programming language called Brook before joining Nvidia and co-leading what became CUDA.

Why It Matters

The argument matters because it reframes the public narrative around Nvidia’s defensibility. Investor commentary has tended to fixate on chip generations — H100, B100, Rubin — as the source of Nvidia’s pricing power. The Wired thesis instead points to a layered software platform with tightly tuned libraries that competitors must reproduce to make alternative hardware viable. The DeepSeek precedent is illustrative: when the Chinese startup released competitive models, its engineers reportedly bypassed CUDA’s higher abstractions and wrote directly in PTX, Nvidia’s GPU assembly language, to squeeze performance from constrained Nvidia hardware. Even that workaround stayed inside Nvidia’s runtime.

Technical Details

CUDA stands for Compute Unified Device Architecture, though the acronym is rarely expanded in practice. Its core abstraction is parallelisation across many GPU cores. The essay uses a 9×9 multiplication table as an illustration: a single-core CPU performs all 81 operations sequentially; a 9-core GPU can assign each core one column and finish in nine parallel steps of nine operations each; a GPU programmed to exploit commutativity (7×9 = 9×7) reduces the total work to 45 operations. At AI-training scale, where single runs can cost in excess of $100 million, such micro-optimisations compound. CUDA itself is layered: above the hardware sits PTX (assembly-level); above that sit CUDA’s nested libraries, optimised for specific matrix and tensor operations; and above those sit higher-level frameworks like PyTorch and JAX that most ML engineers actually write against.
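The essay’s operation counts can be checked in a few lines of plain Python — an illustrative sketch of the arithmetic only, not CUDA code:

```python
# Counting the work in a 9x9 multiplication table, per the essay's example.
N = 9

# Naive approach: every (i, j) pair is computed, sequentially on one core.
naive_ops = [(i, j) for i in range(1, N + 1) for j in range(1, N + 1)]

# Exploiting commutativity (7*9 == 9*7): compute only pairs with i <= j.
unique_ops = [(i, j) for i in range(1, N + 1) for j in range(i, N + 1)]

# A 9-core device could assign one column per core, so the 81 naive
# operations complete in 9 parallel steps rather than 81 sequential ones.
steps_parallel = len(naive_ops) // N

print(len(naive_ops), len(unique_ops), steps_parallel)  # 81 45 9
```

The 45 figure is just the triangular number N·(N+1)/2 — the count of unordered pairs, including squares — which is where the roughly-halved workload comes from.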

Who’s Affected

Competing chip makers — AMD with ROCm, Intel with oneAPI, Google with TPU/XLA, AWS with Neuron, plus startups Cerebras, Groq, Tenstorrent and SambaNova — face the challenge of replicating not just silicon but a thousand person-years of accumulated runtime software. AI labs face a portability question: while frameworks like PyTorch abstract some of CUDA away, training-scale optimisations often touch CUDA-specific code paths. Open-source AI projects have produced alternatives (Triton, Mojo) that compile to multiple backends, but Nvidia retains the home-court advantage in tooling, drivers, and library maturity. Warren Buffett’s “moat” terminology, the essay notes, has been repeatedly invoked by AI lab leaders — most famously in a leaked internal Google memo titled “We Have No Moat, And Neither Does OpenAI” — to describe what the frontier labs lack.

What’s Next

The dependency cuts both ways. Frontier labs continue to publish portable kernel libraries and porting layers to reduce CUDA lock-in. Nvidia, conversely, continues to deepen CUDA — each generation of Hopper, Blackwell, and Rubin GPUs ships with library updates that exploit new tensor-core capabilities. The essay’s broader argument — that Nvidia is best understood as a software company that also sells chips — is increasingly reflected in the company’s hiring patterns and product cadence. Nvidia did not respond to Wired prior to publication.
