LAUNCHES

DeepSeek V4 Launches with 1.6T-Param Pro and 284B Flash, Pricing Below All Frontier Rivals

Ryan Matsuda · May 2, 2026 · 3 min read
Engine Score 8/10 — Important

DeepSeek V4 — near-frontier at fraction of price

  • DeepSeek released its V4 series on April 24, 2026: DeepSeek-V4-Pro (1.6T total / 49B active) and DeepSeek-V4-Flash (284B / 13B active), both with 1M-token context and MIT license.
  • V4-Pro is now the largest open-weights model — larger than Kimi K2.6 (1.1T) and GLM-5.1 (754B), and more than twice DeepSeek V3.2 (685B).
  • Flash is priced at $0.14/M input and $0.28/M output, undercutting GPT-5.4 Nano, Gemini 3.1 Flash-Lite, and Claude Haiku 4.5.
  • DeepSeek’s paper reports V4-Pro reaches 27% of V3.2’s per-token FLOPs and 10% of its KV cache size at 1M-token context; Flash reaches 10% of FLOPs and 7% of KV cache.

What Happened

Chinese AI lab DeepSeek released the first models in its V4 series on April 24, 2026: DeepSeek-V4-Pro and DeepSeek-V4-Flash, covered by Simon Willison the same day. Both are Mixture-of-Experts models with 1-million-token context windows, released under the MIT license. The launch is the lab’s first major release of 2026, following DeepSeek V3.2 in December 2025.

Why It Matters

DeepSeek-V4-Pro is now the largest open-weights model available — larger than Moonshot’s Kimi K2.6 (1.1 trillion parameters) and Z.ai’s GLM-5.1 (754 billion), and more than twice the size of DeepSeek V3.2 (685B). The pricing is more notable than the parameter count: V4-Flash undercuts GPT-5.4 Nano (the cheapest OpenAI option), Gemini 3.1 Flash-Lite, and Claude Haiku 4.5 on per-million-token rates. V4-Pro is the cheapest of the larger frontier-tier models. For any deployment where cost is the binding constraint and where open weights or self-hosting are options, the calculus changes immediately.

Technical Details

DeepSeek-V4-Pro has 1.6 trillion total parameters with 49 billion activated per token; DeepSeek-V4-Flash has 284 billion total with 13 billion activated per token. Pro weighs 865 GB on Hugging Face; Flash weighs 160 GB. Pricing on DeepSeek’s API: V4-Flash at $0.14/M input and $0.28/M output; V4-Pro at $1.74/M input and $3.48/M output. Comparator pricing (per the DeepSeek-published table): GPT-5.4 Nano at $0.20/$1.25, Gemini 3.1 Flash-Lite at $0.25/$1.50, Claude Haiku 4.5 at $1/$5, Claude Sonnet 4.6 at $3/$15, Claude Opus 4.7 at $5/$25, and GPT-5.5 at $5/$30.
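How those per-million-token rates compare depends on the workload’s input/output mix. A minimal sketch that prices a sample request against each model, using only the rates quoted above (the 100k-in / 10k-out workload is an arbitrary example, not from the article):

```python
# (input, output) prices in USD per 1M tokens, as published in the article's table.
PRICES = {
    "DeepSeek-V4-Flash": (0.14, 0.28),
    "DeepSeek-V4-Pro": (1.74, 3.48),
    "GPT-5.4 Nano": (0.20, 1.25),
    "Gemini 3.1 Flash-Lite": (0.25, 1.50),
    "Claude Haiku 4.5": (1.00, 5.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Opus 4.7": (5.00, 25.00),
    "GPT-5.5": (5.00, 30.00),
}

def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request at the listed per-million-token rates."""
    inp, out = PRICES[model]
    return inp * input_tokens / 1e6 + out * output_tokens / 1e6

# Example workload: 100k input tokens, 10k output tokens, cheapest first.
for name in sorted(PRICES, key=lambda m: cost_usd(m, 100_000, 10_000)):
    print(f"{name:24s} ${cost_usd(name, 100_000, 10_000):.4f}")
```

At this mix, V4-Flash comes in under both GPT-5.4 Nano and Gemini 3.1 Flash-Lite on input and output alike, so it stays cheapest regardless of how the mix shifts.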

The efficiency claim from DeepSeek’s accompanying paper: at 1-million-token context, V4-Pro attains 27% of V3.2’s per-token FP8 FLOPs and 10% of V3.2’s KV cache size. V4-Flash pushes further at the same context length — 10% of V3.2’s FLOPs and 7% of its KV cache. DeepSeek positions a “V4-Pro-Max” reasoning configuration as outperforming GPT-5.2 and Gemini-3.0-Pro on standard reasoning benchmarks, while trailing GPT-5.4 and Gemini-3.1-Pro by what the paper describes as a 3-to-6-month gap to the state of the art.
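The reciprocals of those fractions give the headline multiples: fewer FLOPs per token matter for compute-bound serving, while a smaller KV cache matters for long-context decode, which is often memory-bound. A minimal sketch using only the fractions the paper reports, as quoted above:

```python
# Paper-reported fractions of V3.2's per-token FP8 FLOPs and KV cache size
# at 1M-token context (from the article); expressed here as multiples of V3.2.
REPORTED = {
    "V4-Pro":   {"flops_per_token": 0.27, "kv_cache": 0.10},
    "V4-Flash": {"flops_per_token": 0.10, "kv_cache": 0.07},
}

for model, frac in REPORTED.items():
    flops_x = 1 / frac["flops_per_token"]  # compute-bound serving lever
    kv_x = 1 / frac["kv_cache"]            # memory-bound long-context lever
    print(f"{model}: ~{flops_x:.1f}x fewer FLOPs/token, "
          f"~{kv_x:.1f}x smaller KV cache than V3.2 at 1M context")
```

That works out to roughly 3.7x fewer FLOPs and a 10x smaller KV cache for V4-Pro, and roughly 10x and 14x for V4-Flash.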

Who’s Affected

Open-source deployment teams gain the largest available open-weights model and the cheapest commercial API rate at frontier-comparable quality. Inference providers — Together, Fireworks, Anyscale, Novita — gain a major new model to host. The closest competitive impact lands on Anthropic and OpenAI: V4-Pro at $1.74/$3.48 input/output is roughly 3x cheaper than Claude Sonnet 4.6 on a blended basis (about 1.7x on input and 4.3x on output) and 1.4x cheaper than GPT-5.4, for users who can accept the 3-to-6-month frontier-trailing capability gap. Willison notes he intends to test a quantized V4-Flash on his 128 GB M5 MacBook Pro — a useful proxy for whether the model becomes practical for individual developers.
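The blended multiple against Sonnet depends on the input/output token mix. A quick sanity check from the published rates, assuming a hypothetical 3:1 input:output mix (the mix is an illustrative assumption, not from the article):

```python
# Published rates: ($/M input, $/M output).
v4_pro = (1.74, 3.48)   # DeepSeek-V4-Pro
sonnet = (3.00, 15.00)  # Claude Sonnet 4.6

input_ratio = sonnet[0] / v4_pro[0]    # per-direction multiple on input
output_ratio = sonnet[1] / v4_pro[1]   # per-direction multiple on output

# Blended multiple for a hypothetical 3:1 input:output token mix.
mix_in, mix_out = 3, 1
blended = (sonnet[0] * mix_in + sonnet[1] * mix_out) / (
    v4_pro[0] * mix_in + v4_pro[1] * mix_out
)
print(f"input {input_ratio:.1f}x, output {output_ratio:.1f}x, blended ~{blended:.1f}x")
```

At that mix the blended multiple lands just under 3x, consistent with the headline figure; heavier output skews it higher, heavier input pulls it toward 1.7x.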

What’s Next

Quantized versions from Unsloth and similar groups are expected within days. Independent benchmarks against the 3-to-6-month frontier gap claim will determine whether V4-Pro holds up outside DeepSeek’s self-reported numbers. Watch for U.S. policy responses given the ongoing scrutiny of Chinese-origin AI models in federal procurement, and for downstream pricing pressure on the cheapest tiers from OpenAI (GPT-5.4 Nano) and Google (Gemini 3.1 Flash-Lite).
