- DeepSeek released its V4 series on April 24, 2026: DeepSeek-V4-Pro (1.6T total / 49B active) and DeepSeek-V4-Flash (284B / 13B active), both with 1M-token context and MIT license.
- V4-Pro is now the largest open-weights model — larger than Kimi K2.6 (1.1T) and GLM-5.1 (754B), and more than twice the size of DeepSeek V3.2 (685B).
- Flash is priced at $0.14/M input and $0.28/M output, undercutting GPT-5.4 Nano, Gemini 3.1 Flash-Lite, and Claude Haiku 4.5.
- DeepSeek’s paper reports V4-Pro reaches 27% of V3.2’s per-token FLOPs and 10% of its KV cache size at 1M-token context; Flash reaches 10% of FLOPs and 7% of KV cache.
What Happened
Chinese AI lab DeepSeek released the first models in its V4 series on April 24, 2026: DeepSeek-V4-Pro and DeepSeek-V4-Flash, documented by Simon Willison the same day. Both are Mixture-of-Experts models with 1-million-token context windows, released under the MIT license. The release follows DeepSeek V3.2 from December 2025 and is the lab's first major model launch of 2026.
Why It Matters
DeepSeek-V4-Pro is now the largest open-weights model available — larger than Moonshot’s Kimi K2.6 (1.1 trillion parameters) and Z.ai’s GLM-5.1 (754 billion), and more than twice the size of DeepSeek V3.2 (685B). The pricing is more notable than the parameter count: V4-Flash undercuts GPT-5.4 Nano (the cheapest OpenAI option), Gemini 3.1 Flash-Lite, and Claude Haiku 4.5 on per-million-token rates. V4-Pro is the cheapest of the larger frontier-tier models. For any deployment where cost is the binding constraint and where open weights or self-hosting are options, the calculus changes immediately.
Technical Details
DeepSeek-V4-Pro has 1.6 trillion total parameters with 49 billion activated per token; DeepSeek-V4-Flash has 284 billion total with 13 billion activated per token. The Pro weights are 865 GB on Hugging Face; Flash's are 160 GB. DeepSeek's API pricing, alongside the comparators from DeepSeek's published table:

| Model | Input ($/M) | Output ($/M) |
|---|---|---|
| DeepSeek-V4-Flash | 0.14 | 0.28 |
| GPT-5.4 Nano | 0.20 | 1.25 |
| Gemini 3.1 Flash-Lite | 0.25 | 1.50 |
| Claude Haiku 4.5 | 1.00 | 5.00 |
| DeepSeek-V4-Pro | 1.74 | 3.48 |
| Claude Sonnet 4.6 | 3.00 | 15.00 |
| Claude Opus 4.7 | 5.00 | 25.00 |
| GPT-5.5 | 5.00 | 30.00 |
The efficiency claim from DeepSeek's accompanying paper: at 1-million-token context, V4-Pro attains 27% of V3.2's per-token FP8 FLOPs and 10% of V3.2's KV cache size. V4-Flash pushes further at the same context length, reaching 10% of V3.2's FLOPs and 7% of its KV cache. DeepSeek positions a "V4-Pro-Max" reasoning configuration as outperforming GPT-5.2 and Gemini-3.0-Pro on standard reasoning benchmarks while trailing GPT-5.4 and Gemini-3.1-Pro, a deficit the paper characterizes as a 3-to-6-month gap behind the state of the art.
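The paper's percentages read more intuitively as reduction factors. A minimal sketch that converts them; only the ratios themselves come from the source, and absolute FLOP and byte counts are not published here:

```python
# Relative efficiency at 1M-token context, normalized to V3.2 = 1.0,
# per the ratios reported in DeepSeek's paper.
RATIOS = {  # model: (per-token FLOPs, KV cache size) relative to V3.2
    "V4-Pro": (0.27, 0.10),
    "V4-Flash": (0.10, 0.07),
}

for model, (flops, kv) in RATIOS.items():
    # A 0.27 ratio means a 1/0.27 ≈ 3.7x reduction, and so on.
    print(f"{model}: {1 / flops:.1f}x FLOPs reduction, "
          f"{1 / kv:.1f}x KV cache reduction")
```

So the headline claim amounts to roughly a 3.7x compute and 10x memory reduction for Pro, and 10x and about 14x for Flash, relative to V3.2 at the same context length.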
Who’s Affected
Open-source deployment teams gain the largest available open-weights model and the cheapest commercial API rate at frontier-comparable quality. Inference providers — Together, Fireworks, Anyscale, Novita — gain a major new model to host. The closest competitive impact lands on Anthropic and OpenAI: at $1.74/$3.48 input/output, V4-Pro costs roughly a third as much as Claude Sonnet 4.6 and about 30% less than GPT-5.4, for users who can accept the paper's stated 3-to-6-month capability gap. Willison notes he intends to test a quantized V4-Flash on his 128 GB M5 MacBook Pro, a useful proxy for whether the model becomes practical for individual developers.
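The price multiples above depend on the input/output token mix of the workload. A quick sketch from the published rates; the 3:1 input-to-output mix is an illustrative assumption, not from the article:

```python
# Blended per-million-token cost from the article's published rates.
PRICES = {  # model: (input $/M, output $/M)
    "DeepSeek-V4-Pro": (1.74, 3.48),
    "Claude Sonnet 4.6": (3.00, 15.00),
}

def blended_cost(model: str, input_share: float = 0.75) -> float:
    """Cost per million tokens at a given input/output mix.

    input_share=0.75 assumes a 3:1 input:output split (an assumption;
    real workloads vary).
    """
    inp, out = PRICES[model]
    return input_share * inp + (1 - input_share) * out

ratio = blended_cost("Claude Sonnet 4.6") / blended_cost("DeepSeek-V4-Pro")
print(f"Sonnet 4.6 vs V4-Pro blended cost: {ratio:.1f}x")  # 2.8x at a 3:1 mix
```

At this mix the gap is about 2.8x; output-heavy workloads push it past 4x (3.48 vs 15.00 on output alone), while input-heavy ones pull it toward 1.7x.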
What’s Next
Quantized versions from Unsloth and similar groups are expected within days. Independent benchmarks will show whether V4-Pro's claimed 3-to-6-month frontier gap holds up outside DeepSeek's self-reported numbers. Watch for U.S. policy responses, given the ongoing scrutiny of Chinese-origin AI models in federal procurement, and for downstream pricing pressure on the cheapest tiers from OpenAI (GPT-5.4 Nano) and Google (Gemini 3.1 Flash-Lite).
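On the question of whether a quantized V4-Flash fits Willison's 128 GB machine, a back-of-envelope weight-size calculation helps. The bits-per-weight figures are assumed common quantization levels, not from the source, and this ignores KV cache and runtime overhead:

```python
# Rough weight footprint of a 284B-parameter model at assumed
# quantization levels (decimal GB, matching marketing figures).
PARAMS = 284e9
MEMORY_BUDGET_GB = 128  # Willison's M5 MacBook Pro unified memory

def weight_size_gb(bits_per_weight: float) -> float:
    """Weight-only size in GB at a given average bits per weight."""
    return PARAMS * bits_per_weight / 8 / 1e9

for bits in (8, 4, 3):
    size = weight_size_gb(bits)
    fits = "fits" if size < MEMORY_BUDGET_GB else "does not fit"
    print(f"{bits}-bit: {size:.0f} GB ({fits} in {MEMORY_BUDGET_GB} GB)")
```

By this estimate a 4-bit quantization (~142 GB) would still overflow 128 GB, while ~3-bit (~107 GB) would fit with headroom for the KV cache, which is why the sub-4-bit quantized releases are the ones to watch.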