- Chinese AI company MiniMax released M3, billed as the first open-weight model to combine top-tier coding, a one-million-token context window, and native multimodality.
- A new "MiniMax Sparse Attention" architecture processes only relevant data blocks, cutting compute to about one-twentieth and speeding input processing more than nine times.
- On SWE-Bench Pro, M3 scores 59% — ahead of GPT-5.5 and Gemini 3.1 Pro, just behind Anthropic’s Opus 4.7 — and reaches proprietary-class results on terminal tasks and tool use.
- M3 is available via API now, with open weights to be published shortly.
What Happened
Chinese AI company MiniMax has released M3, a new open-weight model that, according to The Decoder, is the first open model to combine top-tier coding performance, a one-million-token context window, and native multimodality in a single system. MiniMax says that combination was previously reserved for proprietary systems like Anthropic’s Opus 4.7, OpenAI’s GPT-5.5, or Google’s Gemini 3.1 Pro. The model is available via API now, with weights to follow shortly.
Why It Matters
The gap between open and closed models has been a central question of the past two years, and M3 is the latest evidence that it is narrowing fast, with Chinese labs doing much of the narrowing. An open-weight model that matches proprietary leaders on coding and long-context tasks changes the calculus for developers and enterprises weighing whether to pay for closed APIs or self-host. It also intensifies the US–China open-model competition that Nvidia’s Nemotron 3 Ultra release underscored the same week, a contest in which Chinese open models have retained the top spots.
The efficiency story matters as much as the capability story. If MiniMax’s sparse-attention approach genuinely cuts compute to a fraction of dense attention while extending context to a million tokens, it lowers the cost barrier that has kept long-context, high-capability models expensive to run — a barrier that has favored well-capitalized proprietary labs.
Technical Details
The headline architectural innovation is "MiniMax Sparse Attention," which processes only the relevant blocks of data rather than attending across the entire context uniformly. MiniMax reports that this cuts compute to roughly one-twentieth and speeds input processing by more than nine times, which is what makes a one-million-token window economically viable. In internal tests, M3 planned, debugged, and self-corrected autonomously over many hours, the kind of long-horizon agentic behavior by which frontier models are now judged.
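MiniMax has not published the details of its architecture, so the following is only a generic illustration of the block-sparse idea described above, not MiniMax's method. It is a minimal NumPy sketch that assumes one particular selection rule: each block of keys is summarized by its mean, blocks are scored against the query, and ordinary softmax attention runs only over the tokens in the `top_k` best-scoring blocks. The function name, the mean-key scoring, and all sizes are hypothetical choices made for the example.

```python
import numpy as np

def block_sparse_attention(q, K, V, block_size, top_k):
    """Attend from one query to only the top_k highest-scoring key blocks.

    Illustrative only: the mean-key block score is an assumption of this
    sketch, not a description of MiniMax Sparse Attention.
    """
    n, d = K.shape
    assert n % block_size == 0, "sketch assumes the context divides evenly into blocks"
    n_blocks = n // block_size

    # Summarize each block by the mean of its keys.
    block_means = K.reshape(n_blocks, block_size, d).mean(axis=1)

    # Score every block against the query and keep the top_k most relevant.
    block_scores = block_means @ q
    chosen = np.sort(np.argsort(block_scores)[-top_k:])

    # Expand the chosen block ids into the token indices they cover.
    token_idx = (chosen[:, None] * block_size + np.arange(block_size)).ravel()

    # Scaled dot-product attention restricted to the selected tokens.
    scores = K[token_idx] @ q / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V[token_idx], chosen

# Toy setup: 1,024 context tokens in 16 blocks of 64, attending to 2 blocks.
rng = np.random.default_rng(0)
n, d, block_size, top_k = 1024, 16, 64, 2
K = rng.normal(size=(n, d))
V = rng.normal(size=(n, d))
q = rng.normal(size=d)

out, chosen = block_sparse_attention(q, K, V, block_size, top_k)

# Share of key tokens the query actually attends to.
fraction = top_k * block_size / n
print(f"attended to blocks {chosen.tolist()}, {fraction:.1%} of the context")
```

In this toy setup the query attends to 128 of 1,024 tokens, or 12.5% of the context. That figure ignores the cost of scoring the blocks themselves, and it should not be read as a reproduction of MiniMax's reported one-twentieth number, which depends on design details the company has not yet released.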
On performance, M3 scores 59% on SWE-Bench Pro, an established software-engineering benchmark — ahead of GPT-5.5 and Gemini 3.1 Pro, and just behind Opus 4.7. MiniMax also reports proprietary-class results on terminal tasks, tool use, and autonomous web search. As always, these are vendor-reported figures pending independent verification once the weights are public.
Who’s Affected
Developers and enterprises gain a credible open alternative for coding and long-context workloads that can be self-hosted, reducing dependence on metered proprietary APIs. Proprietary labs (OpenAI, Anthropic, Google) face continued pressure on the open/closed price-performance gap. The broader open-source AI ecosystem, including Hugging Face, inference providers, and downstream fine-tuners, gains a powerful new base model. And the US–China model race gains another data point, with Chinese labs continuing to lead open-weight capability.
What’s Next
The decisive moment will be the open-weights release, when independent researchers can verify MiniMax’s benchmark claims and test the sparse-attention architecture at scale. Watch whether inference providers adopt M3 quickly and how its real-world long-context performance holds up beyond benchmarks. If the efficiency claims survive scrutiny, sparse-attention approaches like MiniMax’s could influence how the next generation of both open and closed long-context models is built.