LAUNCHES

Xiaomi Releases MiMo-V2.5-Pro: 1.02T-Param Open-Weight Model Writes Compiler in 4.3 Hours

Ryan Matsuda · May 3, 2026 · 3 min read
Engine Score 8/10 — Important

Xiaomi MiMo-V2.5-Pro open-weight model targets Claude Opus on autonomous coding

  • Xiaomi released MiMo-V2.5-Pro, a 1.02-trillion-parameter mixture-of-experts model with 42 billion active parameters, targeting hours-long autonomous coding workloads.
  • The model finished a Peking University CS course compiler project in 4.3 hours across 672 tool calls, scoring 233/233 on the hidden test suite.
  • Reported benchmarks: 78.9 on SWE-bench Verified, 57.2 on SWE-Bench Pro, 68.4 on Terminal-Bench 2.0; 73.7 on Xiaomi’s MiMo Coding Bench (vs Claude Opus 4.6 at 77.1).
  • Xiaomi claims 40-60% fewer tokens than Claude Opus 4.6, Gemini 3.1 Pro, and GPT-5.4 to reach comparable agent benchmarks; main version supports 1M-token context.

What Happened

Xiaomi released MiMo-V2.5-Pro, an open-weight mixture-of-experts model designed for sustained autonomous coding tasks measured in hours rather than minutes. The headline demo: the model completed a full compiler project from a Peking University computer-science course in 4.3 hours and 672 tool calls, scoring 233 of 233 on the hidden test suite. Xiaomi released three additional models alongside the flagship.

Why It Matters

Hours-long autonomous coding is the current frontier where agentic AI either closes the gap with senior engineers or reveals model-level failure modes. Anthropic’s Claude Opus and OpenAI’s reasoning models have been the public reference for sustained coding agents through 2025-2026. Xiaomi’s release lands with two distinct positioning angles: open weights (most direct competitors at this scale are closed) and aggressive token efficiency (40-60% fewer tokens to reach comparable scores, per Xiaomi). For deployments where inference cost dominates total cost of ownership, the efficiency gap is the more consequential claim.
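To make the efficiency claim concrete, here is a rough back-of-the-envelope sketch in Python. Only the 40-60% reduction range comes from Xiaomi's release; the baseline token count and both per-million-token prices are illustrative assumptions, not announced figures.

```python
# Back-of-the-envelope cost comparison for an hours-long agent run.
# ASSUMPTIONS (illustrative, not from the announcement): the baseline run
# consumes 20M tokens, and both per-million-token prices are placeholders.
baseline_tokens = 20_000_000   # tokens a reference closed model might use
price_baseline = 15.0          # $ per 1M tokens, hypothetical closed-model rate
price_mimo = 3.0               # $ per 1M tokens, hypothetical hosted open-weight rate

for reduction in (0.40, 0.60):               # Xiaomi's claimed 40-60% fewer tokens
    mimo_tokens = baseline_tokens * (1 - reduction)
    cost_baseline = baseline_tokens / 1e6 * price_baseline
    cost_mimo = mimo_tokens / 1e6 * price_mimo
    print(f"{reduction:.0%} fewer tokens: ${cost_baseline:.0f} vs ${cost_mimo:.0f} "
          f"({cost_baseline / cost_mimo:.1f}x cheaper)")
```

The point of the sketch is that token count and per-token price multiply: on long agent runs, even the lower end of the claimed reduction compounds with cheaper open-weight hosting.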

Technical Details

MiMo-V2.5-Pro packs 1.02 trillion total parameters with 42 billion activated per token. The main version supports up to 1 million tokens of context; a base version without the extended-context retraining caps at 256,000 tokens. Pre-training ran on 27 trillion tokens, with the context window expanded in stages. Post-training uses a teacher-student setup: specialized models optimized separately for math, security, and tool use serve as teachers to a single student model that combines their skills. An interleaved mix of local and global attention cuts memory needs for long inputs by nearly 7x, and parallel token prediction triples output speed.
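The near-7x figure is plausible with standard KV-cache arithmetic. The minimal sketch below assumes an interleave ratio and sliding-window size that Xiaomi has not published; it only shows how a mostly-local attention stack shrinks the cache at a 1M-token context.

```python
# KV-cache scaling for an interleaved local/global attention stack.
# ASSUMPTIONS (not disclosed by Xiaomi): six sliding-window layers per global
# layer and a 4,096-token window; only the 1M-token context is from the release.
context_len = 1_000_000   # tokens held in context
window = 4_096            # sliding-window size for local-attention layers (assumed)
local_per_global = 6      # local layers per global layer (assumed)

# Tokens cached per group of (1 global + N local) layers.
all_global = (1 + local_per_global) * context_len       # every layer attends globally
interleaved = context_len + local_per_global * window   # local layers keep only the window

print(f"KV-cache reduction: {all_global / interleaved:.1f}x")  # ~6.8x under these assumptions
```

Under those assumed numbers the cache shrinks by roughly 6.8x, in line with the "nearly 7x" claim; the real ratio depends on the actual layer interleave and window size.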

Three demos illustrate the hours-long capability. First, the compiler project: 4.3 hours and 672 tool calls, with 137 of 233 tests passing on the first compile and regressions self-diagnosed and fixed during refactoring. Second, a desktop video editor: roughly 8,000 lines of code from a few prompts, over 11.5 hours of autonomous runtime and 1,870 tool calls. Third, a voltage-regulator design driven through a circuit simulator hooked up via Claude Code: all six technical specs hit in under an hour, with four of them improved by an order of magnitude over the first draft.
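For readers unfamiliar with what a count like "672 tool calls" measures, here is a minimal sketch of a generic agent harness loop, not Xiaomi's actual scaffold: the model proposes a tool call, the harness executes it (a shell command, a file edit, a test run), feeds the result back, and repeats until the model declares the task done. The `model.step` and `run_tool` interfaces are hypothetical stand-ins.

```python
from typing import Callable

# Minimal agent-loop sketch: what a "tool call" count measures in runs like these.
# `model` and `run_tool` are hypothetical stand-ins, not Xiaomi's actual harness.
def run_agent(model, run_tool: Callable[[str, dict], str], task: str,
              max_calls: int = 2_000) -> int:
    """Drive the model until it reports completion; return the tool calls used."""
    history = [{"role": "user", "content": task}]
    calls = 0
    while calls < max_calls:
        action = model.step(history)          # model proposes its next action
        if action["type"] == "done":          # e.g. all hidden tests pass
            break
        result = run_tool(action["tool"], action["args"])   # shell, file edit, test run
        history.append({"role": "tool", "name": action["tool"], "content": result})
        calls += 1
    return calls
```

The 672- and 1,870-call figures suggest runs where each iteration of a loop like this is a compile, a test invocation, or an edit, which is why wall-clock time stretches into hours.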

Coding benchmarks: 78.9 on SWE-bench Verified, 57.2 on SWE-Bench Pro, 68.4 on Terminal-Bench 2.0, 73.7 on Xiaomi’s in-house MiMo Coding Bench (vs 77.1 for Claude Opus 4.6 and 67.8 for Gemini 3.1 Pro). Agent tasks: 1,581 Elo points on GDPVal-AA, 72.9 on tau3-bench. On OpenAI’s GraphWalks long-context benchmark at 1M tokens, MiMo-V2.5-Pro scores 0.37 on breadth-first searches and 0.62 on parent-node queries, where the previous MiMo-V2-Pro dropped to zero.

Xiaomi released three companion models alongside the flagship. Two are detailed: MiMo-V2.5, a 310B-total / 15B-active multimodal model handling text, image, video, and audio with 1M-token context and an 87.7 on Video-MME, with open weights on Hugging Face; and MiMo-V2.5-TTS, a three-variant text-to-speech family. The remaining variants were not specified in the launch materials.

Who’s Affected

Anthropic, OpenAI, and Google face direct pressure on the hours-long agentic-coding category from a Chinese open-weight model with a self-reported efficiency advantage. Open-source AI deployment teams gain one of the largest open-weight coding-focused releases of 2026 to date (1.02T total parameters trails DeepSeek-V4-Pro's 1.6T, but with fewer active parameters per token). Inference providers such as Together, Fireworks, and Hugging Face gain a new flagship to host. Xiaomi itself publicly extends its reach from consumer hardware into frontier-tier AI.

What’s Next

Independent validation of the 4.3-hour compiler claim, the 40-60% token-efficiency advantage, and the long-context figures will be the cleanest external test. Quantized versions and llama.cpp / Ollama support are likely to arrive from the open-source community within days. Watch for U.S. policy reactions given the increasingly visible Chinese open-weight cohort (DeepSeek V4, Kimi K2.6, Qwen, GLM, and now MiMo-V2.5-Pro), and for whether Xiaomi extends commercial offerings beyond the open-weight release.
