- Xiaomi released MiMo-V2.5-Pro, a 1.02-trillion-parameter mixture-of-experts model with 42 billion active parameters, targeting hours-long autonomous coding workloads.
- The model finished a Peking University CS course compiler project in 4.3 hours across 672 tool calls, scoring 233/233 on the hidden test suite.
- Reported benchmarks: 78.9 on SWE-bench Verified, 57.2 on SWE-bench Pro, 68.4 on Terminal-Bench 2.0; 73.7 on Xiaomi’s MiMo Coding Bench (vs Claude Opus 4.6 at 77.1).
- Xiaomi claims 40-60% fewer tokens than Claude Opus 4.6, Gemini 3.1 Pro, and GPT-5.4 to reach comparable agent benchmarks; main version supports 1M-token context.
What Happened
Xiaomi released MiMo-V2.5-Pro, an open-weight mixture-of-experts model designed for sustained autonomous coding tasks measured in hours rather than minutes. The headline demo: the model completed a full compiler project from a Peking University computer-science course in 4.3 hours and 672 tool calls, scoring 233 of 233 on the hidden test suite. Xiaomi released three additional models alongside the flagship.
Why It Matters
Hours-long autonomous coding is the current frontier where agentic AI either closes the gap with senior engineers or reveals model-level failure modes. Anthropic’s Claude Opus and OpenAI’s reasoning models have been the public reference for sustained coding agents through 2025-2026. Xiaomi’s release lands with two distinct positioning angles: open weights (most direct competitors at this scale are closed) and aggressive token efficiency (40-60% fewer tokens to reach comparable scores, per Xiaomi). For deployments where inference cost dominates total cost of ownership, the efficiency gap is the more consequential claim.
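Why token efficiency dominates agent economics can be seen with simple arithmetic. A minimal sketch, using made-up prices and run sizes (only the 40-60% savings figure comes from Xiaomi's claim; everything else below is a hypothetical assumption):

```python
# Back-of-envelope cost of a long agent run. The price and token count are
# hypothetical illustrations, not published figures; only the 40-60% token
# savings is Xiaomi's (self-reported) claim.

def agent_run_cost(tokens: int, price_per_mtok: float) -> float:
    """Cost in dollars of one agent run: total tokens * $/million tokens."""
    return tokens / 1_000_000 * price_per_mtok

baseline_tokens = 10_000_000   # assume a 10M-token hours-long agent run
price = 5.0                    # assume $5 per million tokens (hypothetical)

baseline = agent_run_cost(baseline_tokens, price)
for savings in (0.40, 0.60):
    efficient = agent_run_cost(int(baseline_tokens * (1 - savings)), price)
    print(f"{savings:.0%} fewer tokens: ${efficient:.2f} vs ${baseline:.2f}")
```

At a fixed per-token price, token savings translate one-for-one into cost savings, which is why the efficiency claim matters more than a few benchmark points for high-volume deployments.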
Technical Details
MiMo-V2.5-Pro packs 1.02 trillion total parameters with 42 billion activated per token. The main version supports up to 1 million tokens of context; a base version without retraining caps at 256,000 tokens. Pre-training ran on 27 trillion tokens with the context window expanded in stages. Post-training uses a teacher-student setup: specialized models optimized separately for math, security, and tool use serve as teachers to a single student model that combines their skills. A mix of local and global attention reduces memory needs for long texts by nearly 7x; parallel token prediction triples output speed.
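The total-vs-active parameter split comes from mixture-of-experts routing: each token is sent to only a few experts, so most weights sit idle on any given forward pass. Xiaomi has not published MiMo-V2.5-Pro's expert count or routing scheme, so the sketch below is a generic top-k MoE layer with illustrative sizes, not the model's actual architecture:

```python
import numpy as np

# Generic top-k mixture-of-experts routing sketch. Expert count, top-k, and
# dimensions below are illustrative assumptions; MiMo-V2.5-Pro's real routing
# is unpublished. The point: only the top-k experts run per token, which is
# how ~42B of 1.02T total parameters can be "active" at a time.

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 64, 16, 2

router = rng.normal(size=(d_model, n_experts))            # router projection
experts = rng.normal(size=(n_experts, d_model, d_model))  # one (simplified) FFN per expert

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route token vector x to its top-k experts and mix their outputs."""
    logits = x @ router
    top = np.argsort(logits)[-top_k:]                      # k highest-scoring experts
    weights = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over the chosen k
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.normal(size=d_model)
out = moe_layer(token)
print(out.shape, f"experts used per token: {top_k}/{n_experts}")
```

Compute and activation memory per token scale with the k selected experts rather than the full expert set, which is what lets a trillion-parameter model serve at the cost profile of a much smaller dense one.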
Three demos illustrate the hours-long capability. First: the compiler project completed in 4.3 hours / 672 tool calls (137 of 233 tests passed on first compile; the model self-diagnosed and fixed regressions during refactoring). Second: a desktop video editor of roughly 8,000 lines of code, built from a few prompts over 11.5 hours of autonomous runtime and 1,870 tool calls. Third: a voltage regulator design driven through a circuit simulator hooked up via Claude Code, hitting all six technical specs in under an hour, with four of the six exceeding the first-draft design by an order of magnitude.
Coding benchmarks: 78.9 on SWE-bench Verified, 57.2 on SWE-bench Pro, 68.4 on Terminal-Bench 2.0, 73.7 on Xiaomi’s in-house MiMo Coding Bench (vs 77.1 for Claude Opus 4.6 and 67.8 for Gemini 3.1 Pro). Agent tasks: 1,581 Elo points on GDPVal-AA, 72.9 on tau3-bench. On OpenAI’s GraphWalks long-context benchmark at 1M tokens, MiMo-V2.5-Pro scores 0.37 on breadth-first searches and 0.62 on parent-node queries, where the previous MiMo-V2-Pro dropped to zero.
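For context on what the GraphWalks scores measure: the benchmark embeds a large edge list in the context window and asks the model to answer traversal queries over it, such as which nodes lie within a few hops of a start node. A reference breadth-first search over a tiny made-up graph (the benchmark's actual edge lists are far larger) shows the task a 0.37 score corresponds to:

```python
from collections import deque

# Reference BFS of the kind GraphWalks asks a model to perform over an edge
# list buried in a long context. This five-node graph is an illustrative
# stand-in, not benchmark data.

edges = {"a": ["b", "c"], "b": ["d"], "c": ["d", "e"], "d": [], "e": []}

def bfs_nodes_within(start: str, depth: int) -> set[str]:
    """All nodes reachable from `start` in at most `depth` hops."""
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        node, d = frontier.popleft()
        if d == depth:
            continue  # don't expand past the hop limit
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, d + 1))
    return seen

print(sorted(bfs_nodes_within("a", 2)))  # → ['a', 'b', 'c', 'd', 'e']
```

The traversal itself is trivial for code; the benchmark stresses whether the model can keep the full edge list retrievable across a 1M-token context, which is where the previous MiMo-V2-Pro scored zero.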
Xiaomi released three companion models: MiMo-V2.5 (310B total / 15B active multimodal supporting text, image, video, and audio with 1M-token context, 87.7 on Video-MME — open weights on Hugging Face); MiMo-V2.5-TTS (a three-variant text-to-speech family); and further launch variants that Xiaomi did not specify.
Who’s Affected
Anthropic, OpenAI, and Google face direct pressure on the hours-long agentic-coding category from a Chinese open-weight model with a self-reported efficiency advantage. Open-source AI deployment teams gain the largest open-weight coding-focused release of 2026 to date (its 1.02T total parameters sit below DeepSeek-V4-Pro’s 1.6T, but with fewer active parameters per token). Inference providers — Together, Fireworks, Hugging Face — gain a new flagship to host. Xiaomi itself makes a public move from consumer hardware into frontier-tier AI.
What’s Next
Independent benchmark validation of the 4.3-hour compiler claim, the 40-60% token-efficiency advantage, and the long-context performance figures will be the cleanest external test. Quantized versions and llama.cpp / Ollama support are likely within days from the open-source community. Watch for U.S. policy reactions given the increasingly visible Chinese open-weight cohort (DeepSeek V4, Kimi K2.6, Qwen, GLM, and now MiMo-V2.5-Pro), and for whether Xiaomi extends commercial offerings beyond the open-weight release.