ANALYSIS

Xiaomi’s MiMo-V2.5-Pro Claims Frontier Agentic Scores, Available via API

Anika Patel · Apr 23, 2026 · 3 min read
Engine Score 8/10 — Important
  • Xiaomi’s MiMo team released MiMo-V2.5-Pro and MiMo-V2.5 on April 22, 2026, both available immediately via API at competitive pricing.
  • MiMo-V2.5-Pro scored 57.2 on SWE-bench Pro, 63.8 on Claw-Eval, and 72.9 on τ3-Bench — scores Xiaomi claims place it alongside Claude Opus 4.6 and GPT-5.4.
  • The models are open-weights and capable of sustaining agentic tasks spanning more than a thousand tool calls in a single session.
  • Xiaomi describes a property called “harness awareness” enabling the model to manage its own memory and context during multi-step execution.

What Happened

Xiaomi’s MiMo team publicly released two open-weights models on April 22, 2026: MiMo-V2.5-Pro and MiMo-V2.5, targeting agentic AI tasks including multi-step software engineering, tool use, and long-horizon autonomous task completion. Both models are available immediately via API, according to reporting by MarkTechPost’s Asif Razzaq.

The flagship MiMo-V2.5-Pro is positioned as a significant upgrade over its predecessor, MiMo-V2-Pro, with the team claiming improvements across general agentic capability, complex software engineering, and long-horizon task performance. The announcement characterizes API pricing as competitive but does not disclose exact rates.
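
If Xiaomi serves the models through an OpenAI-compatible endpoint, a convention common among open-weights API providers, access could look like the sketch below. That compatibility is an assumption: the announcement does not describe the API surface, and the base URL and model identifier here are placeholders.

```python
# Hypothetical sketch: querying MiMo-V2.5-Pro through an OpenAI-compatible
# endpoint. The base URL and model identifier are placeholders, not
# Xiaomi's published values.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example.com/v1",  # placeholder endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="mimo-v2.5-pro",  # placeholder model identifier
    messages=[{"role": "user", "content": "List the failing tests in this repo."}],
)
print(response.choices[0].message.content)
```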

Why It Matters

Agentic benchmarks measure whether a model can complete multi-step goals autonomously, using tools such as web search, code execution, and file I/O, rather than answering isolated single-turn queries. Benchmark parity with closed-source frontier models, if independently validated, would mark a notable advance for openly available agentic systems.
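
As a rough sketch of what such an evaluation exercises, the loop below drives a model through repeated tool calls until it produces a final answer or exhausts its step budget. The Action type, model interface, and tool set are illustrative assumptions, not any benchmark's actual harness.

```python
# Minimal sketch of an agentic evaluation loop. Real harnesses (SWE-bench
# Pro and similar) are far more elaborate; everything here is illustrative.
from dataclasses import dataclass

@dataclass
class Action:
    kind: str            # "tool_call" or "final_answer"
    tool: str = ""       # tool name, e.g. "search", "run_code", "read_file"
    args: str = ""       # tool arguments
    content: str = ""    # final answer text, if kind == "final_answer"

def run_agent(model, task, tools, max_steps=50):
    """Drive a model through a multi-step task until it answers or times out."""
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        action = model.next_action(history)   # hypothetical model interface
        if action.kind == "final_answer":
            return action.content             # task completed autonomously
        result = tools[action.tool](action.args)
        history.append({"role": "tool", "content": result})
    return None  # step budget exhausted without a final answer
```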

The MiMo series follows a broader competitive dynamic in which open-weights releases have progressively narrowed the gap with proprietary systems. DeepSeek-R1 and the Qwen series established earlier precedents for this pattern in reasoning-focused evaluations.

Technical Details

According to Xiaomi’s announcement, MiMo-V2.5-Pro scored 57.2 on SWE-bench Pro, 63.8 on Claw-Eval, and 72.9 on τ3-Bench, benchmarks designed to evaluate multi-step software engineering tasks and long-horizon agentic reasoning. The team claims these figures place V2.5-Pro alongside Claude Opus 4.6 and GPT-5.4 across most evaluations; the scores are self-reported and had not been independently reproduced at the time of publication.

The model is designed to sustain tasks spanning more than a thousand tool calls within a single session. Xiaomi describes a behavioral property it calls “harness awareness”: according to the team’s announcement, the model “makes full use of the affordances of its harness environment, manages its memory, and shapes how its own context is populated toward the final objective.”
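
Xiaomi has not published the mechanism behind this behavior. One speculative way a harness could expose memory management to the model is sketched below; the tool names and semantics are illustrative assumptions only.

```python
# Speculative illustration of "harness awareness": alongside ordinary task
# tools, the harness exposes memory-management tools the model can call to
# shape its own context. None of this is Xiaomi's published interface.
class Harness:
    def __init__(self):
        self.notes = {}    # durable scratchpad that survives context pruning
        self.context = []  # the transcript actually shown to the model

    def save_note(self, key, text):
        """Model persists a fact it will need after older turns are dropped."""
        self.notes[key] = text

    def prune_context(self, keep_last_n):
        """Model discards stale turns to stay within its context window."""
        self.context = self.context[-keep_last_n:]

    def recall(self, key):
        """Model re-injects a previously saved fact into its working context."""
        return self.notes.get(key, "")
```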

The parameter count, training dataset composition, and full architectural details were not disclosed in the release announcement.

Who’s Affected

The release is most directly relevant to developers and enterprises evaluating open-weights alternatives to proprietary API offerings from Anthropic, OpenAI, and Google for agentic workflows. Open weights combined with competitive API pricing may appeal to organizations with data-sovereignty requirements or budget constraints that preclude reliance on closed-source providers.

Software engineering tooling vendors and autonomous coding assistant providers are also directly affected, given V2.5-Pro’s stated performance on SWE-bench Pro, an industry-standard benchmark for repository-level software engineering tasks.

What’s Next

Independent replication of the benchmark scores is the critical open question. Agentic benchmarks are sensitive to scaffolding configuration, execution environment, and evaluation harness design; SWE-bench Pro results in particular vary substantially depending on tool setup and retry policies, making self-reported figures difficult to compare without full methodology disclosure.
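
As a toy illustration of that sensitivity, the same per-attempt success rate produces very different reported resolution rates under different retry budgets, assuming (unrealistically) independent attempts; all numbers below are invented.

```python
# Toy model of retry-policy sensitivity: a fixed per-attempt success
# probability yields very different headline scores as the retry budget
# grows. Invented numbers, for illustration only.
def expected_resolve_rate(p_single, retries):
    # Probability that at least one of `retries` independent attempts passes.
    return 1 - (1 - p_single) ** retries

for retries in (1, 3, 5):
    print(retries, round(expected_resolve_rate(0.45, retries), 3))
# 1 attempt  -> 0.45
# 3 attempts -> 0.834
# 5 attempts -> 0.95
```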

Xiaomi has not announced a technical report or model card detailing training methodology as of April 23, 2026. Third-party evaluations from research groups and commercial AI testing firms will determine how these claims hold under standardized conditions.
