A team of eight researchers introduced Mimosa, an open-source multi-agent framework designed to overcome the fixed-workflow limitations of current Autonomous Scientific Research (ASR) systems. Submitted to arXiv on 30 March 2026 (arXiv:2603.28986), the paper describes a system that synthesizes task-specific agent workflows automatically and refines them iteratively from execution feedback, rather than running the static pipelines that constrain most existing ASR systems.
- Mimosa achieved a 43.1% success rate on ScienceAgentBench with DeepSeek-V3.2, outperforming both single-agent baselines and static multi-agent configurations.
- The framework uses the Model Context Protocol (MCP) for dynamic tool discovery, enabling agents to identify and invoke available tools at runtime rather than from a hardcoded list.
- A meta-orchestrator generates workflow topologies per task; an LLM-based judge scores execution results and drives iterative refinement across runs.
- Released as fully open-source, with complete execution trace logging designed for auditability and replication.
What Happened
Martin Legrand, Tao Jiang, Matthieu Feraud, Benjamin Navet, Yousouf Taghzouti, Fabien Gandon, Elise Dumont, and Louis-Félix Nothias published Mimosa on 30 March 2026 through arXiv. The paper’s central argument is that current ASR systems, while built on capable LLMs, are “constrained by fixed workflows and toolsets that prevent adaptation to evolving tasks and environments.” Mimosa proposes a framework where both the workflow structure and tool selection are generated and refined dynamically, rather than specified in advance by system designers.
Why It Matters
Most existing agentic research systems assign agent roles and execution pipelines before a run begins. When a task deviates from what the pipeline was designed for, performance degrades and the system cannot self-correct. Several prior approaches have attempted partial solutions — reflection loops, tool-augmented reasoning — but these typically still operate within a fixed agent topology.
Mimosa takes a different approach by treating the workflow itself as something to be generated and improved over time. The authors describe the system as capable of “automatically synthesizing task-specific multi-agent workflows and iteratively refining them through experimental feedback,” positioning it as one of the first published frameworks to combine dynamic topology generation with iterative refinement specifically in the context of scientific research tasks.
Technical Details
Mimosa’s architecture centers on four interconnected processes. A meta-orchestrator takes a research task as input and generates a workflow topology — specifying which agents handle which subtasks and how they interact. Code-generating agents then execute those subtasks by invoking tools and scientific software libraries. An LLM-based judge evaluates the execution results. The judge’s scores and feedback are then used to refine the workflow in subsequent iterations.
For tool access, Mimosa uses the Model Context Protocol (MCP), which enables dynamic tool discovery at runtime. Rather than relying on a hardcoded toolset, agents query for available tools during execution, allowing the framework to incorporate new instruments or software libraries without changes to its core architecture.
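The discover-then-invoke pattern that MCP enables can be shown with an in-memory stand-in. This is not the MCP SDK: a real client would issue the protocol's `tools/list` and `tools/call` requests over a transport to a server, and Mimosa's actual selection logic is LLM-driven; the registry and keyword match below are assumptions used to make the runtime-discovery control flow concrete.

```python
class ToolServer:
    """In-memory stand-in for an MCP server advertising tools."""

    def __init__(self):
        self._tools = {}

    def register(self, name, description, fn):
        self._tools[name] = {"description": description, "fn": fn}

    def list_tools(self):  # analogous to the protocol's tools/list
        return [{"name": n, "description": t["description"]}
                for n, t in self._tools.items()]

    def call_tool(self, name, arguments):  # analogous to tools/call
        return self._tools[name]["fn"](**arguments)

def pick_tool(server, task_keyword):
    # The agent inspects the advertised tools at runtime instead of
    # consulting a hardcoded list; a tool registered later is picked
    # up with no change to this agent code.
    for tool in server.list_tools():
        if task_keyword in tool["description"]:
            return tool["name"]
    return None

server = ToolServer()
server.register("mean", "compute the mean of a numeric series",
                lambda values: sum(values) / len(values))

name = pick_tool(server, "mean")
result = server.call_tool(name, {"values": [1.0, 2.0, 3.0]})
```

Because the agent only depends on the advertised tool descriptions, adding a new instrument means one more `register` call on the server side, which mirrors the paper's claim that new tools can be incorporated without core-architecture changes.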
On ScienceAgentBench, Mimosa achieved a 43.1% success rate when paired with DeepSeek-V3.2, surpassing both single-agent baselines and static multi-agent configurations on the same benchmark. The authors also report that “models respond heterogeneously to multi-agent decomposition and iterative learning” — the performance gains from workflow evolution are not uniform across underlying models and depend on those models’ inherent capabilities. The framework logs every execution step in full: archived traces preserve each analytical decision for inspection and potential replication by other researchers.
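The full-trace logging the authors describe amounts to an append-only record of every step. A minimal sketch, assuming a JSON Lines archive format; the record fields here are illustrative, as the paper's actual trace schema is not reproduced in this summary.

```python
import json
import time

class TraceLog:
    """Append-only execution trace for audit and replication."""

    def __init__(self):
        self.records = []

    def log(self, step, agent, detail):
        self.records.append({
            "step": step,
            "agent": agent,
            "detail": detail,
            "ts": time.time(),
        })

    def dump(self):
        # JSON Lines: one record per line, easy to archive and diff.
        return "\n".join(json.dumps(r, sort_keys=True)
                         for r in self.records)

trace = TraceLog()
trace.log(1, "orchestrator", "generated 2-step workflow")
trace.log(2, "analyst", "ran clustering on input table")
```

Keeping the trace append-only is what makes it usable for replication: a later researcher can replay the archived decisions in order rather than reconstructing them from final outputs.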
Who’s Affected
The immediate audience is academic researchers working on computationally intensive scientific tasks — including data analysis, hypothesis testing, and experiment automation across disciplines. Because Mimosa is fully open-source, individual labs and institutions can deploy it without licensing constraints or dependency on a commercial vendor.
Developers building agentic AI tooling will find the MCP integration directly relevant. MCP is gaining adoption as a protocol for standardizing how agents discover and invoke external tools; Mimosa’s use of it connects the framework to that broader ecosystem. The modular architecture also allows components — including the judge, orchestrator, or individual agents — to be replaced or extended independently.
What’s Next
The paper’s benchmark results are currently limited to ScienceAgentBench; broader evaluation across additional scientific task suites has not yet been published, which limits conclusions about how well the framework generalizes. The finding that gains from workflow evolution depend on the underlying model’s capabilities also points to further characterization work: identifying which model families benefit most from the framework’s iterative approach.
The open-source release is intended as a foundation for community-driven development in autonomous scientific research. The authors state the framework, combined with domain-expert guidance, has potential to automate a broad range of computationally accessible scientific tasks across disciplines, though the paper does not specify production deployment criteria beyond the benchmark evaluation.
