LeanMarathon: A Multi-Agent System Formalizes Four Erdős Problems in Lean

LeanMarathon is a multi-agent harness for reliable research-level Lean autoformalization.
Its core abstraction is an “evolving blueprint”: a Lean file that is simultaneously proof skeleton, natural-language proof graph, and shared record.
Four contract-scoped agents construct, audit, prove, and repair the blueprint under a two-stage orchestrator.
In three autonomous runs it formalized all seven target theorems, spanning four Erdős problems, with no “sorry,” proving 258 lemmas and theorems.

What Happened

Researchers Yuanhe Zhang, Yuekai Sun, Taiji Suzuki, Jason D. Lee, and Fanghui Liu introduced LeanMarathon (arXiv:2606.05400), submitted June 3, 2026. It targets a specific failure mode: long-horizon autoformalization of research mathematics that breaks down “at scale,” where “statements drift, dependencies tangle, context decays, and local repairs corrupt distant work.”

The system formalizes research-level mathematics in Lean, the proof assistant used to verify mathematical arguments mechanically.

Why It Matters

Autoformalization promises machine-checked mathematics, but reliability over long developments has been the barrier. LeanMarathon is part of the broader push toward durable long-horizon agents — the same reliability problem addressed from other angles by SentinelBench and PACT.

Technical Details

The harness centers on an evolving blueprint: a Lean file serving at once as formal proof skeleton, natural-language proof graph, and shared system of record. Four contract-scoped agents construct, audit, prove, and repair it, coordinated by a two-stage orchestrator that first stabilizes target fidelity through adversarial review, then discharges the proof DAG in parallel, CI-gated rounds.
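As an illustration of discharging a proof DAG in parallel rounds, here is a minimal Python sketch. It is not the LeanMarathon orchestrator: the lemma names and the `plan_rounds` helper are hypothetical, and it assumes each round may only contain lemmas whose dependencies were all closed in earlier rounds.

```python
from graphlib import TopologicalSorter


def plan_rounds(deps):
    """Group lemmas into rounds so that every lemma's dependencies
    appear in strictly earlier rounds.

    `deps` maps a lemma name to the set of lemmas it depends on.
    (Hypothetical helper for illustration only.)
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if deps is not a DAG
    rounds = []
    while sorter.is_active():
        # Everything returned here has all dependencies already done,
        # so the lemmas within one round could be attempted in parallel.
        ready = sorted(sorter.get_ready())
        rounds.append(ready)
        # A real harness would run a CI check before marking work done;
        # this sketch assumes every attempt succeeds.
        sorter.done(*ready)
    return rounds


# Hypothetical dependency graph: names are illustrative, not from the paper.
deps = {
    "main_theorem": {"lemma_b", "lemma_c"},
    "lemma_b": {"lemma_a"},
    "lemma_c": {"lemma_a"},
    "lemma_a": set(),
}

rounds = plan_rounds(deps)
for i, batch in enumerate(rounds, start=1):
    print(f"round {i}: {batch}")
```

In this toy graph, `lemma_b` and `lemma_c` land in the same round because both depend only on `lemma_a`, which is the sense in which independent parts of the DAG can be worked on concurrently.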

Evaluated on two recent papers spanning four Erdős problems (#1051, #1196, #164, #1217), LeanMarathon formalized all seven target theorems with no “sorry” across three autonomous runs, proving 258 lemmas and theorems.
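For readers unfamiliar with the term, `sorry` is Lean's placeholder for a missing proof: the file still compiles, with a warning, but the statement is not actually verified. The fragment below is a toy Lean 4 sketch of what a proof skeleton with one open obligation might look like. The theorem names are hypothetical and the statements are deliberately trivial; none of this is taken from the LeanMarathon blueprint.

```lean
-- Hypothetical skeleton fragment; names and statements are illustrative only.

/-- A helper lemma that is already proved. -/
theorem helper_add_zero (n : Nat) : n + 0 = n := by
  rfl

/-- An open obligation: `sorry` lets the file compile with a warning,
    but the statement has not been checked. -/
theorem target_open (a b : Nat) : a + b = b + a := by
  sorry

/-- The same statement once the proof is discharged. -/
theorem target_closed (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

A development with no `sorry` therefore means every stated lemma and theorem has a proof that Lean has checked.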

Who’s Affected

Mathematicians and formal-methods researchers are the direct audience, along with AI labs building systems for verified reasoning. The authors argue reliable AI co-mathematics requires “not only stronger provers, but durable harnesses that preserve target fidelity.”

What’s Next

The evaluation covers a small set of problems from two papers, so generalization to broader research mathematics is unproven. The code is publicly released, which enables independent testing on additional theorems.
