
LeanMarathon: A Multi-Agent System Formalizes Four Erdős Problems in Lean

James Whitfield · Jun 8, 2026 · 2 min read


  • LeanMarathon is a multi-agent harness for reliable research-level Lean autoformalization.
  • Its core abstraction is an “evolving blueprint”: a Lean file that is simultaneously proof skeleton, natural-language proof graph, and shared record.
  • Four contract-scoped agents construct, audit, prove, and repair the blueprint under a two-stage orchestrator.
  • In three autonomous runs, it formalized all seven target theorems from four Erdős problems with no remaining “sorry” placeholders, proving 258 lemmas and theorems.

What Happened

Researchers Yuanhe Zhang, Yuekai Sun, Taiji Suzuki, Jason D. Lee, and Fanghui Liu introduced LeanMarathon (arXiv:2606.05400), submitted June 3, 2026. It targets a specific failure mode: long-horizon autoformalization of research mathematics that breaks down “at scale,” where “statements drift, dependencies tangle, context decays, and local repairs corrupt distant work.”

The system formalizes research-level mathematics in Lean, the proof assistant used to verify mathematical arguments mechanically.

Why It Matters

Autoformalization promises machine-checked mathematics, but reliability over long developments has been the barrier. LeanMarathon is part of the broader push toward durable long-horizon agents, the same reliability problem that SentinelBench and PACT address from other angles.

Technical Details

The harness centers on an evolving blueprint: a Lean file serving at once as formal proof skeleton, natural-language proof graph, and shared system of record. Four contract-scoped agents construct, audit, prove, and repair it, coordinated by a two-stage orchestrator. The first stage stabilizes target fidelity through adversarial review; the second discharges the proof DAG (the directed acyclic graph of proof dependencies) in parallel rounds, each gated by continuous integration (CI).
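To make the idea of discharging a proof graph in parallel rounds concrete, here is a minimal Python sketch. It is not taken from LeanMarathon: the function name `plan_rounds` and the lemma names are hypothetical, and the CI gate is not modeled. It only shows how a dependency graph can be grouped into rounds in which every item depends solely on items from earlier rounds, so the items within a round could be attempted in parallel.

```python
from graphlib import TopologicalSorter


def plan_rounds(deps):
    """Group nodes into rounds so each node depends only on earlier rounds.

    `deps` maps each lemma name to the set of lemmas it depends on.
    Raises graphlib.CycleError if the dependencies are not acyclic.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    rounds = []
    while sorter.is_active():
        # Everything returned here has all of its dependencies completed.
        ready = sorted(sorter.get_ready())
        rounds.append(ready)
        # Mark the whole round as finished before asking for the next one.
        sorter.done(*ready)
    return rounds


# Hypothetical proof graph: names are illustrative, not from the paper.
deps = {
    "main_theorem": {"lemma_c", "lemma_d"},
    "lemma_c": {"lemma_a", "lemma_b"},
    "lemma_d": {"lemma_b"},
    "lemma_a": set(),
    "lemma_b": set(),
}

rounds = plan_rounds(deps)
for number, names in enumerate(rounds, start=1):
    print(f"round {number}: {names}")
```

In a real harness, each round would presumably have to pass its checks before the next one starts; that gating step is left out of this sketch.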

Evaluated on two recent papers spanning four Erdős problems (#1051, #1196, #164, #1217), LeanMarathon formalized all seven target theorems with no “sorry” across three autonomous runs, proving 258 lemmas and theorems.
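For readers unfamiliar with Lean, the "no sorry" criterion can be illustrated with a toy example. The snippet below is an assumption-laden sketch in Lean 4, unrelated to the Erdős formalizations themselves; both theorem names are hypothetical. `sorry` is Lean's placeholder for a missing proof: the file still compiles, but Lean flags the declaration as unproved.

```lean
/-- Hypothetical helper lemma, fully proved: holds by definitional unfolding. -/
theorem helper_add_zero (n : Nat) : n + 0 = n := rfl

/-- Hypothetical target theorem: the statement is fixed but the proof is open.
    Lean accepts the file and warns that this declaration uses `sorry`. -/
theorem target_zero_add (n : Nat) : 0 + n = n := by
  sorry
```

A development counts as finished only when every such placeholder has been replaced by an actual proof; here, for instance, `Nat.zero_add n` would close the second theorem.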

Who’s Affected

Mathematicians and formal-methods researchers are the direct audience, along with AI labs building systems for verified reasoning. The authors argue reliable AI co-mathematics requires “not only stronger provers, but durable harnesses that preserve target fidelity.”

What’s Next

The evaluation covers a small set of problems from two papers, so generalization to broader research mathematics is unproven. The code is publicly released, which enables independent testing on additional theorems.
