- Fields Medalist Timothy Gowers wrote on his blog that ChatGPT 5.5 Pro produced doctoral-level mathematical research on open problems from a Mel Nathanson paper in under two hours, with “zero” mathematical contribution from Gowers.
- ChatGPT 5.5 Pro improved Nathanson’s exponential bound to a quadratic bound in 17 minutes 5 seconds, then rewrote the argument as a LaTeX preprint in 2 minutes 23 seconds.
- On a generalized version with prior work by MIT student Isaac Rajagopal, the model improved Rajagopal’s exponential dependency to polynomial in roughly 31 minutes.
- Rajagopal called the key idea “completely original” — “the sort of idea I would be very proud to come up with after a week or two of pondering.”
What Happened
Fields Medalist Timothy Gowers — Combinatorics Chair at the Collège de France and Fellow at Trinity College Cambridge — wrote on his blog that ChatGPT 5.5 Pro produced doctoral-level mathematical research in under two hours, with zero mathematical contribution from Gowers himself. “I didn’t even do anything clever with the prompts,” Gowers writes. The Decoder summarized the post on May 9, 2026.
Why It Matters
This is the first publicly disclosed case of a Fields Medalist verifying that a frontier AI model produced original, non-trivial mathematical research at PhD level. Gowers’s framing matters: he draws “far-reaching conclusions” — “The lower bound for contributing to mathematics will now be to prove something that LLMs can’t prove, rather than simply to prove something that nobody has proved up to now and that at least somebody finds interesting.” The benchmark for human mathematical contribution shifts from “novel” to “novel and beyond LLM capability.”
Technical Details
Gowers fed ChatGPT 5.5 Pro open problems from a number-theory paper by Mel Nathanson investigating possible sizes of certain sets of integer sums and how efficiently sets with prescribed properties can be constructed. Nathanson had proved an exponential bound for one problem and asked whether it could be improved.
The first result: ChatGPT 5.5 Pro thought for 17 minutes 5 seconds, then delivered the best possible construction with a quadratic bound. The core idea was to swap a component of Nathanson’s proof for a more efficient variant that is well known in combinatorics but whose relevance to this particular problem was not obvious. When asked, ChatGPT rewrote the argument as a LaTeX preprint in 2 minutes 23 seconds. Gowers checked it for correctness, then had the model solve a related variant, which it handled without issues. Both results are available as a preprint.
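Neither Gowers’s post (as summarized here) nor Nathanson’s exact statements are reproduced above, so the following is only a schematic illustration of what such an improvement means; $f(n)$ is a placeholder for the quantity being bounded, and the constants and exponents are illustrative, not the actual results:

```latex
% Schematic only: f(n), C, C', and c are placeholders, not Nathanson's
% or ChatGPT's actual statements.
\[
  \underbrace{f(n) \le C \cdot c^{\,n}}_{\text{exponential bound (Nathanson)}}
  \quad\longrightarrow\quad
  \underbrace{f(n) \le C' \cdot n^{2}}_{\text{quadratic bound (best possible)}}
\]
```

For large $n$, a quadratic bound is dramatically stronger: a construction of size polynomial in $n$ replaces one that grows exponentially.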
The harder generalized version: prior work by Isaac Rajagopal, an MIT student, had proved an exponential dependency. Gowers gave ChatGPT Rajagopal’s paper and asked for an improvement. After 16 minutes 41 seconds, the model delivered a first improvement, which Rajagopal called “correct” but a “routine modification” of his own work. Gowers got “greedy” and asked for a much stronger bound. After 13 minutes 33 seconds, the model reported optimism but flagged two technical statements that still needed checking; another 9 minutes 12 seconds later, the check was done. The finished preprint was ready in 31 minutes 40 seconds total. The model improved the bound from exponential to polynomial.
Rajagopal’s nuanced assessment: the first improvement was a routine modification, but the polynomial-bound improvement was “quite impressive” and the key idea was “quite ingenious” — the model found a counterintuitive way to compress certain algebraic structures into a much smaller number range without losing combinatorial properties. “It is the sort of idea I would be very proud to come up with after a week or two of pondering, and it took ChatGPT less than an hour to find and prove, using similar methods to those in my own proof,” Rajagopal writes. As far as he could tell, the idea was “completely original.”
Gowers’s overall assessment: the result sits “at the level of a perfectly reasonable chapter in a combinatorics PhD.” Not “amazing” — it builds heavily on Rajagopal’s ideas — but “definitely a non-trivial extension.” For a PhD student, working through Rajagopal’s paper to identify weaknesses and adapt techniques would have taken considerable time.
Who’s Affected
Mathematics PhD students and researchers face a recalibrated bar: the standard for a publishable contribution becomes producing work that AI models cannot replicate, rather than work nobody has done before. OpenAI gains a high-profile validation of ChatGPT 5.5 Pro’s reasoning capability from one of the most credible mathematicians of the past two decades. Anthropic and Google DeepMind face a benchmark to match; the implicit comparison is whether Claude Opus 4.7 / Mythos and Gemini 3.1 Pro reach the same standard on similar open problems. The arXiv math community gains a concrete test case for AI-authored math research and the editorial-policy questions that follow.
What’s Next
The two preprints Gowers produced are likely to appear on arXiv with the AI contribution clearly attributed. Other senior mathematicians are likely to attempt similar experiments on their own open problems, expanding the empirical base. Mathematics journals will face the editorial question of whether AI-co-authored proofs are acceptable for submission and how to verify them. Gowers’s qualifier, that PhD students could use LLMs as a tool with the real task being to create something in collaboration, suggests the discipline-level recalibration will play out across mentorship, training, and credit assignment over the coming year.