- Participants who used GPT-5 for 10–15 minutes on fraction problems scored significantly lower on unassisted follow-up tests than a control group with no AI access.
- Former AI users skipped problems at nearly twice the rate of participants who worked without assistance throughout the study.
- The performance deficit was largest among the 61 percent of AI users who primarily asked for direct answers, and it was replicated in a separate experiment using SAT reading comprehension passages.
- Researchers describe current user-side mitigations — such as Socratic AI design or time limits — as insufficient, calling instead for structural changes to how AI systems are built.
What Happened
Researchers at several American and British universities have published what they describe as the first large-scale, controlled experimental evidence that brief AI use measurably reduces subsequent problem-solving performance. In a study reported by The Decoder, participants who worked with an AI assistant for 10 to 15 minutes then performed significantly worse on identical unassisted tasks than a control group that had tackled the same problems without AI from the start.
The team ran two primary experiments, one on fraction arithmetic and one on SAT reading comprehension, and found the same pattern in both. Prior evidence for AI's effect on cognitive performance has largely come from surveys and small-sample studies; the researchers assert this study provides causal evidence drawn from randomized controlled experiments.
Why It Matters
Earlier research has pointed toward similar conclusions but with weaker methodological foundations. A joint study by Microsoft Research and Carnegie Mellon University described an “irony of automation” in which AI tools, by handling routine tasks, prevent users from exercising what that research called their “cognitive muscles.” A separate Anthropic study involving 52 mostly junior software developers found that an AI-assisted group scored 17 percent lower on a follow-up programming knowledge test than a control group that relied on documentation and web search.
The new study advances on those findings by using a pre/post controlled design across two replication experiments, allowing the researchers to attribute the performance gap to AI exposure rather than pre-existing differences in ability or motivation.
Technical Details
In the primary experiment, participants solved 15 fraction problems ranging from single-step to three-step calculations. One group had GPT-5 available in a sidebar preloaded with each problem and its correct solution — submitting the text “Answer?” was sufficient to receive the answer. After 12 problems, AI access was removed without warning, and all participants independently solved three identical test problems.
On those unassisted problems, former AI users answered significantly fewer correctly and skipped them at nearly twice the rate of the control group. Because compensation was not tied to performance and no penalty applied to incorrect answers, the researchers treated problem-skipping as a direct behavioral measure of persistence and motivation.
Within the AI-access group, approximately 61 percent of participants reported primarily requesting direct answers. Around 25 percent used the tool for hints or explanations; the remainder did not use it at all. On post-AI testing, the direct-answer subgroup showed the sharpest performance decline and fell below their own pre-test scores. Participants who had ignored the available AI posted the highest solve rates — higher even than the control group. A follow-up experiment using SAT-format reading comprehension passages replicated the core finding: the AI group answered fewer questions correctly and skipped significantly more, with answers submitted in under five seconds counted as skips on the grounds that the passage cannot be read that quickly.
Who’s Affected
The researchers identify students with limited academic resources as facing the greatest risk of long-term harm. Fraction arithmetic and reading comprehension serve as prerequisites for higher-order skills including algebra and analytical writing; systematic offloading of this practice to AI tools could, the researchers argue, compound into hard-to-reverse deficits over months or years.
Products that embed conversational AI directly into educational settings — including OpenAI’s ChatGPT Edu tier, Google’s Gemini for Workspace in academic deployments, and Microsoft’s Copilot in Office 365 Education — represent the direct-use context the study’s experimental design most closely approximates.
What’s Next
The researchers characterize user-facing interventions such as Socratic AI design patterns or daily usage limits as “band-aids,” arguing that what is needed is a structural redesign of AI systems away from maximizing short-term user satisfaction and toward designs that preserve user autonomy and, when appropriate, deliberately withhold immediate answers.
As reported by The Decoder, the study does not yet specify a peer-reviewed publication venue, and individual author names and the paper’s formal title were not made available in the coverage. A link to a preprint was not accessible at the time of writing.