Claude Mythos Clears UK AISI Cyber Simulations

The UK’s AI Security Institute (AISI) revised its forecast for AI cyber capabilities, now estimating a doubling time of 4.7 months versus the prior 8 months.
Anthropic’s Claude Mythos Preview and OpenAI’s GPT-5.5 have “substantially exceeded” even that accelerated timeline.
Mythos Preview completed AISI’s 32-step corporate-network attack simulation in 6 out of 10 attempts.
It was the first model to crack the AISI “Cooling Tower” industrial control system simulation, succeeding in 3 out of 10 attempts.

What Happened

Anthropic’s Claude Mythos Preview has become the first AI model to clear both of the UK AI Security Institute’s cyber-range simulations, The Decoder reported on Thursday, citing the latest AISI assessment. The agency, which has been benchmarking frontier-model cyber capabilities throughout 2025 and 2026, also revised upward its estimate of the rate at which those capabilities are growing.

Why It Matters

AISI estimated in November 2025 that AI cyber capabilities were doubling every eight months. By February 2026, the agency revised that to 4.7 months. Anthropic’s Claude Mythos Preview and OpenAI’s GPT-5.5 have now substantially exceeded even that accelerated timeline, per AISI. Whether this represents a new structural trend or a single-cohort jump is, per AISI, unclear at this stage.
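To make the two doubling-time estimates easier to compare, the short sketch below converts each one into an implied growth multiple over 12 months. It assumes growth is a clean exponential with a constant doubling time, which is a simplification: the article does not specify the underlying capability metric, and `growth_factor` is a hypothetical helper introduced only for illustration.

```python
def growth_factor(doubling_months: float, horizon_months: float = 12.0) -> float:
    """Multiplier over `horizon_months`, assuming a constant doubling time."""
    if doubling_months <= 0:
        raise ValueError("doubling time must be positive")
    return 2.0 ** (horizon_months / doubling_months)


# Doubling times reported by AISI (November 2025 and February 2026 estimates).
november_estimate = growth_factor(8.0)  # 2 ** 1.5
february_estimate = growth_factor(4.7)  # 2 ** (12 / 4.7)

print(f"8.0-month doubling: {november_estimate:.1f}x per year")
print(f"4.7-month doubling: {february_estimate:.1f}x per year")
```

Under that assumption, the revision from 8 months to 4.7 months roughly doubles the implied annual growth, from about 2.8x to about 5.9x.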

The Mythos Preview result is the first time any frontier model has cleared both AISI cyber ranges. The agency wrote, per The Decoder: “The direction of travel is clear: cyber capabilities are advancing rapidly, and recent models represent a meaningful step up from what came before.” AISI is now building harder evaluations that include active defences.

Technical Details

One AISI cyber range simulates a 32-step attack on a corporate network that human experts would need approximately 20 hours to complete. The latest Claude Mythos Preview checkpoint completed the full attack in 6 out of 10 attempts, up from 3 out of 10 for the previously tested Mythos version. The same checkpoint has been rolled out to AISI partners for further evaluation.
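Because each result rests on only 10 attempts, the underlying success rates are uncertain. As a rough illustration, the sketch below computes a 95% Wilson score interval for the reported counts. It assumes the attempts are independent trials with a fixed success probability, which the article does not state; `wilson_interval` is a hypothetical helper and not part of AISI’s methodology.

```python
import math


def wilson_interval(successes: int, attempts: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if attempts <= 0 or not 0 <= successes <= attempts:
        raise ValueError("need attempts > 0 and 0 <= successes <= attempts")
    p = successes / attempts
    z2 = z * z
    denom = 1 + z2 / attempts
    centre = (p + z2 / (2 * attempts)) / denom
    half = z * math.sqrt(p * (1 - p) / attempts + z2 / (4 * attempts**2)) / denom
    return centre - half, centre + half


# Counts reported for the 32-step corporate-network range.
latest = wilson_interval(6, 10)    # latest Mythos Preview checkpoint
previous = wilson_interval(3, 10)  # previously tested Mythos version

print(f"latest:   {latest[0]:.2f} to {latest[1]:.2f}")
print(f"previous: {previous[0]:.2f} to {previous[1]:.2f}")
```

Under these assumptions the two intervals are wide and overlap (roughly 0.31 to 0.83 versus 0.11 to 0.60), so the jump from 3 to 6 successes is suggestive rather than conclusive on its own.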

The second cyber range, “Cooling Tower,” simulates an attack on an industrial control system. Mythos Preview solved it in 3 out of 10 attempts; no other model, including the prior Mythos version, had previously cleared this simulation. AISI is now developing harder cyber ranges with active defences to push the evaluation beyond what current models can handle. Microsoft’s MDASH multi-agent system, reported the same day, achieved 88.45% on the separate CyberGym benchmark, which suggests that comparable capability gains are also appearing in multi-agent architectures.

Who’s Affected

Defensive cybersecurity teams gain a clearer external benchmark for how rapidly offensive AI capability is advancing. Anthropic’s leadership and safety teams must now address how a model with these capabilities should be deployed and accessed; the company’s responsible-scaling policy commits to specific guardrails for cyber capabilities at this level. National AI safety institutes — the UK AISI, the U.S. AI Safety Institute (US AISI), and similar bodies in Singapore, Japan, and the EU — face increased pressure to publish more granular evaluation results.

What’s Next

AISI is building harder evaluations with active defences and is expected to release follow-up assessments later in 2026. Anthropic’s broader Claude Mythos general availability — beyond the Preview checkpoint — has not been announced. AISI’s broader framework for AI safety reporting may inform comparable national-agency assessments. The 4.7-month doubling-time estimate may be revised again if the next cohort of evaluations confirms the same pace of capability growth.

