- The UK AI Security Institute (AISI) published cyber evaluations of Claude Mythos Preview on April 13, 2026, finding the model succeeded on 73% of expert-level capture-the-flag challenges—a category no model could crack before April 2025.
- Mythos Preview became the first AI model to complete "The Last Ones," a 32-step simulated corporate network attack estimated to take human experts 20 hours, finishing the full chain in 3 of 10 attempts.
- Across all runs, Mythos Preview averaged 22 of 32 attack steps completed, compared to 16 for Claude Opus 4.6, the next best performer.
- AISI cautioned that test environments lacked active defenders and detection tooling, making results not directly applicable to well-secured enterprise systems.
What Happened
The UK AI Security Institute (AISI) published cyber capability evaluations of Anthropic’s Claude Mythos Preview on April 13, 2026, six days after the model’s public announcement. The assessment covered two evaluation formats: a suite of capture-the-flag (CTF) challenges graded by difficulty tier, and a 32-step corporate network attack simulation called "The Last Ones" (TLO). AISI described Mythos Preview as representing "a step up over previous frontier models in a landscape where cyber performance was already rapidly improving."
Why It Matters
AISI has tracked AI cyber capabilities since 2023, building progressively harder evaluation formats—from chat-based probing to CTF challenges to multi-step attack simulations. The institute’s report noted that models as recently as two years ago "could barely complete beginner-level cyber tasks," while Mythos Preview can now, in controlled conditions, "execute multi-stage attacks on vulnerable networks and discover and exploit vulnerabilities autonomously—tasks that would take human professionals days of work." The report is the latest in a series of capability disclosures that have drawn attention from enterprise security teams and policymakers in the US, UK, and EU.
Technical Details
On expert-level CTF tasks—a tier in which no model had succeeded before April 2025—Mythos Preview achieved a 73% success rate. AISI’s CTF suite spans four difficulty levels: non-expert, apprentice, practitioner, and expert. Expert-level results were averaged across five runs per model with budgets of up to 50 million tokens; earlier models in the benchmark series (GPT-3.5 Turbo through Claude 4 Opus) were evaluated across 10 runs with budgets of up to 2.5 million tokens.
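As a rough illustration of the aggregation AISI describes, the sketch below computes a success rate over repeated (challenge, run) attempts. The challenge names, run counts, and results are invented for illustration, not drawn from the report.

```python
# Hypothetical sketch: averaging CTF outcomes across repeated runs,
# in the spirit of AISI's per-tier success rates. All data is invented.

def success_rate(results):
    """results: list of (challenge, run_index, solved) tuples.
    Returns the fraction of (challenge, run) attempts solved."""
    attempts = len(results)
    solved = sum(1 for _, _, ok in results if ok)
    return solved / attempts if attempts else 0.0

# Two invented expert-tier challenges, two runs each
runs = [
    ("expert-ctf-01", 0, True),
    ("expert-ctf-01", 1, True),
    ("expert-ctf-02", 0, False),
    ("expert-ctf-02", 1, True),
]
print(round(success_rate(runs), 2))  # 0.75 on this toy data
```

A real harness would also track per-run token spend against the tier's budget cap before counting an attempt.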
On TLO, a 32-step attack chain spanning initial reconnaissance through full network takeover, Mythos Preview completed an average of 22 steps across all 10 attempts and solved the complete chain in 3 of those runs—the first model to do so. Claude Opus 4.6, the next best performer, averaged 16 completed steps. Cyber range evaluations used a 100 million token budget per run; AISI noted performance continued scaling up to that limit and stated it "expect[s] performance improvements would continue beyond that."
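The two TLO metrics the report cites, mean steps completed and number of full-chain completions, can be sketched as follows. The per-run step counts below are invented to match the reported averages and are not AISI's actual data.

```python
# Hypothetical sketch of the two TLO summary metrics AISI reports:
# mean steps completed across runs, and runs that finished the full
# 32-step chain. The ten per-run counts are invented illustrations.
TOTAL_STEPS = 32

def summarize(steps_per_run):
    mean_steps = sum(steps_per_run) / len(steps_per_run)
    full_chains = sum(1 for s in steps_per_run if s == TOTAL_STEPS)
    return mean_steps, full_chains

# Ten illustrative runs: mean of 22 steps, 3 complete chains
runs = [32, 32, 32, 24, 22, 20, 18, 16, 14, 10]
mean_steps, full_chains = summarize(runs)
print(mean_steps, full_chains)  # 22.0 3
```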
One notable gap: Mythos Preview could not complete AISI’s "Cooling Tower" operational technology (OT) cyber range, though AISI attributed the failure to the model stalling on IT-section steps rather than an inherent OT limitation. Across all ranges, evaluation environments did not include active defenders, endpoint detection tools, or penalties for triggering security alerts—conditions that differ substantially from hardened enterprise deployments. AISI stated it "cannot say for sure whether Mythos Preview would be able to attack well-defended systems."
Who’s Affected
Organizations operating systems with weak security posture face the most direct near-term exposure. AISI stated Mythos Preview is "at least capable of autonomously attacking small, weakly defended and vulnerable enterprise systems where access to a network has been gained." The institute directed organizations to the UK National Cyber Security Centre’s Cyber Essentials scheme, whose baseline controls include security update management, access control, and secure configuration, alongside broader practices such as comprehensive logging.
Penetration testing firms and security researchers conducting authorized engagements may find the model’s demonstrated CTF and multi-step attack capabilities directly applicable. AISI, in a joint post with NCSC, has noted that the same AI cyber capabilities that pose offensive risks can be applied to defensive tasks including vulnerability discovery and automated red-teaming.
What’s Next
AISI stated its future evaluation work will move toward hardened, defended environments that incorporate active monitoring, endpoint detection, and real-time incident response, so that evaluations can continue to differentiate frontier models as undefended simulated ranges saturate. The institute also plans to measure AI-enabled vulnerability discovery and penetration testing performance against real-world systems rather than isolated simulations.
Anthropic has not publicly announced a general availability date for Claude Mythos Preview. AISI noted that "future frontier models will be more capable still," and recommended organizations prioritize defensive investment rather than wait for a stable capability ceiling to emerge.