ANALYSIS

Small Open Models Replicate Core Claude Mythos Cybersecurity Bug Finds

Anika Patel · Apr 18, 2026 · 4 min read
Engine Score 8/10 — Important
  • AISLE found that all eight models it tested, including a model with just 3.6 billion active parameters costing $0.11 per million tokens, flagged the same FreeBSD NFS memory bug Anthropic’s restricted Claude Mythos model highlighted.
  • No tested model reproduced Mythos’s specific technique of splitting a payload across 15 separate network requests, though several found alternative exploitation paths.
  • Results varied sharply by task: Claude Opus 4.6 caught an OpenBSD integer-overflow bug three out of three times while GPT-5.4 missed it in every run.
  • Both research teams conclude the decisive advantage in AI-assisted vulnerability hunting lies in system-level validation, not model capability alone.

What Happened

Two independent research efforts have tested small and mid-size language models against cybersecurity vulnerabilities that Anthropic’s restricted Claude Mythos model had spotlighted, finding that most models could replicate the core bug-finding results. The Decoder reported on both studies on April 18, 2026. The first comes from AISLE, a company conducting AI-assisted bug hunting since mid-2025; the second from Vidoc Security, which paired commercial models with the open coding agent OpenCode.

Anthropic has limited access to Claude Mythos Preview to eleven organizations through Project Glasswing, citing the model’s offensive capabilities. An audit by the UK’s AI Security Institute confirmed that Mythos can find software bugs, build working exploits autonomously, and simulate corporate network takeovers — provided the network is “small, weakly defended and vulnerable.”

Why It Matters

The studies probe a specific claim embedded in Anthropic’s restricted-access approach: that Mythos possesses offensive cybersecurity capabilities qualitatively different from what is publicly available. The Financial Times, citing “multiple people with knowledge of the matter,” separately reported that Anthropic is holding the model back until it has sufficient compute capacity to serve customers broadly — adding a commercial dimension to the access restriction narrative.

AI-assisted vulnerability research has accelerated since 2025, with multiple organizations now using automated pipelines to scan open source codebases at scale. AISLE says it has already reported 15 vulnerabilities in OpenSSL and five in curl through its AI-assisted hunting operations.

Technical Details

AISLE founder Stanislav Fort fed code snippets from Anthropic’s public samples into eight models and tested them against CVE-2026-4747, a FreeBSD NFS memory bug Anthropic highlighted as a Mythos showcase. All eight flagged the flaw as critical, including GPT-OSS-20b — a model with just 3.6 billion active parameters running at $0.11 per million tokens. Kimi K2 independently determined the bug could propagate automatically from one infected machine to others, a detail Anthropic had not publicized.

The real exploit requires fitting a payload exceeding 1,000 bytes into approximately 304 bytes of available space. Mythos accomplished this by splitting the payload across 15 separate network requests. None of the tested models replicated that specific technique, though the researchers say several found other workable paths. GPT-OSS-120b produced a gadget sequence AISLE describes as close to the actual exploit.
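To make the space constraint concrete, here is a minimal sketch of the general chunking idea, splitting an oversized payload into pieces that each fit the available slot, with one piece per request. The function name and sizes are illustrative; this is not Mythos’s actual exploit code.

```python
# Hypothetical sketch: splitting an oversized payload into per-request
# chunks, in the spirit of the technique described above. Names and
# sizes are illustrative, not taken from the real exploit.

def split_payload(payload: bytes, slot_size: int) -> list[bytes]:
    """Divide a payload into chunks that each fit the available space."""
    return [payload[i:i + slot_size] for i in range(0, len(payload), slot_size)]

payload = bytes(1050)                 # a payload exceeding 1,000 bytes
chunks = split_payload(payload, 304)  # ~304 bytes of space per request

# 1,050 bytes at 304 bytes per chunk comes to 4 requests, not 15 —
# the real exploit's 15-request split reflects protocol and layout
# constraints beyond raw size.
print(len(chunks))
```

Naive chunking is the easy part; the reported difficulty lies in reassembling and triggering the pieces inside the target process, which is where the tested models fell short of Mythos.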

Results diverged sharply on an OpenBSD integer-overflow bug requiring a mathematical grasp of list states. GPT-OSS-120b reconstructed the full publicly described exploit chain in a single run and proposed the actual OpenBSD patch as the fix. Qwen3 32B — which had caught the FreeBSD bug without difficulty — assessed the OpenBSD code as “robust to such scenarios.” Vidoc found a comparable split: Claude Opus 4.6 reproduced the OpenBSD vulnerability in three out of three runs, while GPT-5.4 failed in every attempt. Fort describes this pattern as “the jagged frontier,” an uneven capability boundary that shifts sharply depending on the task.
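For readers unfamiliar with the bug class, the following sketch shows how a fixed-width integer overflow can defeat a size calculation. It models C-style 32-bit unsigned arithmetic in Python via masking; it is a generic illustration of the vulnerability class, not the actual OpenBSD code.

```python
# Hypothetical sketch of the bug class named above: an attacker-controlled
# element count wraps a 32-bit size calculation, yielding a dangerously
# small allocation. This is illustrative, not the actual OpenBSD bug.
UINT32_MAX = 0xFFFFFFFF

def alloc_size(count: int, elem_size: int) -> int:
    # C-style unsigned multiplication: wraps around instead of failing
    return (count * elem_size) & UINT32_MAX

# A sane request behaves as expected:
print(alloc_size(2, 8))             # 16 bytes

# 0x40000000 elements of 8 bytes each wraps 2**33 down to 0, so a
# size check against the result would pass while later writes of
# count * elem_size bytes overflow the (empty) buffer:
print(alloc_size(0x40000000, 8))    # 0
```

Spotting this class requires reasoning about value ranges rather than pattern-matching on syntax, which may help explain why results on the OpenBSD case split so sharply across models.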

A false-positive test added another dimension. Fort presented models with code that appears vulnerable to SQL injection but discards the user input before it reaches the database query. Of the 13 Anthropic models tested, Opus 4.6 correctly identified the non-vulnerability, while Claude Sonnet 4.5 traced the data flow incorrectly. Deepseek R1 and Kimi K2 were correct in every run, while most GPT-5.4 variants came up short. Separately, Vidoc’s per-file scanning cost on the Botan certificate-validation flaw and a wolfSSL cryptography case came in under $30.
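The false-positive pattern can be illustrated with a minimal sketch (hypothetical code, not Fort’s actual test case): the query string interpolates a variable that looks tainted, but the user-supplied value is overwritten before the query runs.

```python
import sqlite3

# Hypothetical sketch of the false-positive pattern described above:
# code that *looks* vulnerable to SQL injection but discards the user
# input before it reaches the database query.

def lookup_user(user_input: str) -> list:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT)")
    conn.execute("INSERT INTO users VALUES ('alice')")

    name = user_input      # tainted value enters...
    name = "alice"         # ...but is overwritten before use
    # String interpolation is the pattern scanners flag as injectable:
    rows = conn.execute(
        f"SELECT name FROM users WHERE name = '{name}'"
    ).fetchall()
    conn.close()
    return rows

# Even a classic injection string cannot reach the query:
print(lookup_user("' OR '1'='1"))   # [('alice',)]
```

A model that pattern-matches on the f-string flags this as vulnerable; one that traces the data flow sees the tainted value is dead, which is the distinction the test was designed to probe.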

Fort also flagged a significant false-negative problem: GPT-OSS-20b, Kimi K2, and Deepseek R1 — which performed well on unpatched code — flagged the patched FreeBSD version as still vulnerable in every run, inventing reasons for phantom bugs that no longer existed. Only GPT-OSS-120b and, to a limited extent, Qwen3-32B correctly recognized the patched version as safe.

Who’s Affected

Security research teams and organizations outside Project Glasswing’s eleven-member consortium may find the studies relevant to decisions about whether to build AI-assisted vulnerability scanning pipelines using publicly available models. The research suggests that broad scanning at commercial scale does not require access to frontier restricted models for most discovery-phase work.

For Anthropic, the findings complicate the public rationale for limiting Mythos access. Both studies stop short of disputing Mythos’s overall performance — particularly in building deployable exploits — but they narrow the scope of what capabilities are demonstrably exclusive to the model at this time.

What’s Next

Both research teams argue the decisive factor in AI-assisted vulnerability hunting is not which model is used but the system built around it — validation pipelines, prioritization logic, and false-positive filtering. Fort summarizes the practical implication: “A thousand adequate detectives searching everywhere will find more bugs than one brilliant detective who has to guess where to look.”

Both studies leave open the possibility that Mythos retains a meaningful edge in constructing deployable exploits end-to-end, and both suggest that gap is likely to narrow as models gain more autonomy and tooling matures. Anthropic had not publicly responded to either study as of April 18, 2026.
