Microsoft’s MDASH Uses 100+ AI Agents to Find Windows Vulnerabilities, Tops CyberGym Benchmark

James Whitfield, May 14, 2026

  • Microsoft introduced MDASH (Multi-Model Agentic Scanning Harness), an AI-powered security system using more than 100 specialised agents to find vulnerabilities.
  • Microsoft’s May 12, 2026 Patch Tuesday release included 16 new Windows CVEs discovered by MDASH, four of them classified as critical.
  • The system scored 88.45% on the CyberGym benchmark — the highest result to date, per Microsoft.
  • The four critical CVEs include remote code execution flaws in tcpip.sys, ikeext.dll, netlogon.dll, and dnsapi.dll.

What Happened

Microsoft has built an agentic multi-model security system that automatically detects software vulnerabilities, The Decoder reported on Thursday. The system, called MDASH (Multi-Model Agentic Scanning Harness), differs from single-model approaches such as Anthropic’s Claude Mythos by orchestrating more than 100 specialised AI agents across an ensemble of frontier and distilled models.

Why It Matters

MDASH is a substantial step toward operationalising agentic AI in both offensive and defensive security workflows. In its Patch Tuesday release of May 12, 2026, Microsoft reported 16 new vulnerabilities (CVEs) in the Windows networking and authentication stack that MDASH discovered, four of them classified as critical. The critical flaws include remote code execution vulnerabilities in the tcpip.sys kernel component, the IKEv2 service (ikeext.dll), netlogon.dll, and dnsapi.dll. Ten of the 16 vulnerabilities affect kernel mode, and most are reachable over the network without authentication, per Microsoft’s reporting.

Microsoft’s framing of the problem is notable. The company observes that its own code base is especially hard to audit because Windows, Hyper-V, and Azure are proprietary and are not part of public LLM training data, which makes a collaborative multi-agent system more valuable than one that assumes the model is already familiar with the source.

Technical Details

MDASH operates as a four-stage pipeline. First, the system analyses the source code and maps the attack surface. Second, specialised auditor agents scan the code for suspicious areas. Third, a separate group of agents, which Microsoft calls “debaters”, argue for and against the exploitability of each candidate finding, after which duplicate findings are merged. Fourth, Evidence Leader agents attempt to trigger each vulnerability with specifically crafted inputs. The pipeline is model-agnostic: any frontier or distilled model with the requisite reasoning and code-comprehension capability can be plugged in. Microsoft has not publicly disclosed which specific AI models power MDASH.
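The four stages described above can be sketched in a few lines of Python. This is a toy illustration only, not Microsoft’s implementation: the `Finding` type, the agent signatures, the "packet" heuristic for the attack surface, and the majority-vote rule for resolving the debate are all assumptions made for this example, and the "agents" are plain functions standing in for model calls.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Finding:
    """A candidate vulnerability (hypothetical shape, for illustration)."""
    function: str  # code location the finding points at
    kind: str      # short label, e.g. "unchecked-copy"


# Hypothetical agent signatures; a real system would wrap model calls here.
Auditor = Callable[[str, str], Iterable[Finding]]  # (name, source) -> findings
Debater = Callable[[Finding], bool]                # True = argues "exploitable"
Prover = Callable[[Finding], bool]                 # True = triggered the bug


def map_attack_surface(sources: dict[str, str]) -> dict[str, str]:
    """Stage 1 (toy heuristic): keep functions that handle external input."""
    return {name: src for name, src in sources.items() if "packet" in src}


def run_pipeline(
    sources: dict[str, str],
    auditors: list[Auditor],
    debaters: list[Debater],
    prover: Prover,
) -> list[Finding]:
    surface = map_attack_surface(sources)

    # Stage 2: every auditor scans every function on the attack surface.
    candidates = [
        finding
        for name, src in surface.items()
        for audit in auditors
        for finding in audit(name, src)
    ]

    # Stage 3: debaters argue for or against each candidate. A simple
    # majority vote is an assumption; the article does not say how the
    # debate is resolved. Duplicates are merged afterwards.
    survivors = [
        f for f in candidates
        if sum(d(f) for d in debaters) * 2 > len(debaters)
    ]
    merged = list(dict.fromkeys(survivors))  # order-preserving de-duplication

    # Stage 4: keep only findings the prover manages to trigger.
    return [f for f in merged if prover(f)]


# Toy inputs: two "functions", only one of which handles network data.
sources = {
    "parse_header": "memcpy(buf, packet, packet_len)",
    "format_log": "snprintf(out, sizeof out, msg)",
}


def memcpy_auditor(name: str, src: str) -> list[Finding]:
    return [Finding(name, "unchecked-copy")] if "memcpy" in src else []


def length_auditor(name: str, src: str) -> list[Finding]:
    return [Finding(name, "unchecked-copy")] if "packet_len" in src else []


debaters = [lambda f: True, lambda f: True, lambda f: False]
confirmed = run_pipeline(
    sources,
    [memcpy_auditor, length_auditor],
    debaters,
    prover=lambda f: f.function == "parse_header",
)
print(confirmed)
```

Both toy auditors flag the same location, so the merge step collapses them into a single finding before the final stage runs. Keeping each stage behind a plain callable mirrors the model-agnostic design the article describes, since any model-backed agent with the same signature could be swapped in.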

On CyberGym, a public benchmark for agentic-security capability, MDASH scored 88.45%, which Microsoft positions as the highest reported result to date. The closest single-model results come from Claude Mythos and OpenAI’s GPT-5.5, though direct comparisons between multi-agent and single-model architectures are imperfect.

Who’s Affected

Microsoft customers benefit immediately from the patches in the May Patch Tuesday release. The broader security community gains a documented multi-agent architecture template for vulnerability discovery. Competing security-tooling companies, including SentinelOne, CrowdStrike, and Palo Alto Networks, as well as traditional fuzzing and static-analysis vendors, face pressure to integrate similar agentic approaches. Anthropic’s Mythos vulnerability-finding capabilities, OpenAI’s Codex-based research stack, and Google DeepMind’s parallel agentic security work all occupy adjacent positions. The wider AI-safety debate over agentic models being used for offensive cybersecurity gains another data point: the same multi-agent architecture that finds vulnerabilities for defenders could, with different incentive design, be used offensively.

What’s Next

Microsoft has stated it will continue to expand MDASH to additional code bases beyond the initial Windows networking and authentication focus. Hyper-V, Azure, and Microsoft 365 are likely next targets. The 88.45% CyberGym score will face renewed competition as Anthropic, OpenAI, and Google DeepMind iterate on their own vulnerability-finding pipelines. Industry working groups on agentic security testing standards, including the Cloud Security Alliance and ISO/IEC working groups on AI-security evaluation, will likely cite MDASH in upcoming reports.
