ANALYSIS

Microsoft rolls out Copilot Cowork more broadly and lets AI models check each other’s work

MegaOne AI · Mar 31, 2026 · Updated Apr 2, 2026
Engine Score 5/10 — Notable
  • Microsoft launched Copilot Cowork and a Critique function on March 30, 2026, enabling one AI model to draft research and a second model to review it for accuracy before delivery.
  • Critique pairs OpenAI’s GPT for content generation with Anthropic’s Claude for independent review, separating writing from evaluation in a two-stage pipeline.
  • Microsoft reports a 13.8-point improvement on the DRACO benchmark, which measures accuracy, completeness, and objectivity in research outputs.
  • A separate Model Council feature runs multiple AI models simultaneously on the same query, with a third model summarizing areas of agreement and disagreement.

What Happened

Microsoft announced the broader rollout of Copilot Cowork and a new Critique function for its Researcher tool on March 30, 2026, as part of Wave 3 of Microsoft 365 Copilot. Copilot Cowork allows the AI system to independently manage multi-step workflows across Microsoft 365, including accessing files, managing calendars, compiling daily briefings, and executing tasks across Outlook, Teams, and Excel.

The Critique function introduces a two-stage review process where one AI model generates a draft and a second, independent model reviews it. Microsoft stated that the feature “separates generation from evaluation and utilises a combination of models from Frontier labs, including Anthropic and OpenAI.”

Why It Matters

Single-model AI systems have a well-documented tendency to produce confident but inaccurate output, particularly in research and analysis tasks. By routing generated content through an independent reviewer from a different model provider, Microsoft is addressing one of the most persistent complaints about AI assistants in enterprise settings: unreliable facts presented as authoritative. The separation of generation and evaluation reduces the likelihood that a single model’s blind spots pass through unchecked.

The multi-model approach also represents a shift in how major tech companies deploy AI. Rather than betting on a single model provider, Microsoft is combining OpenAI’s GPT for generation with Anthropic’s Claude for evaluation, treating each model’s strengths as complementary rather than competing. It appears to be the first major productivity suite to ship cross-provider AI verification as a built-in feature.

Technical Details

In the Critique pipeline, GPT drafts a response to a research query. Claude then reviews the draft for accuracy, completeness, and citation quality before the response reaches the user. Critique activates by default in Researcher when users select “Auto” in the model picker. Microsoft reports that Researcher with Critique scores 57.4 on the DRACO benchmark, a 13.8-point improvement over the previous version. DRACO measures accuracy, completeness, and objectivity in research outputs.
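The generate-then-review flow described above can be sketched as a simple two-stage pipeline. Microsoft has not published implementation details, so the function names, the stubbed model calls, and the review criteria below are illustrative assumptions, not the actual Copilot API:

```python
# Hypothetical sketch of a two-stage generate-then-critique pipeline.
# Model calls are stubbed; a real system would call provider APIs
# (e.g., a GPT model for stage 1, a Claude model for stage 2).

def generate_draft(query: str) -> str:
    # Stage 1: the generator model drafts a response to the query.
    return f"Draft answer to: {query}"

def critique_draft(query: str, draft: str) -> dict:
    # Stage 2: an independent reviewer model checks the draft for
    # accuracy, completeness, and citation quality. This toy reviewer
    # only flags drafts that contain no citations.
    issues = []
    if "citation" not in draft.lower():
        issues.append("missing citations")
    return {"approved": not issues, "issues": issues}

def researcher_with_critique(query: str) -> str:
    # The key design point: generation and evaluation are separated,
    # so the reviewer is not grading its own work.
    draft = generate_draft(query)
    review = critique_draft(query, draft)
    if not review["approved"]:
        # A real pipeline would likely send feedback back to the
        # generator for revision; here we just annotate the draft.
        draft += " [reviewer flagged: " + ", ".join(review["issues"]) + "]"
    return draft
```

The essential property is that the reviewer is a different model from a different provider, so one model's systematic blind spots are less likely to be shared by its evaluator.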

The company claims the new Researcher outperforms Perplexity with Claude Opus 4.6 by 7 points in internal benchmarks, though no independent verification of these results has been published. A comparison against OpenAI’s GPT-5-based Deep Research or Google’s Gemini models was not included.

Model Council, a separate feature, runs models from both Anthropic and OpenAI simultaneously on identical queries and generates full reports from each. A third “judge” model then summarizes key findings, highlighting areas of agreement, disagreement, and unique insights from each model.
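Model Council's parallel-query-plus-judge pattern can likewise be sketched in a few lines. Again, this is a minimal illustration under stated assumptions: the callables standing in for Anthropic and OpenAI models, and the trivial judge logic, are hypothetical:

```python
# Hypothetical sketch of a "Model Council": multiple models answer the
# same query independently, then a third "judge" model compares the
# resulting reports. Real model calls are replaced with callables.

def run_council(query: str, models: dict) -> dict:
    # `models` maps a model name to a callable that returns a report.
    # Each model sees the identical query and answers independently.
    reports = {name: model(query) for name, model in models.items()}
    return {"query": query, "reports": reports, "summary": judge(reports)}

def judge(reports: dict) -> str:
    # Stand-in for the judge model, which would summarize areas of
    # agreement, disagreement, and unique insights. This toy version
    # only checks whether the reports are identical.
    unique = set(reports.values())
    if len(unique) == 1:
        return "All models agree."
    return f"Models disagree: {len(unique)} distinct answers."
```

Unlike Critique, where models run sequentially in a draft-review chain, the council runs them in parallel and treats disagreement itself as a signal surfaced to the user.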

Who’s Affected

Copilot Cowork is currently available through Microsoft’s Frontier early-access program, which began enrolling customers on March 9, 2026. The feature is not yet available to all Microsoft 365 subscribers. Enterprise customers using Copilot for research-heavy tasks in legal, financial, and consulting workflows stand to benefit most from the accuracy improvements that come from cross-model verification.

Individual knowledge workers who rely on Copilot for daily briefings, email summarization, and calendar management gain access to autonomous task execution through Cowork, reducing the number of manual steps required to complete routine workflows.

The current version of Copilot Cowork lacks local computer use and third-party integrations found in standalone Claude Cowork, limiting its scope to tasks within the Microsoft 365 ecosystem.

What’s Next

Microsoft has not announced a timeline for general availability beyond the Frontier program. The DRACO benchmark results remain self-reported, and the absence of comparisons against Google’s Gemini and OpenAI’s latest GPT-5-based Deep Research tool makes independent performance assessment difficult.

The multi-model approach also raises questions about latency and cost. Running two frontier models sequentially on every research query roughly doubles the compute requirements compared to single-model execution. Whether the accuracy gains justify the additional processing time and expense in real-world enterprise use, rather than under controlled benchmark conditions, will determine long-term adoption of the Critique pattern.

