- Anthropic has determined a newly developed AI model poses risks sufficient to withhold it from public release, according to reporting by The Hill.
- The decision is consistent with Anthropic’s Responsible Scaling Policy (RSP), which mandates halting deployment when a model crosses defined AI Safety Level thresholds.
- Anthropic’s ASL-3 designation applies to models that could provide meaningful uplift to actors seeking to create weapons capable of mass casualties, or that demonstrate autonomous replication and adaptation.
- The disclosure is among the first publicly confirmed cases of a frontier AI lab withholding a production-ready model specifically on safety grounds.
What Happened
Anthropic has decided not to release a newly developed AI model to the public after internal safety evaluations concluded the system presents risks the company is not yet equipped to adequately mitigate, The Hill reported on April 11, 2026. The company, co-founded by CEO Dario Amodei and President Daniela Amodei, has not disclosed the model’s name or the specific evaluation findings that triggered the withholding decision.
The decision puts into practice the core commitment Anthropic made in its Responsible Scaling Policy (RSP), first published in September 2023 and subsequently updated, which states the company will not deploy models that exceed defined safety thresholds without first implementing corresponding safeguards.
Why It Matters
Major AI labs including OpenAI, Google DeepMind, and Anthropic have each published voluntary safety frameworks over the past two years, but publicly confirmed cases of a lab withholding an otherwise deployable model on safety grounds have been extremely rare. Anthropic’s RSP was explicitly designed to create a binding internal obligation—not merely an aspiration—to pause development or deployment when capability thresholds are met.
The decision arrives as AI capabilities have accelerated sharply in 2025 and 2026, with multiple frontier labs shipping models that demonstrate autonomous reasoning, long-horizon task completion, and code execution at a level not seen in prior generations. The question of whether lab safety commitments would hold under competitive pressure has been a persistent concern among researchers and policymakers.
Technical Details
Anthropic’s RSP defines a series of AI Safety Levels. ASL-2 applies to currently deployed systems. ASL-3 is triggered when a model is assessed to provide “serious uplift” to individuals or groups seeking to create biological, chemical, nuclear, or radiological weapons capable of mass casualties, or when the model demonstrates meaningful autonomous replication—the ability to acquire resources, evade shutdown, or spawn copies of itself without human assistance.
Under the RSP’s own terms, a model assessed at ASL-3 or above cannot be released until Anthropic has implemented “ASL-3 security” measures, which include significantly tighter access controls, enhanced model weight protection, and additional deployment-time monitoring. The RSP states: “We commit that we will not deploy models or allow training runs of models that we believe have crossed the ASL-3 threshold without first implementing the corresponding ASL-3 safety and security measures.”
Whether the withheld model was assessed at ASL-3 specifically, or was flagged under a separate internal evaluation criterion, had not been confirmed by Anthropic as of this report.
Who’s Affected
Enterprise API customers, independent developers, and academic researchers who rely on Anthropic’s model releases will not have access to this system in its current form. Companies building products on Anthropic’s API—including those in healthcare, legal, and financial services that have adopted Claude-based tools—will continue using previously released models.
Competitors at OpenAI, Google DeepMind, Meta AI, and xAI may face renewed scrutiny over whether their own safety policies contain enforceable withholding mechanisms, or function primarily as communications frameworks. Regulatory bodies in the EU, UK, and United States have each proposed or enacted measures that would require frontier labs to disclose capability evaluation results to governments before public release.
What’s Next
Anthropic has indicated, consistent with its RSP commitments, that it will continue developing safety mitigations that could eventually allow the model to be released with adequate guardrails in place. The RSP sets no timeline for how long a model may be held, specifying only that the corresponding safety tier must be satisfied before deployment can proceed.
The company is expected to publish a model card or safety evaluation summary as more details become available. Whether this decision will prompt other frontier labs to conduct and disclose similar withholding assessments is not yet clear.