OpenAI Launches Safety Bug Bounty Program to Identify AI Abuse and Agentic Risks

OpenAI launched a Safety Bug Bounty program on March 25, 2026, expanding its vulnerability reporting framework beyond traditional security flaws to cover AI abuse patterns and safety risks that could cause real-world harm. The program complements the company’s existing Security Bug Bounty, which has been running since April 2023, and pays between $250 for lower-priority findings and $7,500 for critical safety issues.

The new program targets a category of problems that conventional security audits miss. A traditional bug bounty rewards researchers for finding exploits like SQL injection or authentication bypass. OpenAI’s Safety Bug Bounty instead rewards identification of ways the AI could be manipulated to produce harmful outcomes, even when no technical vulnerability exists. This includes prompt injection attacks that bypass safety filters, agentic workflows that take unintended actions, and systematic methods to extract harmful content.

For agentic risks specifically, where AI agents take actions in the real world, the program requires that submissions be reproducible at least 50 percent of the time. This threshold acknowledges that agentic systems are inherently less deterministic than traditional software, while still demanding consistency before paying a bounty. As AI agents gain capabilities like computer use, web browsing, and code execution, the surface area for unintended behavior expands dramatically.

The launch coincides with OpenAI’s broader safety push. The company’s Codex Security tool, released around March 9, identified over 11,000 high-severity and critical flaws including 792 critical vulnerabilities across 1.2 million scanned commits during its first 30 days of testing. These were found in widely used open-source projects including OpenSSH, GnuTLS, PHP, and Chromium. On March 24, OpenAI published a Teen Safety Policy Pack on GitHub with guidelines for developers building age-appropriate AI experiences.

The program is managed through Bugcrowd, the same platform handling OpenAI’s security program. The maximum payout for the security program was increased to $100,000 in March 2025, though the safety program’s current ceiling is lower at $7,500, a gap that may narrow as the program matures.

The program establishes a framework that other AI companies will likely need to replicate. As models become more capable and autonomous, the distinction between security vulnerabilities and safety risks blurs. A model that can be reliably manipulated into generating harmful medical advice represents a real risk that someone needs to find and report before it causes harm at scale.

OpenAI Launches Safety Bug Bounty Program to Identify AI Abuse and Agentic Risks

Enjoyed this story?

Bluesky Launches AI Assistant Attie for Custom Social Media Feed Creation

Chroma Releases Context-1: 20B Parameter Model for Multi-Hop Search

Developer Releases Miasma Tool to Trap AI Web Scrapers with Poisoned Data

Before you go…