LAUNCHES

Developer Releases Miasma Tool to Trap AI Web Scrapers with Poisoned Data

megaone_admin · Mar 29, 2026 · 2 min read
Engine Score 8/10 — Important

This story introduces a highly actionable tool that addresses a significant industry concern regarding AI web scraping, offering a novel defense mechanism. Its direct utility and potential impact on content protection make it important.


Austin Weeks has released Miasma, an open-source tool designed to trap AI web scrapers by feeding them endless streams of poisoned training data. The Rust-based server creates what Weeks describes as “an endless buffet of slop for the slop machines,” aimed at companies that scrape websites for AI training data.

The tool addresses what Weeks characterizes as widespread unauthorized data collection: “AI companies continually scrape the internet at an enormous scale, swallowing up all of its contents to use as training data for their next models. If you have a public website, they are already stealing your work.”

Miasma works by serving poisoned training data alongside multiple self-referential links when scrapers access designated trap URLs. Website operators embed hidden links using HTML attributes like `style="display: none;"` and `aria-hidden="true"` that remain invisible to human visitors but are followed by automated scrapers. When scrapers access these links, they encounter Miasma’s data stream designed to contaminate training datasets.
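A hidden trap link of the kind described above might look like the snippet below. The `/miasma-trap/` path is a placeholder for illustration; Miasma does not document a fixed trap URL, and operators choose their own endpoints.

```html
<!-- Invisible to human visitors, but crawlers parsing raw HTML will follow it -->
<a href="/miasma-trap/" style="display: none;" aria-hidden="true">archive</a>
```

Because the link is hidden with CSS and marked `aria-hidden`, neither sighted visitors nor screen-reader users encounter it, while scrapers that simply extract every `href` walk straight into the trap.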

The tool is engineered for efficiency, with Weeks noting it “is very fast and has a minimal memory footprint” to avoid wasting computational resources. At 50 concurrent connections, the system uses an estimated 50-60 MB of peak memory. Requests exceeding the configured connection limit receive immediate 429 responses rather than being queued.
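The article does not quote Miasma’s source, but the reject-rather-than-queue behavior it describes is a standard admission-control pattern. A minimal sketch in Rust, with hypothetical names (`Gate`, `admit`, `release`) and an assumed limit of 50, might look like this:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Admission gate: admit up to `max` concurrent requests, reject the rest.
/// Hypothetical sketch; not Miasma's actual implementation.
struct Gate {
    in_flight: AtomicUsize,
    max: usize,
}

impl Gate {
    fn new(max: usize) -> Self {
        Gate { in_flight: AtomicUsize::new(0), max }
    }

    /// Returns 200 if the request is admitted, 429 if the limit is reached.
    /// No queueing: a full gate answers immediately.
    fn admit(&self) -> u16 {
        let mut cur = self.in_flight.load(Ordering::Relaxed);
        loop {
            if cur >= self.max {
                return 429;
            }
            match self.in_flight.compare_exchange(
                cur, cur + 1, Ordering::AcqRel, Ordering::Relaxed,
            ) {
                Ok(_) => return 200,
                Err(actual) => cur = actual, // lost a race; retry with fresh count
            }
        }
    }

    /// Called when a request finishes, freeing its slot.
    fn release(&self) {
        self.in_flight.fetch_sub(1, Ordering::AcqRel);
    }
}

fn main() {
    let gate = Gate::new(2); // small limit for demonstration (article cites 50)
    assert_eq!(gate.admit(), 200);
    assert_eq!(gate.admit(), 200);
    assert_eq!(gate.admit(), 429); // third concurrent request is rejected
    gate.release();
    assert_eq!(gate.admit(), 200); // slot freed, admitted again
    println!("ok");
}
```

Rejecting with 429 instead of queueing keeps memory bounded regardless of how aggressively a scraper hammers the trap, which is consistent with the small footprint the article reports.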

Miasma can be installed via Cargo or downloaded as pre-built binaries. The tool includes configuration options for port binding, host address, and maximum in-flight request limits. Weeks recommends protecting legitimate search engine bots like Googlebot and Bingbot through robots.txt exclusions while directing suspected AI scrapers to the trap endpoints.
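The recommended robots.txt split could look like the fragment below, again assuming a placeholder trap path of `/miasma-trap/` (Miasma’s trap URLs are operator-chosen, not fixed):

```
# Well-behaved crawlers honor this and never enter the trap
User-agent: Googlebot
Disallow: /miasma-trap/

User-agent: Bingbot
Disallow: /miasma-trap/

# Scrapers that ignore robots.txt follow the hidden links into the trap anyway
User-agent: *
Disallow: /miasma-trap/
```

The scheme relies on asymmetric compliance: search engines that respect the Robots Exclusion Protocol stay out, while scrapers that disregard it select themselves into the poisoned stream.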


MegaOne AI Editorial Team

MegaOne AI monitors 200+ sources daily to identify and score the most important AI developments. Every story is fact-checked, linked to primary sources, and rated using our six-factor Engine Score methodology.
