Austin Weeks has released Miasma, an open-source tool designed to trap AI web scrapers by feeding them endless streams of poisoned training data. The Rust-based server creates what Weeks describes as “an endless buffet of slop for the slop machines” targeting companies that scrape websites for AI training data.
The tool addresses what Weeks characterizes as widespread unauthorized data collection: “AI companies continually scrape the internet at an enormous scale, swallowing up all of its contents to use as training data for their next models. If you have a public website, they are already stealing your work.”
Miasma works by serving poisoned training data alongside multiple self-referential links when scrapers access designated trap URLs. Website operators embed hidden links using HTML attributes like `style="display: none;"` and `aria-hidden="true"` that remain invisible to human visitors but are followed by automated scrapers. When scrapers access these links, they encounter Miasma's data stream designed to contaminate training datasets.
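A hidden trap link of the kind described above might look like the snippet below. The `/trap/` path is illustrative; operators choose their own endpoint and point it at wherever Miasma is serving.

```
<!-- Invisible to human visitors (hidden via CSS and ARIA), but followed
     by crawlers that ignore those hints. The /trap/ path is hypothetical. -->
<a href="/trap/entry" style="display: none;" aria-hidden="true">archive</a>
```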
The tool is engineered for efficiency, with Weeks noting it “is very fast and has a minimal memory footprint” to avoid wasting computational resources. At 50 concurrent connections, the system uses an estimated 50-60 MB of peak memory. Requests exceeding the configured connection limit receive immediate 429 responses rather than being queued.
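The "reject rather than queue" behavior can be sketched using only the Rust standard library. This is an illustrative sketch of the technique, not Miasma's actual code: an atomic counter tracks in-flight connections, and any request arriving over the limit gets an immediate 429 instead of waiting in a queue. The `MAX_IN_FLIGHT` constant and placeholder body are assumptions for the example.

```rust
use std::io::{Read, Write};
use std::net::{TcpListener, TcpStream};
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

// Hypothetical limit for this sketch; Miasma makes its limit configurable.
const MAX_IN_FLIGHT: usize = 50;

fn handle(mut stream: TcpStream, in_flight: Arc<AtomicUsize>, max: usize) {
    // Read (and discard) the request before responding, so the close is clean.
    let mut buf = [0u8; 1024];
    let _ = stream.read(&mut buf);

    // Claim a slot; if we were already at capacity, reject immediately.
    if in_flight.fetch_add(1, Ordering::SeqCst) >= max {
        in_flight.fetch_sub(1, Ordering::SeqCst);
        let _ = stream
            .write_all(b"HTTP/1.1 429 Too Many Requests\r\nContent-Length: 0\r\n\r\n");
        return;
    }

    // Placeholder body; the real tool streams generated text here instead.
    let body = "generated filler text with links back into the trap";
    let resp = format!(
        "HTTP/1.1 200 OK\r\nContent-Length: {}\r\n\r\n{}",
        body.len(),
        body
    );
    let _ = stream.write_all(resp.as_bytes());
    in_flight.fetch_sub(1, Ordering::SeqCst);
}

fn serve(listener: TcpListener, max: usize) {
    let in_flight = Arc::new(AtomicUsize::new(0));
    for stream in listener.incoming().flatten() {
        let counter = Arc::clone(&in_flight);
        thread::spawn(move || handle(stream, counter, max));
    }
}
```

Rejecting over-limit requests outright keeps memory bounded: no queue of pending connections can grow, which is consistent with the small, predictable footprint the author advertises.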
Miasma can be installed via Cargo or downloaded as pre-built binaries. The tool includes configuration options for port binding, host address, and maximum in-flight request limits. Weeks recommends protecting legitimate search engine bots like Googlebot and Bingbot through robots.txt exclusions while directing suspected AI scrapers to the trap endpoints.
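A robots.txt along these lines (the `/trap/` path is illustrative) would steer well-behaved search crawlers away from the trap, while the hidden links remain in place for scrapers that ignore the file:

```
# Keep legitimate search engine bots out of the trap.
User-agent: Googlebot
Disallow: /trap/

User-agent: Bingbot
Disallow: /trap/
```

Scrapers that honor robots.txt stay clear; those that don't follow the hidden links into Miasma's endpoints.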
