- A new arXiv paper, submitted April 30, 2026, introduces an 11,500-query benchmark for studying generative search and reports four major findings on Google Search, AI Overviews (AIO), and Gemini 2.5 Flash.
- AIOs are generated for 51.5% of representative real-user queries and are displayed above the organic search results.
- Retrieved sources differ substantially across surfaces — less than 0.2 average Jaccard similarity between traditional Search and generative results — with traditional Search favoring popular institutional sites and generative search favoring Google-owned content.
- Sites that block Google’s AI crawler are significantly less likely to be retrieved in AIOs even when AIOs technically have access to the content; AIOs are also less consistent run-to-run and less robust to minor query edits.
What Happened
An empirical study titled “How Generative AI Disrupts Search” was submitted to arXiv on April 30, 2026, introducing a public benchmark dataset of 11,500 user queries to compare Google’s traditional search engine, AI Overviews (AIO), and Gemini 2.5 Flash. The paper is filed under cs.IR (Information Retrieval) and reports four findings on retrieval differences, sourcing behavior, AI-crawler-block effects, and result consistency.
Why It Matters
This is the largest publicly available, peer-comparable measurement of how generative search reshapes the information ecosystem. Earlier industry studies have inferred AIO behavior from traffic logs or sample queries; the 11,500-query benchmark gives independent researchers a reproducible substrate. The findings have direct commercial implications for publishers — including AI-news sites — whose visibility now depends on three different retrieval surfaces with substantially different source preferences. The crawler-block finding is particularly consequential: blocking Google’s AI crawler measurably reduces visibility in AIOs even when the underlying content remains accessible.
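The paper does not specify which directive the blocked sites use. A common mechanism, assumed here for illustration, is the `Google-Extended` token in robots.txt, which opts a site out of Google's AI products while leaving ordinary Googlebot crawling (and thus traditional Search indexing) untouched; that split is what makes the finding notable, since AIO visibility drops even though the content stays in the search index:

```text
# robots.txt: hypothetical example of blocking Google's AI crawler token
# while still permitting Googlebot to index the site for Search.
User-agent: Google-Extended
Disallow: /

User-agent: Googlebot
Allow: /
```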
Technical Details
The paper reports four core findings. First, AIOs are generated for 51.5% of representative user queries and appear above organic search results when present; controversial questions trigger AIOs more frequently. Second, the retrieved sources are substantially different across surfaces — less than 0.2 average Jaccard similarity between Google Search results and generative search results, with traditional Search significantly more likely to retrieve information from popular or institutional sites in government or education domains, and generative search significantly more likely to retrieve Google-owned content.
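The cross-surface overlap metric can be sketched in a few lines. The domain lists below are illustrative placeholders, not data from the paper; only the metric itself (average Jaccard similarity over per-query source sets) follows the study's described methodology:

```python
# Sketch: average Jaccard similarity between the source sets retrieved by
# traditional Search and by a generative surface for the same queries.

def jaccard(a, b):
    """Jaccard similarity of two sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention: two empty sets are identical
    return len(a & b) / len(a | b)

def mean_jaccard(pairs):
    """Average Jaccard similarity over (search_sources, genai_sources) pairs."""
    scores = [jaccard(s, g) for s, g in pairs]
    return sum(scores) / len(scores)

# Hypothetical per-query source domains for the two surfaces.
query_results = [
    (["nih.gov", "cdc.gov", "mayoclinic.org"], ["support.google.com", "nih.gov"]),
    (["irs.gov", "usa.gov"], ["blog.google", "irs.gov", "youtube.com"]),
]

print(mean_jaccard(query_results))  # each pair scores 0.25, so the mean is 0.25
```

A per-query score below 0.2, averaged over the full 11,500-query benchmark, is what the paper reports as evidence that the two surfaces draw on substantially different sources.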
Third, websites that block Google’s AI crawler are significantly less likely to be retrieved by AIOs even though Google’s AIO infrastructure technically has access to the content via the standard search index. The implication is that AI-crawler blocks operate as ranking signals beyond their literal content-access function. Fourth, AIOs are less consistent: running the same query twice can yield different AIO content, and minor query edits produce disproportionately different results — a robustness gap relative to traditional search.
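A minimal harness for the consistency finding might look like the following. The `fetch` callable is a hypothetical stand-in for whatever client retrieves an AIO response (the paper does not describe its collection tooling); the harness simply issues the same query repeatedly and averages pairwise text similarity:

```python
# Sketch: measuring run-to-run consistency of a generative search surface
# by issuing the same query multiple times and comparing the outputs.
import difflib

def similarity(text_a: str, text_b: str) -> float:
    """Similarity ratio between two generations of the same query (0..1)."""
    return difflib.SequenceMatcher(None, text_a, text_b).ratio()

def run_to_run(fetch, query: str, runs: int = 3) -> float:
    """Average pairwise similarity across repeated runs of one query."""
    outputs = [fetch(query) for _ in range(runs)]
    pairs = [(outputs[i], outputs[j])
             for i in range(len(outputs))
             for j in range(i + 1, len(outputs))]
    return sum(similarity(a, b) for a, b in pairs) / len(pairs)

# Stubbed fetcher returns identical output every run, so consistency is 1.0;
# a real AIO client would score below that, per the paper's finding.
print(run_to_run(lambda q: f"answer to {q}", "best sunscreen"))
```

The same harness extends to the robustness finding: feed it minimally edited variants of one query and compare outputs across variants rather than across runs.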
The authors call for “revenue frameworks to foster a sustainable and mutually beneficial ecosystem for publishers and generative search providers,” framing the findings as evidence that current generative-search arrangements transfer attention and revenue away from publishers in ways traditional search did not.
Who’s Affected
Publishers — including news, education, and reference sites — face quantified evidence that AIO visibility differs from organic-search visibility, and that blocking Google’s AI crawler costs measurable visibility. SEO professionals gain a publicly released dataset to ground generative-engine optimization (GEO) recommendations. Google itself faces additional regulatory and policy attention given the finding that AIOs preferentially surface Google-owned content. Competing search engines — Bing, Brave, DuckDuckGo, Kagi — have a reference benchmark that highlights the AIO consistency gap. Anthropic, OpenAI, and Perplexity, whose generative-search products were not included in this paper, gain a methodology that will likely be applied to their products in subsequent research.
What’s Next
The 11,500-query benchmark is publicly released, enabling independent replication and extension to ChatGPT search, Perplexity, Claude search, and Bing Copilot. Expect Google to publicly respond to the Google-owned-content sourcing finding, and watch for follow-on papers measuring how visibility in AIOs translates to actual referral traffic over time. Publisher coalitions in the EU and U.S. are likely to cite this paper in advocacy for the revenue-sharing frameworks the authors call for.