I Analysed 300 AI Overviews: Who Gets Picked

I got tired of arguing about AI Overviews from theory, so I spent three weeks collecting data instead. The argument that triggered it was familiar: a colleague insisted citations were "basically just the top three results with extra steps," I insisted passage structure was doing real selection work, and neither of us had numbers from our own market — only second-hand statistics from studies of other industries, other query types, other months. In a feature this volatile, borrowed data ages fast and generalizes badly.

So I built a sample. Three hundred queries from our niche — marketing software and SEO tooling, the market we actually compete in — checked systematically for AI Overview presence, with every citation logged, classified, and cross-referenced against organic rankings. This article is the full write-up: methodology first, so you can judge how much to trust it and replicate it in your own niche, then the findings, including the two that genuinely changed how I plan content. The headline, if you want it now: rankings get you into the casting call, but they do not pick the cast.

Who gets cited in AI Overviews? In our 300-query study, 74% of citations went to pages ranking in the organic top ten for the same query. Among those ranked pages, however, sources with answer-first passage structure and first-party data were cited at two to three times the rate their rankings alone predicted.

Methodology: what I actually did

Credibility in a study like this lives or dies on method, so here it is in enough detail to replicate.

Query selection. I took 300 queries from our own niche, drawn from three sources in roughly equal parts: queries where our site already gets impressions, queries our four closest competitors visibly target, and question-format queries harvested from autocomplete and related-questions features around our core topics. I deliberately stratified by intent: 140 informational ("what is," "how does," "why"), 100 commercial-investigation ("best," "vs," "alternatives," "is X worth it"), and 60 task-oriented how-tos. This is not a random sample of the internet. It is a purposive, stratified sample of one niche, which is precisely the point, and the limitation.
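For anyone replicating the quota step, here is a minimal sketch. The quotas are the ones stated above. Everything else is an assumption for illustration: the candidate pool, the intent labels, the `stratified_sample` helper, and the idea of drawing at random within each intent (the write-up does not say how the final 300 were chosen from the candidates).

```python
import random

# Quotas are the article's; the candidate pool and intent labels are hypothetical.
QUOTAS = {"informational": 140, "commercial": 100, "how_to": 60}

def stratified_sample(candidates, quotas=QUOTAS, seed=0):
    """Draw quotas[intent] queries per intent from (query, intent) pairs."""
    rng = random.Random(seed)  # fixed seed so the same draw can be reproduced later
    pools = {intent: set() for intent in quotas}
    for query, intent in candidates:
        if intent in pools:
            pools[intent].add(query)  # a set drops duplicate queries
    sample = {}
    for intent, quota in quotas.items():
        pool = sorted(pools[intent])  # sort so the draw does not depend on set order
        if len(pool) < quota:
            raise ValueError(f"{intent}: {len(pool)} candidates, need {quota}")
        sample[intent] = rng.sample(pool, quota)
    return sample

# Hypothetical candidate pool: 200 placeholder queries per intent.
candidates = [(f"{intent} query {i}", intent) for intent in QUOTAS for i in range(200)]
sample = stratified_sample(candidates)
print({intent: len(queries) for intent, queries in sample.items()})
```

Fixing the seed matters mostly for re-runs: the same query list can be regenerated instead of stored by hand.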

Collection. Each query was checked three times across a two-week window, from a clean browser profile, same country, logged out, to dampen personalization and regeneration variance. For every check I recorded whether an overview appeared, every cited source (both inline and panel), and the top ten organic results below it. Three checks is a floor, not a luxury — single-snapshot studies of this feature are measuring noise, for reasons that follow directly from how overviews are assembled and regenerated.

Classification. Every cited page — 1,142 citation events across 217 queries that showed at least one overview — was hand-classified on five dimensions: site type (vendor, affiliate/review site, publisher, community/UGC, documentation, academic/reference), whether the cited page ranked in the visible top ten for that query, passage structure of the relevant section (answer-first versus buried), presence of first-party data or original testing, and content age based on visible dates. Hand-classification of a thousand-plus pages is tedious; it is also where every interesting finding came from, because none of these dimensions appear in any rank tracker export.

Honest caveats. One niche, one country, one two-week window, one pair of hands doing classification with all the bias that implies. Effects this large are unlikely to be artifacts, but the exact percentages are estimates with real error bars, not constants of nature. Treat the numbers as a map of my niche and a method for mapping yours.

Finding 1: overview coverage is heavily skewed by intent

Of 300 queries, 217 showed an AI Overview in at least one of three checks — but only 172 showed one in all three. Call the first number potential exposure (72%) and the second stable exposure (57%). The gap between them, 45 queries flickering in and out, is its own finding: at any given moment, roughly a fifth of the overview surface in this niche is being re-decided.
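The two exposure numbers are simple to compute from a check log. The sketch below assumes a hypothetical log of `(query, overview_shown)` rows, one per check. The function name and the toy log are illustrative, and only the three totals at the end come from the study.

```python
from collections import defaultdict

def exposure(checks):
    """Summarise (query, overview_shown) rows, one row per check."""
    per_query = defaultdict(list)
    for query, shown in checks:
        per_query[query].append(bool(shown))
    potential = sum(any(flags) for flags in per_query.values())  # seen at least once
    stable = sum(all(flags) for flags in per_query.values())     # seen in every check
    return {
        "queries": len(per_query),
        "potential": potential,
        "stable": stable,
        "flicker": potential - stable,  # queries whose overview came and went
    }

# Hypothetical three-check log for three queries.
toy_log = [
    ("what is x", True), ("what is x", True), ("what is x", True),
    ("best x tools", True), ("best x tools", False), ("best x tools", True),
    ("x pricing", False), ("x pricing", False), ("x pricing", False),
]
print(exposure(toy_log))

# Arithmetic check on the totals reported above.
total, potential, stable = 300, 217, 172
print(round(100 * potential / total))                 # 72 (potential exposure, %)
print(round(100 * stable / total))                    # 57 (stable exposure, %)
print(round(100 * (potential - stable) / potential))  # 21 (flickering share of the overview surface, %)
```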

The intent split was stark. Informational queries showed overviews at 86% (stable: 74%). How-to queries: 78% (stable: 63%). Commercial-investigation queries: 49% (stable: 31%) — by far the most volatile band, consistent with Google still experimenting with how aggressively to summarize commercial intent. If your content plan treats "we lose clicks to AI Overviews" as one undifferentiated problem, this distribution says otherwise: in this niche the feature has effectively annexed the informational layer, holds half the commercial layer on unstable terms, and leaves the bottom of the funnel mostly alone.

Overview presence across the 300-query sample, by intent. The informational layer is effectively annexed; commercial queries are contested and volatile — which makes them the band where new entrants win citations fastest.

Finding 2: rankings are the entry ticket, not the selection

Now the question that started the argument. Across 1,142 citation events, 74% went to pages ranking in the organic top ten for that query in the same check. Another 11% ranked eleven to twenty. So my colleague was three-quarters right: the citation pool is overwhelmingly the ranking pool, confirming for our niche what larger industry studies have found elsewhere.

But "the top three with extra steps" failed badly as a model. Position one was cited in only 61% of the overviews where it appeared in the results — meaning the top result was passed over four times in ten. Meanwhile, positions four through ten supplied 38% of all citations, and 15% of citation events went to pages outside the visible top twenty entirely — almost always pages ranking for an adjacent sub-question rather than the head query, the query fan-out mechanism leaving fingerprints all over the data. The picture that fits the numbers: ranking determines who is in the room, and something else decides who speaks. The rest of the study is about that something else.
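A sketch of the bucketing behind these shares, assuming each citation event carries the cited page's organic rank for that check, or `None` when the page was not in the visible results. The bucket names, both functions, and the toy ranks are illustrative.

```python
from collections import Counter

BUCKETS = ("1-3", "4-10", "11-20", "outside_top_20")

def rank_bucket(rank):
    """Map an organic rank (None = not in the visible results) to a bucket."""
    if rank is None or rank > 20:
        return "outside_top_20"
    if rank <= 3:
        return "1-3"
    if rank <= 10:
        return "4-10"
    return "11-20"

def citation_shares(ranks):
    """Share of citation events per rank bucket."""
    counts = Counter(rank_bucket(r) for r in ranks)
    total = sum(counts.values())
    return {bucket: counts[bucket] / total for bucket in BUCKETS}

# Hypothetical ranks for eight citation events.
toy_ranks = [1, 2, 3, 5, 7, 12, 25, None]
print(citation_shares(toy_ranks))
```

Read this way, the reported figures imply that positions one to three supplied about 36% of citations: 74% for the top ten minus 38% for positions four to ten.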

Finding 3: passage structure is the strongest controllable signal

For every query, I compared cited pages against a control group: pages ranking in the top ten for the same query that were not cited. This pairing is what makes the study useful — anyone can describe winners; the information is in what separates winners from losers who had the same ranking opportunity.

The separation on passage structure was the largest in the dataset. Among cited pages, 68% had the relevant section structured answer-first — a scoped heading with a direct, self-contained answer in the first two sentences. Among ranked-but-uncited controls, 26% did. Same rooms, same casting call, radically different outcomes, and the single most visible difference was whether the answer sat at the top of its section or had to be excavated from it. Within the cited group, answer-first pages were also more likely to be cited in all three checks rather than rotating out — structure seems to buy persistence, not just entry.
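The cited-versus-control comparison reduces to two rates over the same pool of top-ten pages. A minimal sketch, assuming each page is a dict with hypothetical boolean fields `cited` and `answer_first`. The toy rows are invented, and only the 68% and 26% figures come from the study.

```python
def feature_rate(pages, feature):
    """Share of pages where the given boolean feature is true."""
    if not pages:
        raise ValueError("empty group")
    return sum(1 for page in pages if page[feature]) / len(pages)

def separation(top_ten_pages, feature="answer_first"):
    """Compare a feature's rate among cited pages and ranked-but-uncited controls."""
    cited = [p for p in top_ten_pages if p["cited"]]
    controls = [p for p in top_ten_pages if not p["cited"]]
    cited_rate = feature_rate(cited, feature)
    control_rate = feature_rate(controls, feature)
    return {"cited": cited_rate, "control": control_rate, "gap": cited_rate - control_rate}

# Hypothetical top-ten pages for illustration.
toy_pages = [
    {"cited": True, "answer_first": True},
    {"cited": True, "answer_first": True},
    {"cited": True, "answer_first": False},
    {"cited": False, "answer_first": True},
    {"cited": False, "answer_first": False},
    {"cited": False, "answer_first": False},
    {"cited": False, "answer_first": False},
]
print(separation(toy_pages))

# The reported rates, expressed as a ratio.
print(round(0.68 / 0.26, 1))  # 2.6
```

Restricting both groups to pages that ranked in the top ten is what keeps ranking opportunity equal between them.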

I want to be careful about causality: this is observational, and answer-first structure correlates with generally disciplined editorial operations. But the effect held within site types — answer-first vendor pages beat buried vendor pages, answer-first publisher pages beat buried publisher pages — which is hard to explain away as a site-quality confound. And it matches the mechanism: a synthesis system citing claim-by-claim needs passages it can lift cleanly, exactly as the complete guide to ranking in AI Overviews argues from first principles. My data says the first principles are operating in the wild.

Finding 4: first-party data punches far above its rank

The finding that changed my own roadmap. Pages containing first-party data (original studies, test results, benchmark numbers, real usage statistics) made up just 9% of all ranked pages in the sample, but collected 23% of citations. That is roughly two and a half times their availability, and the effect concentrated in the most valuable place: on commercial-investigation queries, where overviews must say something evaluative, data-bearing pages took 31% of citations. Even more telling, of the citation events that went to pages ranking outside the top ten, nearly half went to pages with original data. It was the most common way a lower-ranked page jumped the queue.
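"Cited at a multiple of availability" is a ratio of two shares: the group's share of citations divided by its share of the ranked pool. A short sketch, where `citation_lift` is an illustrative name and the inputs are the two percentages reported above:

```python
def citation_lift(citation_share, pool_share):
    """Ratio of a group's share of citations to its share of the ranked pool.

    A value of 1.0 means the group is cited exactly in proportion to how often it ranks.
    """
    if not 0 < pool_share <= 1:
        raise ValueError("pool_share must be in (0, 1]")
    if not 0 <= citation_share <= 1:
        raise ValueError("citation_share must be in [0, 1]")
    return citation_share / pool_share

# First-party data pages: 23% of citations from 9% of ranked pages.
print(round(citation_lift(0.23, 0.09), 2))  # 2.56
```

The same ratio works for any of the classification dimensions, as long as both shares are measured over the same set of queries.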

The mechanism is intuitive once you watch how overview sentences are built. Generic claims are interchangeable, so the model attributes them to the most authoritative source available — the incumbent wins by default. A specific number is not interchangeable: if the overview wants to say that reminder texts cut no-shows by a measured percentage, only the page that measured it can support that sentence. Original data converts a page from "one of several adequate sources" into "the only possible citation for this claim." That is a structural advantage no amount of on-page polish replicates, and it is why the cheapest citation strategy in my niche appears to be running small real studies — exactly like this one — rather than producing another well-formatted summary of consensus.

The two signals that separated cited pages from ranked-but-uncited controls: answer-first passage structure (68% of cited pages vs 26% of controls) and first-party data (9% of ranked pages but 23% of citations, about 2.5× its availability in the pool).

Finding 5: who gets picked, by site type

Classifying citations by site type produced the league table everyone asks for, with one twist worth more than the table. Publishers and specialist blogs took 34% of citations; vendor sites 27%; community and UGC platforms 17%; affiliate and review sites 12%; documentation and reference 10%. No single type dominates, which itself contradicts the "Google only cites big publishers" folklore.

The twist is where each type won. Vendor content was cited almost exclusively on definitional and how-to queries — practical explainers and documentation — and almost never on "best" and "vs" queries, even when it ranked there. Community threads showed the inverse pattern, surfacing heavily on evaluative and troubleshooting queries, presumably as a proxy for unvarnished experience. Publishers and data-bearing specialist blogs took the evaluative middle. The strategic readout for a vendor is uncomfortable and useful: your own domain can win the educational layer, but on the queries where buyers compare options, the overview will be assembled from third-party voices — so the work there is earning presence in other people's trusted content, the same off-site reality we mapped in how to get cited by ChatGPT, Gemini and Perplexity.

Finding 6: freshness matters at the margins, brutally at the edges

Content age showed a threshold pattern rather than a smooth gradient. Pages updated within the last twelve months took 71% of citations despite being roughly half the ranked pool — a meaningful but moderate skew. The brutal part was at the old end: pages older than two years collected just 6% of citations even though they were over a fifth of the ranked pool, and most of those exceptions were reference-style content where age plausibly reads as stability. In a niche that moves like ours, a ranked page that has not been touched in two years is, for overview purposes, barely in the game. The maintenance backlog you have been deferring is not cosmetic; in this data it is the difference between holding citations and donating your rankings to the synthesis layer for nothing.

Finding 7: what didn't matter (the null results)

A study that only reports positive findings is advertising, so here are the dimensions I tracked that showed little or no separation between cited pages and ranked-but-uncited controls — several of which contradict advice being sold confidently right now.

Word count showed no meaningful relationship with citation. Cited pages ranged from 600-word glossary entries to 6,000-word guides, and within the top-ten control comparison, longer pages were not cited more often. What mattered was whether the relevant section was clean, not how much surrounded it. The "make it longer to win AI citations" advice found no support in this data — depth helps you rank for more sub-questions, but at the passage level the model takes the cleanest answer, not the heaviest page.

FAQ schema showed almost no independent effect. Pages with FAQPage markup were cited at nearly the same rate as structurally similar pages without it (the small gap vanished once I controlled for answer-first writing, which FAQ-formatted pages have by construction). The formatting discipline that schema tends to accompany is what carries the weight; the markup alone bought nothing visible. The same was true of HowTo markup on the task queries. Schema remains worth doing for entity clarity and rich results — just not as a citation lever, consistent with the myth-busting in our schema markup myths piece.
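"Controlling for answer-first writing" can be done by stratifying: compare citation rates with and without the markup separately inside the answer-first pages and inside the buried pages. The sketch below assumes hypothetical boolean fields `cited`, `faq_schema`, and `answer_first`. The toy counts are invented to show how a pooled gap can disappear within strata, and are not the study's data.

```python
from collections import defaultdict

def citation_rates_by_stratum(pages, feature="faq_schema", stratum="answer_first"):
    """Citation rate per (stratum value, feature value) cell."""
    cells = defaultdict(lambda: [0, 0])  # cell -> [cited count, page count]
    for page in pages:
        cell = cells[(bool(page[stratum]), bool(page[feature]))]
        cell[0] += bool(page["cited"])
        cell[1] += 1
    return {key: cited / total for key, (cited, total) in cells.items()}

def make_pages(n, n_cited, answer_first, faq_schema):
    """Build n hypothetical pages, the first n_cited of them cited."""
    return [
        {"cited": i < n_cited, "answer_first": answer_first, "faq_schema": faq_schema}
        for i in range(n)
    ]

# Hypothetical pool: schema pages are mostly answer-first, plain pages mostly buried.
toy_pages = (
    make_pages(8, 4, answer_first=True, faq_schema=True)
    + make_pages(2, 1, answer_first=True, faq_schema=False)
    + make_pages(4, 1, answer_first=False, faq_schema=True)
    + make_pages(8, 2, answer_first=False, faq_schema=False)
)

rates = citation_rates_by_stratum(toy_pages)
for (answer_first, faq_schema), cell_rate in sorted(rates.items()):
    print(f"answer_first={answer_first} faq_schema={faq_schema} rate={cell_rate:.2f}")
```

In this toy pool the schema pages look better when pooled (5 of 12 cited versus 3 of 10), yet inside each stratum the rates are identical. That is the pattern described above: the gap belongs to the writing structure, not the markup.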

Domain size was a weaker signal than expected. Big-brand domains were cited often, but in proportion to how often they ranked — once in the top ten, the small specialist site with a clean passage beat the large generalist with a buried one regularly enough that I stopped treating domain prestige as an excuse. The casting call is biased; the selection inside the room looked surprisingly merit-based.

Exact-match keyword phrasing in headings didn't separate winners from losers. Headings that clearly scoped the question helped; headings that parroted the query string verbatim did no better than natural phrasings of the same question. Whatever is doing the matching reads meaning, not strings — no surprise in 2026, but a relief to anyone tired of writing robotic H2s.

Anomalies worth flagging

Three oddities in the data resist tidy explanation and deserve honest mention.

First, eleven queries showed overviews citing a page that no longer contained the supporting claim: content had been updated since indexing, leaving the citation pointing at a page that no longer says the thing. It is a reminder that the synthesis layer runs on the index's memory, not the live web, and that aggressive content pruning can orphan your own citations for weeks.

Second, on six commercial queries, the overview's citation set was almost completely disjoint from the visible top ten (five or more citations from outside it), and these were uniformly queries where the top ten was wall-to-wall affiliate listicles. It is hard not to read that as the synthesis layer routing around a low-trust SERP, retrieving from the fan-out instead; if real, that is a quiet escape hatch for good content trapped under entrenched affiliate results.

Third, community threads were sometimes cited for claims that the surrounding thread actively disputed, with the model lifting a confident top comment without the correction below it. Whatever weighting favors UGC for authenticity is not yet reading the whole room, which has obvious implications for anyone whose category is being defined by a three-year-old forum thread.

I log these not because three hundred queries can settle them, but because anomalies are where the next quarter's hypotheses come from — and because pretending the data was cleaner than it was is exactly the habit this study exists to replace. If the second anomaly replicates at scale, in particular, it would mean the most spam-damaged SERPs are the ones where the citation game is most open — worth a dedicated follow-up sample of nothing but listicle-dominated commercial queries.

One more pattern sat on the boundary between finding and anomaly: queries where our niche's terminology is contested — two camps using different names for the same technique — produced the least stable overviews in the whole sample, flickering between framings depending on which check caught them. Naming fights, it turns out, are retrieval fights. The practical move is unglamorous but clear: cover both terms explicitly on the same page, define the relationship between them in one extractable sentence, and let the overview resolve the ambiguity through you rather than around you.

What I changed because of this study

Findings only matter if they reorder a roadmap, so here is what this one reordered. First, we re-prioritized passage rewrites over new content for one full quarter: every page ranking top ten on an overview-bearing query without a citation (41 pages) gets its relevant sections restructured answer-first, because finding 3 says that is the cheapest available win. Second, we budgeted for two small original studies per quarter, because finding 4 says data is the only durable queue-jump, and the marginal cost of collecting it is lower than everyone assumes: this article cost three weeks of part-time logging. Third, we split our reporting by intent band, because finding 1 says the funnel layers are living under different regimes, and an aggregate CTR number averages a stable bottom with an annexed top into meaningless mush, the measurement trap we warned about in "Zero-click search doesn't mean zero value." Fourth, for commercial queries, we moved budget from our own "best X" pages toward earning presence in the third-party content that actually gets cited there, because finding 5 says that is where those citations live.

And fifth — the meta-lesson — we put the study itself on a schedule. The flicker rate in finding 1 means this map expires; the same 300 queries get re-run quarterly, which costs a fraction of the original effort now that the classification rubric exists.

Run this in your own niche

The most useful thing you can take from this article is not my percentages — it is the protocol, which transfers to any niche for the cost of some patience: 300 stratified queries, three checks across two weeks, log citations and the top ten, classify on the five dimensions, compare cited pages against ranked-but-uncited controls. Every strategic answer I needed fell out of that comparison. Yours will differ in the details — a health niche will see different eligibility thresholds, an e-commerce niche different site-type splits — and that difference is exactly why borrowed statistics were failing us in the first place.

The honest barrier is tedium, not difficulty — three weeks of logging and hand-classification that most teams will never get around to. That grind is the part worth handing to software: Orova can run the collection loop continuously — tracking overview presence, logging citations, flagging which of your ranked pages sit uncited — so the quarterly re-run happens whether or not anyone remembers. However you run it, get your own numbers. In a feature this young and this volatile, the team with a current map of its own niche is making decisions; everyone else is repeating statistics from someone else's.
