How Google Builds an AI Overview, Stage by Stage

Most advice about AI Overviews treats the feature as a black box with a slot on top: put good content in, hope a citation comes out. That framing leaves you optimizing by superstition. The better approach is the one engineers use on any system they need to influence — take it apart, name the stages, and work out what each stage rewards. Google has not published a schematic, but between its official descriptions, its patent filings, the behavior visible in millions of live results, and what its spokespeople have confirmed in interviews and documentation, the assembly line of an AI Overview can be reconstructed with reasonable confidence.

This article is that reconstruction. We will follow a single query through the machine — from the moment Google decides it deserves a generated answer, through query expansion, retrieval, synthesis, and citation — and at every stage ask the only question that matters for your strategy: what does this step select for, and where does your content win or lose? By the end, the practical guidance you have read elsewhere should stop being a list of tips and start being a set of obvious consequences.

How does Google build an AI Overview? The system first classifies whether a query benefits from a generated answer, then expands it into related sub-questions, retrieves candidate passages from pages in the Search index, and has a Gemini model write a grounded summary — attaching citations to the sources whose passages support each claim.

Stage zero: the eligibility decision

Before anything is generated, a classifier decides whether this query gets an AI Overview at all. This is the stage almost everyone ignores, and it shapes everything downstream, because no optimization matters on a query where the feature never fires.

The decision appears to weigh several things. Query intent is the dominant factor: informational and question-shaped queries — explanations, definitions, how-tos, comparisons, troubleshooting — trigger overviews at high rates, while navigational queries almost never do, and transactional queries sit in between and keep shifting as Google experiments. Risk is the second factor: Google applies visibly higher thresholds in areas where a wrong synthesized answer is costly — health, finance, legal, safety — where overviews either do not appear or appear with more conservative, heavily attributed phrasing. Confidence is the third: when retrieval cannot assemble enough consistent material to ground an answer, the system tends to fall back to the classic results page rather than generate something shaky. The practical reading: overview eligibility is a property of the query, not of your page, so your first analytical task on any keyword list is simply to record which queries trigger the feature today. That distribution — across your topics, not the internet at large — is your real battlefield map, and it drifts over time as Google retunes the classifier, which is why the measurement habit matters more than any snapshot.

What the eligibility boundary looks like in practice

Abstract categories become useful when you see them applied. Take a software company's keyword list and sort it through the classifier's apparent logic. "What is call tracking" — definitional, low risk, high overview rate. "Call tracking vs call recording" — comparison intent, high overview rate, and a commercially important one because the overview effectively writes the shortlist. "Best call tracking software" — commercial investigation, overview behavior unstable, varying by month and market as Google experiments with how aggressively to summarize commercial queries. "Acme call tracking login" — navigational, effectively never. "Call tracking pricing" — transactional-leaning, mostly classic results today, but in the band Google keeps testing. Run this sort across two hundred of your queries and you typically find overviews concentrated on a minority of them — but that minority skews heavily toward the early-funnel questions that feed everything downstream, which is why the feature's impact on pipeline is larger than its raw query coverage suggests.
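The sort described above can be roughed out in a few lines before any manual review. This is a minimal sketch only: Google's classifier is not public, and the keyword patterns, bucket names, and the `bucket_intent` function are hypothetical stand-ins for an analyst's first-pass labeling, not a model of how Google decides.

```python
import re

# Hypothetical first-pass heuristics for sorting a keyword list by apparent
# intent. The patterns are illustrative assumptions, not Google's logic.
# Order matters: the first matching rule wins.
RULES = [
    ("navigational", re.compile(r"\b(login|sign in|dashboard)\b")),
    ("transactional", re.compile(r"\b(pricing|price|buy|coupon)\b")),
    ("commercial", re.compile(r"\b(best|top|review|reviews)\b")),
    ("comparison", re.compile(r"\b(vs|versus|compared to)\b")),
    ("informational", re.compile(r"^(what|how|why|when)\b")),
]


def bucket_intent(query: str) -> str:
    """Return the first intent bucket whose pattern matches the query."""
    q = query.lower().strip()
    for label, pattern in RULES:
        if pattern.search(q):
            return label
    return "unclassified"


queries = [
    "What is call tracking",
    "Call tracking vs call recording",
    "Best call tracking software",
    "Acme call tracking login",
    "Call tracking pricing",
]

for q in queries:
    print(f"{bucket_intent(q):14} {q}")
```

The buckets are only a starting hypothesis. What matters is the second column you add by hand: whether each query actually triggers an overview when you check it.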

Stage one: query fan-out

Once a query is eligible, the system does something that quietly breaks fifteen years of SEO intuition: it stops treating the query as a single string. The user's question is expanded — fanned out — into a set of related questions that together cover what someone asking this probably wants to know. Google has described this technique explicitly in the context of its AI search features, and you can observe its fingerprints directly: overviews routinely contain sub-answers to questions the user never typed, and the cited sources frequently rank not for the head query but for one of its implied siblings.

Consider what fan-out does to the economics of ranking. Under classic search, the page that won position one for the exact string captured the value. Under fan-out, a single overview might draw on one source for the definition, another for the step-by-step, a third for the cost question, and a fourth for the edge case: four winners where there used to be one, selected across perhaps a dozen expanded queries you never see. This is the mechanism behind a finding that surprises people in citation studies: a meaningful share of cited pages do not rank on page one for the visible query. They rank for a sub-question. The fan-out set is the real keyword universe, and pages built around coherent question clusters (the architecture we have long advocated in our argument that topic clusters beat standalone posts) happen to be shaped exactly right for it. A cluster does not just build topical authority in the abstract; it gives retrieval a matching document for more of the expanded set.

There is a subtler implication too. Fan-out is generated by a model, which means it reflects what the model believes people want to know about a topic. Content that anticipates adjacent questions — the "but what about" and "how much does it cost" and "when does this fail" questions — is effectively predicting the fan-out, and every correct prediction is another retrieval lottery ticket. This is why thin pages optimized for one phrasing have decayed so visibly since overviews arrived: they hold exactly one ticket.
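One way to make "predicting the fan-out" concrete is a coverage check: write down the sub-questions you believe a query expands into, label which questions each page in the cluster answers, and list the gaps. The sketch below assumes both inputs are hand-labeled by the analyst. The real fan-out set is not observable, and the URLs, questions, and function names are hypothetical.

```python
from typing import Dict, List, Tuple


def normalize(question: str) -> str:
    """Lowercase, collapse whitespace, and drop a trailing question mark."""
    return " ".join(question.lower().split()).rstrip("? ")


def fanout_coverage(
    fanout: List[str], pages: Dict[str, List[str]]
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Split a guessed fan-out set into covered and missing questions.

    `pages` maps a URL to the questions that page is judged to answer.
    """
    answered: Dict[str, List[str]] = {}
    for url, questions in pages.items():
        for q in questions:
            answered.setdefault(normalize(q), []).append(url)

    covered = {q: answered[normalize(q)] for q in fanout if normalize(q) in answered}
    missing = [q for q in fanout if normalize(q) not in answered]
    return covered, missing


# Hypothetical inputs for illustration only.
fanout = [
    "What is call tracking?",
    "How much does call tracking cost?",
    "When does call tracking fail?",
]
pages = {
    "/what-is-call-tracking": ["what is call tracking"],
    "/pricing-guide": ["How much does call tracking cost"],
}

covered, missing = fanout_coverage(fanout, pages)
print("covered:", covered)
print("missing:", missing)
```

Exact string matching is deliberately crude. It forces you to decide, page by page, which question each one really answers, and every entry in `missing` is a ticket the cluster does not yet hold.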

The assembly line of an AI Overview: an eligibility classifier, query fan-out, passage retrieval, grounded synthesis, and citation attachment. Each stage selects for something different — and each is a separate place your content can win or lose.

Stage two: retrieval, and why rankings still feed the machine

For every query in the fan-out set, a retrieval system pulls candidates from the index. Two facts about this stage are well supported and strategically decisive.

First, retrieval draws on Google's existing Search infrastructure, not a separate AI index. Google's own documentation states that AI Overviews are backed by core Search systems, and the empirical record agrees: independent analyses have repeatedly found that most overview citations come from pages ranking in the top ten organic results for the query or a closely related one. The correlation is not perfect — and the exceptions are interesting, as we will see — but the center of gravity is unambiguous. Whatever earns ordinary rankings — relevance, authority, links, quality signals — is therefore still the engine that fills the candidate pool. There is no path into the synthesis stage for content the ranking systems do not already respect, which is the single most important sanity check against treating "AI SEO" as a separate discipline.

Second, retrieval operates at the passage level, not just the page level. Google has been ranking passages since well before generative search, and overview synthesis sharpens the stakes: the unit handed to the model is a chunk of text — a section, a paragraph group — judged relevant to one of the fanned-out questions. A ten-thousand-word page is, from this stage's perspective, a bundle of competing passages, each rising or falling on its own clarity. This explains an otherwise puzzling phenomenon: enormous, authoritative pages losing citations to modest pages whose single relevant section is cleaner. The big page's answer exists, but it is diluted — spread across prose written for linear reading, anchored by pronouns and callbacks that make any one chunk incomplete on its own. Passage-level retrieval rewards sections that stand alone: scoped by a descriptive heading, front-loaded with the answer, free of references to "the above." If you want a single editing rule derived from this stage, it is that every section should survive being read in isolation.
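The "survive being read in isolation" rule lends itself to a simple lint pass over a page's sections. The following is a sketch under stated assumptions: the phrase list, the three-word heading threshold, and the `standalone_issues` function are illustrative choices of ours, and passing the check says nothing about how Google actually chunks or scores a passage.

```python
import re
from typing import List

# Phrases that usually point outside the current section (illustrative list).
DANGLING = re.compile(
    r"\b(the above|as mentioned|as noted earlier|see below|the previous section)\b",
    re.IGNORECASE,
)
# A section that opens with a bare pronoun often depends on earlier text.
OPENING_PRONOUN = re.compile(r"^(this|that|these|those|it|they)\b", re.IGNORECASE)


def standalone_issues(heading: str, body: str) -> List[str]:
    """Return reasons a section may not make sense when read on its own."""
    issues = []
    if len(heading.split()) < 3:  # arbitrary threshold, an assumption
        issues.append("heading may be too short to scope the section")
    if OPENING_PRONOUN.match(body.strip()):
        issues.append("opens with a pronoun that depends on earlier text")
    if DANGLING.search(body):
        issues.append("refers to surrounding text")
    return issues


print(standalone_issues("Overview", "As mentioned above, it depends on the setup."))
print(standalone_issues(
    "What is call tracking",
    "Call tracking assigns a unique phone number to each marketing channel.",
))
```

A flagged section is a prompt for a human edit, not a verdict. The check catches the mechanical symptoms of context dependence and leaves the judgment about whether the answer is front-loaded to the editor.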

Stage three: grounded synthesis

The retrieved passages go to a model — a version of Gemini customized for Search — with a job description that matters enormously for content strategy: write an answer grounded in these sources. Grounding means the model is constrained to base its claims on the retrieved material rather than free-associating from training data, and it is the reason overviews cite anything at all. The model drafts the answer and the system checks which retrieved passages support which generated claims, a verification process Google has described as corroboration between the response and supporting results.

Think about what the synthesis step selects for among the candidates it was handed. It needs passages it can paraphrase without distortion — so ambiguous, hedged, or internally contradictory text is unusable even when relevant. It needs claims that corroborate or usefully extend what other sources say — so a passage that conflicts with the consensus needs visible evidence behind it to survive, while a passage that merely repeats the consensus adds nothing the model does not already have from a higher-ranked source. And it needs specificity, because the overview's sentences are concrete: numbers, conditions, steps, named mechanisms. Generic text gives the model nothing to attribute.

This stage is where originality converts to citations. When five retrieved passages say approximately the same thing, the model needs one of them — generally the most authoritative or cleanest — and the other four were retrieved in vain. A passage containing something the others lack — first-party data, a precise threshold, an explicit edge case — earns its own sentence in the answer, and that sentence needs a citation. Redundant content fails here even after succeeding at ranking and retrieval, which is the precise mechanical reason "information gain" stopped being a nice-to-have and became the difference between feeding the machine and being fed to it.
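A crude way to see the redundancy problem is to ask what share of a passage's vocabulary appears in none of the competing passages. The sketch below is a token-overlap proxy of our own, not the information-gain measure any search system is known to use, and `novelty` is a hypothetical helper. It only illustrates why four near-identical passages compete for a single slot.

```python
import re
from typing import List, Set


def tokens(text: str) -> Set[str]:
    """Distinct lowercase alphanumeric tokens in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def novelty(candidate: str, others: List[str]) -> float:
    """Share of the candidate's distinct tokens found in no other passage.

    0.0 means everything in the candidate is already said elsewhere;
    1.0 means nothing in it overlaps with the other passages.
    """
    cand = tokens(candidate)
    if not cand:
        return 0.0
    seen: Set[str] = set().union(*(tokens(o) for o in others))
    return len(cand - seen) / len(cand)


print(novelty("a b c d", ["a b", "c"]))  # 0.25: only "d" is new
print(novelty("a b", ["a b"]))           # 0.0: fully redundant
```

Token overlap cannot tell a new fact from a new turn of phrase, so treat a low score as a warning and a high score as nothing more than a reason to look closer.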

Stage four: citation attachment, and what gets displayed

Citations are attached where generated claims map to supporting passages, then surfaced in the interface — inline links on spans of text, and a panel of source cards. The mapping is claim-level, which produces a fact with real strategic weight: a source is cited for the specific thing it supported, not for general excellence. Your page does not get cited for being good; it gets cited because sentence three of the overview came from your section on reminder timing.

The display layer adds its own selection effects. Overviews show a limited set of sources prominently, with the rest collapsed behind interaction, so citation position within the overview varies in value the same way ranking position always did. Click-through from citations is real but modest — the searcher already has the synthesized answer — which reframes what a citation is worth: partly a click channel, partly brand impression at the moment of need, a trade-off we quantified in our analysis of AI Overviews and click loss. And because the overview is regenerated rather than fixed, citations rotate: the same query can cite different sources across sessions, locations, and weeks, especially outside the stable head terms. Treating any single observation as your "AI ranking" is sampling error; presence rates over repeated checks are the honest metric.

Two-column comparison mapping each stage of the AI Overview pipeline to the content property it selects for: eligibility maps to query intent, fan-out to question coverage, retrieval to ranking strength and standalone passages, synthesis to specific original claims, and citation to claim-level support.

Each pipeline stage selects for a different content property. Read in reverse, it is a diagnostic: no citation despite ranking points at synthesis; no retrieval despite coverage points at passages; nothing anywhere points at eligibility or rankings.

The volatile layer: regeneration, freshness, and context

One more property of the machine deserves its own section, because it wrecks more analyses than any other: an AI Overview is not a stored object. It is regenerated — or assembled from cached components — under conditions that vary, and the variance has identifiable sources worth naming.

Location and language shift the retrieval pool: the same English query asked from different countries draws on partially different candidate sets, and overview presence itself differs by market because the eligibility classifier is tuned per locale. Time shifts everything: Google retunes the classifier (entire query categories gain or lose overviews overnight), refreshes the index (a newly updated competitor page enters the pool), and updates the model (synthesis style and citation counts visibly change across quarters). Session context can shift phrasing and emphasis. And ordinary nondeterminism in generation means two identical requests minutes apart can produce differently worded answers drawing on overlapping but non-identical citation sets, particularly in the long tail where no source is dominant.

For measurement, the discipline this imposes is statistical rather than anecdotal: sample each priority query repeatedly across days before concluding anything, track presence rates and citation share rather than binary "we are in / we are out," and never let a screenshot — yours or a competitor's — drive a strategy decision. For strategy, the volatility is asymmetric good news. Stable head-term overviews are hard to break into because a dominant source keeps winning the regeneration lottery, but the rotating long tail re-runs the contest constantly, which means every passage improvement you ship gets re-evaluated within weeks, not ranking cycles. Fast feedback is the most underrated gift this system gives you: it makes passage optimization an empirical practice, where a hypothesis about why a section is losing can be tested and confirmed inside a month.
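The three numbers worth tracking per query can be computed from a plain log of repeated checks. This is a minimal sketch assuming you record each check yourself, as either "no overview" or a list of cited domains. The data shape, the metric names, and the `summarize` function are our assumptions, and the sample domains are placeholders.

```python
from collections import Counter
from typing import Dict, List, Optional

# One entry per check of the same query: None means no overview appeared,
# otherwise the list of domains cited in that overview.
Sample = Optional[List[str]]


def summarize(samples: List[Sample], domain: str) -> Dict[str, float]:
    """Compute overview rate, presence rate, and citation share for a domain."""
    with_overview = [s for s in samples if s is not None]
    cited_in = sum(1 for s in with_overview if domain in s)
    citations = Counter(d for s in with_overview for d in s)
    total_citations = sum(citations.values())
    return {
        # How often the query triggered an overview at all.
        "overview_rate": len(with_overview) / len(samples) if samples else 0.0,
        # Of the overviews that appeared, how many cited the domain.
        "presence_rate": cited_in / len(with_overview) if with_overview else 0.0,
        # The domain's share of all citations observed.
        "citation_share": citations[domain] / total_citations if total_citations else 0.0,
    }


# Placeholder observations for illustration only.
samples = [
    ["a.com", "b.com"],
    None,
    ["b.com"],
    ["a.com", "c.com", "b.com"],
]
print(summarize(samples, "a.com"))
```

With only four checks these rates are still noisy. The point of the structure is that it keeps accumulating, so a change in a passage can be compared against the rate before it shipped rather than against a screenshot.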

Reading the pipeline backwards: a diagnostic method

The reason to internalize the stages is not intellectual satisfaction — it is that the pipeline, read backwards, becomes a diagnostic for any query where you are losing. Work through it in reverse order and the failure localizes itself.

Start with the output and work back. Does the query trigger an overview at all? If not, this is an eligibility question, and your strategy is classic SEO plus monitoring, because the classifier's boundary moves. Does the overview cite competitors who rank near you? Then retrieval found the topic and the problem is your passages: check whether your relevant section answers its question in the first two sentences, stands alone without context, and contains anything specific enough to attribute. This is the most common and cheapest failure to fix. Are the cited sources answering sub-questions you never covered? That is fan-out exposure: your cluster has holes, and the fix is mapping the related questions and filling them deliberately rather than padding the existing page. Do you not rank in the top twenty for the query or any sibling? Then nothing about AI Overviews is your problem yet; you are failing at stage two for the oldest reasons in the discipline (authority, relevance, links), and passage polish is premature. The diagnostic protects you from the most expensive mistake in current SEO: applying the right fix at the wrong stage.
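The questions above can be written down as a small decision function so that the same triage is applied to every losing query. This is a sketch, not a tool: the field names and the `diagnose` function are hypothetical, every input is something you observe and record by hand, and the order of the checks (ranking before fan-out before passages) is our reading of the argument rather than anything Google specifies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    """Hand-recorded facts about one query (all fields are assumptions)."""
    triggers_overview: bool
    we_are_cited: bool
    best_rank: Optional[int]  # best position for the query or a sibling; None if unranked
    uncovered_subquestions_cited: bool  # cited sources answer questions we never covered
    nearby_competitors_cited: bool  # competitors ranking near us are cited


def diagnose(obs: Observation) -> str:
    """Return the stage where this query most likely fails, with the fix."""
    if not obs.triggers_overview:
        return "eligibility: classic SEO plus monitoring"
    if obs.we_are_cited:
        return "no failure: keep measuring presence rate"
    if obs.best_rank is None or obs.best_rank > 20:
        return "ranking: fix authority, relevance, links first"
    if obs.uncovered_subquestions_cited:
        return "fan-out: fill the missing questions in the cluster"
    if obs.nearby_competitors_cited:
        return "passages: answer first, stand alone, add specifics"
    return "unclear: sample again before acting"


print(diagnose(Observation(True, False, 6, False, True)))
print(diagnose(Observation(True, False, 45, True, True)))
```

Checking rank before passages encodes the warning in the text: a page outside the top twenty gets the ranking diagnosis even when its passages also need work, because passage polish is premature at that point.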

It also clarifies what is genuinely new versus rebranded. Stages zero through two largely reward what SEO always rewarded: intent understanding, topical coverage, ranking strength. The passage-level side of retrieval, together with stages three and four, rewards something that was previously optional: passage-level clarity and original, attributable claims. The discipline did not get replaced. It got a second half.

What this means for how you build content

Pull the stage-by-stage incentives together and a content specification falls out, almost mechanically. Build around question clusters, because fan-out retrieves across the cluster, not the keyword. Hold ranking fundamentals sacred, because retrieval draws from what already ranks. Write sections that stand alone — scoped heading, answer first, no dependence on surrounding prose — because retrieval and synthesis both operate on isolated passages. Put something attributable in every important section — a number, a tested result, a precise condition — because synthesis cites claims, not vibes. Maintain entity-level trust signals — real authors, consistent organizational identity, corroborating mentions — because grounded systems weight source credibility hard when choosing among corroborating passages, the same dynamic that governs visibility in ChatGPT, Gemini and Perplexity citations. And keep structured data clean, not because schema buys citations, but because unambiguous machine-readable identity helps every stage that has to decide what your page is and who stands behind it.

None of these practices is speculative. Each one maps to a named stage with an observable selection behavior, which is the difference between a checklist you follow on faith and one you can defend in a planning meeting — and revise intelligently when Google changes a stage, because you will know which advice was downstream of what.

The machine is legible — and that is the opportunity

It is fashionable to describe AI search as inscrutable, and at the level of any single result, the variability makes it feel that way. But systems built from classifiers, retrieval, and grounded generation have stable tendencies, and stable tendencies are what strategy is made of. The teams winning citations consistently are not guessing better; they are working the stages — auditing eligibility across their query set, mapping fan-out, fixing passages, adding attributable substance, and re-measuring — as a repeating loop rather than a one-time project. For the full tactical playbook layered on top of this machinery, see our complete guide to ranking in Google AI Overviews; for the evidence on who actually gets picked, our study of 300 live overviews puts numbers on everything this article derived from first principles.

The loop is honest work, and its bottleneck is volume: every stage of the diagnostic repeated across hundreds of queries, every passage audit repeated across every page that matters. That is the layer worth automating — it is exactly what Orova was built to run continuously, tracking which of your queries trigger overviews, flagging where the pipeline is dropping you, and drafting the passage-level fixes — so that your team's judgment is spent on the one input no system supplies: knowing something about your field that the other retrieved sources do not.
