Everything We Published About AI Search, Distilled to One Page
Over the past several weeks we have published more than thirty articles about AI search — how Google assembles AI Overviews, what GEO actually is, who ChatGPT and Perplexity cite and why, what the AI crawlers want, how to measure traffic from engines that barely send any, and how to write content that survives being chopped into quoted passages. Individually, each piece goes deep on one question. Collectively, they have a problem we would criticise on anyone else's site: the knowledge is spread across thirty URLs, and nobody who needs all of it has time to read all of it.
This article fixes that. It is the entire series distilled to a single page: every major claim we made, compressed to its operative sentence, organised into the order you would actually use, with a link to the full article wherever you need the evidence and the implementation detail. If you read nothing else we have written about AI search, read this. If you have read everything, this is the revision sheet — and the editorial confession of the handful of places where our thinking sharpened between the first article and the last.
The one-page version of AI search: engines retrieve before they generate, so crawlability and rankings still gate everything; citations are won by extractable, answer-shaped passages from sources with verifiable expertise; visibility now spans rankings, citations, and brand mentions; and you must measure all three, because clicks alone no longer reflect what content earns.
How the answers get made: retrieval first, generation second
Everything in the series rests on one mechanical fact, so it comes first. AI search engines do not answer from a model's memory; they retrieve documents from an index — Google's own for AI Overviews and AI Mode, a Bing-backed index for ChatGPT Search, Perplexity's own crawl for Perplexity — and then synthesise an answer grounded in what they fetched. We walked through the pipeline stage by stage in how Google builds an AI Overview, and the distilled consequences are three. One: if you are not retrievable, you cannot be cited — no exceptions, no workaround. Two: retrieval is ranking-shaped, so classic SEO strength feeds AI visibility rather than competing with it. Three: synthesis selects passages, not pages, which is why formatting for extraction matters in a way it never did when a human was going to read the whole tab.
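To make the three consequences concrete, here is a deliberately tiny sketch of a retrieve-then-generate pipeline in Python. It illustrates the argument under stated assumptions. It is not how Google, OpenAI, or Perplexity actually score content: the `Page` type, the word-overlap scoring, and the blank-line passage splitting are all hypothetical simplifications. What it does show is the shape of the argument. A page outside the index never reaches the candidate list, candidates are ranked before anything is selected, and the unit that gets cited is a passage rather than a page.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    text: str
    indexed: bool  # hypothetical flag: False models a page the crawler never fetched


def tokens(text: str) -> set[str]:
    """Lower-cased words with surrounding punctuation stripped."""
    words = (w.strip(".,:;!?\"'").lower() for w in text.split())
    return {w for w in words if w}


def passages(page: Page) -> list[str]:
    """Toy passage splitter: every blank-line-separated block is one quotable unit."""
    return [p.strip() for p in page.text.split("\n\n") if p.strip()]


def cite(query: str, corpus: list[Page], k: int = 2) -> list[tuple[str, str]]:
    """Return up to k (url, passage) pairs an answer could be grounded in."""
    q = tokens(query)
    # Stage 1, retrieval: only indexed pages contribute candidates at all.
    candidates = [
        (len(q & tokens(p)), page.url, p)
        for page in corpus
        if page.indexed
        for p in passages(page)
    ]
    # Stage 2, selection: rank passages (not pages) by overlap with the query.
    candidates.sort(key=lambda c: c[0], reverse=True)
    return [(url, p) for score, url, p in candidates[:k] if score > 0]


corpus = [
    Page("https://a.example/guide",
         "About our company.\n\nGEO is the practice of optimising content for generative engines.",
         indexed=True),
    Page("https://b.example/geo",
         "GEO is the practice of optimising content for generative engines and AI answers.",
         indexed=False),
]
print(cite("what is GEO", corpus, k=1))
# → [('https://a.example/guide', 'GEO is the practice of optimising content for generative engines.')]
```

The passage on b.example matches just as well, but it never appears because that page was never indexed. The "About our company." block on a.example is retrieved but scores zero, so it is dropped rather than cited.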
The companion claim, defended at length in our pillar piece, the complete guide to ranking in AI Overviews, is that citation selection favours passages that are self-contained, directly responsive to the query, clearly attributed, and consistent with what other credible sources say. Engines are graded on factuality, so they hedge by quoting sources that look checkable. Make your key claims checkable and you have done half the work.
GEO: a real discipline, a quarter the size of its hype
The series gave the strategy layer its own pillar in the GEO playbook, and its distilled position is deliberately deflationary: Generative Engine Optimization is real, useful, and roughly seventy percent classic SEO under a new name. The genuinely new thirty percent — answer-first formatting, citation tracking, crawler policy, entity hygiene — is worth doing properly. The rebranded seventy percent is worth refusing to pay twice for, and our GEO-versus-SEO comparison draws that line item by item, so you can tell a real adjustment from a rename at contract-review speed.
The practical entry point is the audit. Before optimising anything, run the twelve checks in the GEO audit — retrievability, extractability, attribution, consistency — against your top pages and against whoever currently owns the citations you want. The audit's distilled lesson from running it repeatedly: the gap is almost never intelligence or budget; it is that the incumbent's pages answer the question in the first hundred words and yours answer it in the last.
Earning citations, engine by engine
Three articles took the engines one at a time, and their findings compress well. From how to become the source ChatGPT quotes: ChatGPT Search runs on its own crawler plus Bing-backed retrieval, so Bing indexation — the thing your team has ignored for a decade — is suddenly load-bearing, and the passages it lifts are disproportionately definitional and procedural. Write the cleanest definition of your core concepts anywhere on the web and you become quotable by default. From our Perplexity citation analysis: Perplexity is the most citation-dense engine and the most willing to link smaller domains — it favours answer-shaped pages over big brands more than any other engine, which makes it the rational first target for sites still building authority. And the strategic frame for all of it is in citations are the new rankings: presence-in-answer is replacing position-on-page as the unit of visibility, the selection criteria overlap with ranking but are not identical, and the funnel math changes because a citation can deliver brand impression value with or without the click.
One more finding deserves its own sentence, because it surprised us: when we put our own product through five engines in the brand-visibility experiment, the answers were assembled largely from third-party descriptions of us, not from our own site. The distilled lesson: engines triangulate. What the rest of the web says about you bounds what AI search can say about you, which quietly re-elevates PR, reviews, and community presence into technical SEO's neighbourhood.
Crawlers, llms.txt, and the access decision
The infrastructure thread of the series reduces to one decision and two facts. The decision, argued in blocking AI crawlers is a business decision, not a reflex: whether GPTBot, ClaudeBot, PerplexityBot, and Google-Extended may access your site should be settled by comparing what their engines send you (citations, referrals, brand presence) against what their access costs you (content leverage, server load, principle), per bot, in writing, and revisited yearly. For most businesses whose model depends on being found, allowing retrieval bots wins that comparison easily; blocking them means choosing invisibility in the fastest-growing answer surfaces. The first supporting fact comes from our ninety-day open-door measurement, written up in we opened our site to every AI bot: crawl load was trivial and referral quality was high. There were fewer sessions than Google sent, but those sessions read more and converted at materially higher rates, consistent with users arriving pre-qualified by an answer. The second is the verdict from our llms.txt explainer, which still stands: the file costs an hour and no major engine has committed to honouring it, so file it under cheap options rather than strategies.
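One way to keep that decision "per bot, in writing" is to make the written policy the source of truth and generate robots.txt from it. The sketch below does that in Python and uses the standard library's `urllib.robotparser` as a sanity check. The bot tokens are the ones named above. The allow and deny choices and the reason strings are placeholders for your own cost-benefit notes, not a recommendation. robots.txt is advisory, so this sketch records and publishes a policy but does not enforce one.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical written policy: bot token -> (allowed?, reason kept for the yearly review).
POLICY: dict[str, tuple[bool, str]] = {
    "GPTBot": (True, "Placeholder: ChatGPT citations and referrals outweigh crawl cost"),
    "ClaudeBot": (True, "Placeholder: crawl load measured as negligible"),
    "PerplexityBot": (True, "Placeholder: citation-dense engine, worth the access"),
    "Google-Extended": (False, "Placeholder: an example of a deliberate, documented no"),
}


def render_robots(policy: dict[str, tuple[bool, str]]) -> str:
    """Emit one robots.txt group per bot, each preceded by its written reason."""
    lines: list[str] = []
    for bot, (allowed, reason) in policy.items():
        lines += [f"# {reason}", f"User-agent: {bot}",
                  "Allow: /" if allowed else "Disallow: /", ""]
    lines += ["User-agent: *", "Allow: /"]  # every other crawler keeps default access
    return "\n".join(lines) + "\n"


def effective_access(robots_txt: str, url: str, agents: list[str]) -> dict[str, bool]:
    """Parse the rendered file back and report what each user agent may fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


robots = render_robots(POLICY)
print(robots)
print(effective_access(robots, "https://www.example.com/blog/post", [*POLICY, "Googlebot"]))
```

Because the reasons travel into the file as comments, the yearly review can start from the published robots.txt itself rather than from someone's memory of why a bot was blocked.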
Trust, authors, and entities: the selection filter
If retrieval is the gate, trust is the filter, and the series kept arriving at it from different directions. The core claim of E-E-A-T in the AI era: engines that stake their own brand on synthesised answers source more conservatively than ranked lists ever did, so anonymous content — whatever its quality — is structurally discounted, and the named, credentialed, independently-verifiable author has become a technical requirement wearing a human face. The infrastructure behind that face is entity SEO, distilled in teaching machines who you are: consistent naming, Organization and Person schema, sameAs links, third-party corroboration — the difference between machines knowing facts about you and machines guessing from word co-occurrence. One sentence of self-critique belongs here: early in the series we treated trust as one factor among many; by the citation analyses near the end, it looked more like the tiebreaker that decides every close call. If we re-ordered the series today, trust would come second, not fifth.
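The schema half of that infrastructure is small enough to sketch. The Python below emits a schema.org JSON-LD graph that links an Organization and a Person through `worksFor`, with `sameAs` pointing at independent profiles, ready to drop into a `<script type="application/ld+json">` tag. Every name and URL here is a placeholder. The `@id` convention (site URL plus a fragment) is one common choice, not a requirement. The markup only helps if the third-party profiles it points to actually corroborate it.

```python
import json


def entity_graph(org_name: str, org_url: str, org_same_as: list[str],
                 author_name: str, author_title: str, author_same_as: list[str]) -> str:
    """Build a JSON-LD @graph tying a named author to the organisation they write for."""
    org_id = f"{org_url}#organization"
    author_slug = author_name.lower().replace(" ", "-")
    graph = {
        "@context": "https://schema.org",
        "@graph": [
            {
                "@type": "Organization",
                "@id": org_id,
                "name": org_name,
                "url": org_url,
                "sameAs": org_same_as,  # independent profiles that corroborate the entity
            },
            {
                "@type": "Person",
                "@id": f"{org_url}#person-{author_slug}",
                "name": author_name,
                "jobTitle": author_title,
                "worksFor": {"@id": org_id},  # a reference, not a duplicate Organization
                "sameAs": author_same_as,
            },
        ],
    }
    return json.dumps(graph, indent=2)


# Placeholder entities; substitute your real names and profile URLs.
print(entity_graph(
    "Example Co", "https://www.example.com",
    ["https://www.linkedin.com/company/example-co", "https://github.com/example-co"],
    "Jane Doe", "Head of Search",
    ["https://www.linkedin.com/in/jane-doe-example"],
))
```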
Measurement: making the invisible returns visible
The measurement thread answers the objection every CFO raises, that you cannot manage what you cannot measure, and its distillation is a four-line dashboard. Classic organic stays. AI referrals get isolated in GA4 with a custom channel group that catches chatgpt.com, perplexity.ai, Copilot, and similar sources, per the setup in measuring AI search traffic in GA4. Citation share gets tracked with a fixed monthly query basket run across the major engines; a spreadsheet suffices, as shown in the dashboard we built for AI referrals. Brand demand, meaning branded impressions and direct traffic, closes the loop as the downstream echo of answers that never sent a click. And the single most useful diagnostic in the whole series comes from the impressions-up-clicks-flat explainer: rising impressions with flat clicks in Search Console is not decay; it is the statistical fingerprint of appearing inside AI Overviews, and reading it correctly can save your program from being cancelled by its own analytics.
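The AI-referral line of that dashboard rests on one regular expression, and because GA4 evaluates channel rules in order, that rule has to sit above the Referral rule. The Python sketch below mirrors that ordered logic so you can test a pattern against your own source list before pasting it into GA4. The domain list is an assumption to adapt, not an official or complete inventory. Python's `re.search` does a partial match, so check whether the GA4 condition you pick matches fully or partially and adjust the pattern to suit.

```python
import re

# Assumed list of AI assistant referrers; extend it as new ones appear.
# Plain alternation with escaped dots, so the same text should also be valid RE2.
AI_SOURCE_PATTERN = (
    r"chatgpt\.com|chat\.openai\.com|perplexity\.ai|"
    r"copilot\.microsoft\.com|gemini\.google\.com|claude\.ai"
)
AI_SOURCE = re.compile(AI_SOURCE_PATTERN)


def channel(source: str, medium: str) -> str:
    """First matching rule wins, mirroring an ordered custom channel group."""
    if AI_SOURCE.search(source.lower()):
        return "AI referrals"  # listed above Referral, or AI visits fall into Referral
    if medium == "organic":
        return "Organic search"
    if medium == "referral":
        return "Referral"
    return "Other"


for source, medium in [("chatgpt.com", "referral"), ("google", "organic"),
                       ("perplexity.ai", "referral"), ("news.example.org", "referral")]:
    print(f"{source:20} {medium:10} -> {channel(source, medium)}")
```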
The content format that wins, and the operation that sustains it
The writing thread converges on one format and one habit. The format, specified in answer-first content: a direct 40-to-60-word answer at the top, question-shaped headings, each section front-loaded with its conclusion, tables for tabular information, and attribution everywhere, so that any single passage survives being quoted alone. The evidence that it works is the experiment in we rewrote 20 posts and 11 got cited: same information, same URLs, new shape, and citations within two months. The keyword layer feeding that format has shifted too. Conversational engines absorb the long tail, so the questions worth answering increasingly come from your support inbox and sales calls rather than from volume tools; that is the argument of long-tail moved into the chat box.
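Two parts of that format are mechanical enough to lint before a page ships: the length of the opening answer and the share of headings phrased as questions. The Python sketch below checks both on plain text. It assumes the body has already been extracted without the title, that paragraphs are separated by blank lines, and that a question heading is one ending in "?". The 50 percent heading threshold is a hypothetical default. These are simplifying assumptions, and passing the check says nothing about whether the answer is actually good.

```python
def opening_answer_length(body: str) -> int:
    """Word count of the first non-empty, blank-line-separated paragraph."""
    for paragraph in body.split("\n\n"):
        if paragraph.strip():
            return len(paragraph.split())
    return 0


def question_heading_share(headings: list[str]) -> float:
    """Fraction of headings phrased as questions (naively: ending in '?')."""
    if not headings:
        return 0.0
    return sum(h.strip().endswith("?") for h in headings) / len(headings)


def lint_answer_first(body: str, headings: list[str],
                      min_words: int = 40, max_words: int = 60) -> list[str]:
    """Return human-readable problems; an empty list means the page passes."""
    problems = []
    n = opening_answer_length(body)
    if not min_words <= n <= max_words:
        problems.append(f"opening answer is {n} words; target {min_words}-{max_words}")
    if question_heading_share(headings) < 0.5:  # hypothetical threshold
        problems.append("fewer than half of the headings are phrased as questions")
    return problems


body = " ".join(["word"] * 48) + "\n\nSupporting detail."
print(lint_answer_first(body, ["What is GEO?", "How do I audit a page?", "Further reading"]))  # → []
```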
The habit is the monthly loop: run the citation basket, read the four-line dashboard, pick the two or three pages where a rewrite or trust upgrade would most plausibly move share, ship, repeat. Everything else in the series is setup; the loop is the program. Its workload is real and repetitive, which is why the series has an operations thread at all, including the economics of delegating the repetitive parts to agents in from dashboards to decisions.
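The first and third steps of that loop reduce to a little arithmetic over the basket results. The Python sketch below assumes the basket is recorded by hand as one row per query and engine, listing the domains each answer cited. It computes citation share per engine, then ranks pages for the next rewrite by how often their queries went uncited, weighted by impressions. That weighting is a hypothetical heuristic for "most plausibly move share", one reasonable choice among several, and no substitute for reading the answers themselves.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class BasketRow:
    query: str
    engine: str
    cited_domains: list[str]  # domains the engine's answer cited, recorded by hand


def citation_share(rows: list[BasketRow], domain: str) -> dict[str, float]:
    """Share of basket queries, per engine, whose answer cited `domain`."""
    cited: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for row in rows:
        total[row.engine] += 1
        cited[row.engine] += int(domain in row.cited_domains)
    return {engine: cited[engine] / total[engine] for engine in total}


def next_rewrites(rows: list[BasketRow], domain: str, page_for_query: dict[str, str],
                  impressions: dict[str, int], n: int = 3) -> list[str]:
    """Pages whose queries were missed most often, weighted by impressions."""
    misses: dict[str, int] = defaultdict(int)
    for row in rows:
        if domain not in row.cited_domains:
            misses[page_for_query[row.query]] += 1
    return sorted(misses, key=lambda page: misses[page] * impressions.get(page, 0),
                  reverse=True)[:n]


rows = [
    BasketRow("what is geo", "chatgpt", ["us.example", "rival.example"]),
    BasketRow("what is geo", "perplexity", ["rival.example"]),
    BasketRow("geo audit", "chatgpt", ["rival.example"]),
    BasketRow("geo audit", "perplexity", ["us.example"]),
    BasketRow("llms.txt", "perplexity", ["us.example"]),
]
pages = {"what is geo": "/geo", "geo audit": "/audit", "llms.txt": "/llms"}
impressions = {"/geo": 900, "/audit": 300, "/llms": 100}
print(citation_share(rows, "us.example"))
print(next_rewrites(rows, "us.example", pages, impressions))
```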
How to keep this page useful after it ages
A distillation is a snapshot, and the engines underneath it move quarterly. So before the role-based reading paths, one short section on which parts of this page to treat as load-bearing and which to re-verify before quoting in a meeting six months from now. Re-verify anything that names a specific engine behaviour: how many sources an AI Overview cites, which index ChatGPT retrieves from, how generously Perplexity links out — these are product decisions owned by companies that change them without notice, and every one of them has already changed at least once since we started writing. Treat as durable, by contrast, anything stated as a selection pressure: engines will prefer checkable sources, extraction will favour self-contained passages, retrieval will gate citation, third-party language will bound what engines say about you. Those are not observations about current products; they are consequences of how grounded answer systems are built and evaluated, and they have been stable across every product iteration the series witnessed. When in doubt, apply the test from our fundamentals critique: if a claim depends on one engine behaving one way, date-stamp it; if it depends on what answer systems are selected to do, build on it.
Four reading paths, by role
Different readers need different slices of the series, so here are the paths we would assign if you told us your job title.
If you lead marketing and own the budget, read three things: the master framework for the strategic shape, the citations-are-the-new-rankings argument for why the KPI conversation must change, and the impressions-up-clicks-flat explainer so the next Search Console screenshot in a board deck gets read correctly. Your decisions are allocation and patience; those three pieces are the case for both. Everything else can be delegated.
If you run content, your spine is the answer-first format guide, the rewrite experiment that validates it, and the long-tail-in-the-chat-box piece that changes where your topic ideas come from. Add the ChatGPT and Perplexity citation analyses when you start caring which engine quotes you — their preferences differ enough to shape individual briefs. Your operational change is the brief template: question, short answer, cluster, links, author, every time.
If you are the technical person, the crawler-policy argument and the ninety-day bot measurement are your decision inputs; the GA4 setup guide is your build ticket; the entity SEO piece is your schema backlog. The recurring theme you will recognise immediately: almost every AI-era technical task is a classic task with a wider audience of machines, which means your existing toolchain mostly survives.
If you are a solo founder doing all three jobs, read this page, then the GEO audit, then do nothing else until your top ten pages pass it. The series' honest message to small teams is that sequencing beats coverage: one afternoon a month on the citation basket and one rewrite a week outruns any amount of scattered enthusiasm.
Where our thinking changed along the way
A distillation that pretends the series was written with perfect foresight would be lying by omission, so here is the short retrospective — the claims we would now soften, and the ones we would state more strongly than we originally dared.
We would soften our early tone on engine differences. The first articles treated Google AI Overviews, ChatGPT Search, and Perplexity as three distinct optimisation targets with three playbooks. By the end of the series the evidence pointed the other way: the overlap in what they reward — retrievability, extraction-friendly structure, verifiable sourcing — is so large that engine-specific tactics earned a fraction of the return that shared fundamentals did. The per-engine pieces remain useful for the margins; the centre of the program should be engine-agnostic.
We would double down on trust as the tiebreaker. We said it above, and it is worth repeating with its evidence: in every citation analysis we ran, when two sources offered comparably useful passages, the one with the identifiable, corroborated author and entity won the citation at rates that surprised us. We started the series believing format was the biggest controllable lever. We ended it believing format gets you into the comparison and trust wins it.
We would also double down on brand effects. The brand-visibility experiment began as a curiosity piece and ended as one of the most strategically loaded findings in the series: engines describe you in words you largely did not write. The marketing disciplines that shape third-party language about your company — PR, reviews, community — now have a direct, mechanical line into search visibility. We undersold that early; it deserves a place in the core program, not the appendix.
And one prediction we made cautiously that aged quickly: we hedged on whether AI referral traffic would ever be worth measuring. Within the lifetime of the series it became the highest-engagement acquisition channel on our own property — small, but unmistakably real. Measure it from day one; the setup costs a day and the trend line is the argument your future budget will stand on.
The questions readers kept asking
Every series accumulates a shadow FAQ in comments, emails, and sales calls. These five came up often enough to answer on the record.
"Is any of this worth it if AI engines send so little traffic?" Reframe the arithmetic. The value of answer-surface presence is citations plus brand impressions plus the smaller-but-warmer clicks, judged against the cost. Because the work overlaps classic SEO so heavily, that cost is mostly the thirty percent that is genuinely new, not a second program. On that honest denominator, the answer is yes, decisively, for businesses whose buyers ask questions before buying.
"Should we block the bots until the dust settles?" That is a decision, and you are allowed to make it — but make it as a decision, with the per-bot cost-benefit on paper, not as a default inherited from a security plugin. Waiting has a price the waiting team does not see: citation pools are forming now, and engines return to sources they have already used.
"Does this replace our SEO agency / team / tooling?" No — it redirects them. Every classic competency in the building remains load-bearing; what changes is the brief format, the measurement stack, and the monthly loop. The teams that struggled during the series were not under-skilled; they were running the new work as a side project instead of folding it into the existing one.
"How long until we see citations?" From our own data and the rewrite experiment: weeks-to-two-months for pages with existing rankings being reformatted, quarters for new content on new topics, longer in conservative niches. The leading indicators arrive in order — extraction-friendly pages win featured snippets first, then Overview citations, then the chat engines — so you are never flying blind if you watch the early surfaces.
"What is the single highest-leverage thing to do this week?" Run your ten most important queries through the three major engines and screenshot the answers. Half of everything in this series becomes self-evidently urgent — or self-evidently fine — the moment you see who is being quoted in the answers your customers are reading. That exercise costs thirty minutes and has reordered more roadmaps than any article we wrote.
What we would tell you to do, in order
Strip the series to a sequence and it is seven steps. One: verify retrievability, so that every important page is fetchable by every crawler you have decided to allow, with its content present in plain HTML. Two: set crawler policy deliberately and write it down. Three: fix trust infrastructure (authors, bios, schema, entity corroboration). Four: rewrite answer-first the ten informational pages with the most search impressions, then the next twenty. Five: build the four-line dashboard before you judge any of it. Six: run the monthly loop. Seven: re-test your query basket quarterly and reallocate. Readers wanting the fully sequenced version with exit criteria per quarter have it in the twelve-month roadmap, and the strategic architecture holding all seven steps together is the master framework published alongside this distillation.
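Step one's "content present in plain HTML" is the easiest check to automate. The Python sketch below takes the raw HTML your server returns (for example, saved from a fetch that does not run JavaScript) and tests whether a key answer sentence appears in the text outside `<script>` and `<style>`. It assumes that a crawler which does not render JavaScript sees roughly what this parser sees. That is an approximation: some engines render and some do not, so the check only tells you whether you are depending on rendering.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects text outside <script> and <style>, roughly what a non-rendering fetch sees."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0  # > 0 while inside a skipped element
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if not self.depth:
            self.chunks.append(data)


def answer_in_raw_html(html: str, answer: str) -> bool:
    """True if `answer` appears (case- and whitespace-insensitively) in the static text."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(parser.chunks).split())
    return " ".join(answer.split()).lower() in text.lower()


static_page = ("<html><body><h1>What is GEO?</h1><p>GEO is the practice of "
               "<strong>optimising</strong> content for generative engines.</p></body></html>")
js_page = ("<html><body><div id='app'></div><script>document.getElementById('app')"
           ".textContent = 'GEO is the practice of optimising content';</script></body></html>")
print(answer_in_raw_html(static_page, "GEO is the practice of optimising content"))  # → True
print(answer_in_raw_html(js_page, "GEO is the practice of optimising content"))      # → False
```

The second page shows the same words to a browser, but only after JavaScript runs, which is exactly the dependency step one asks you to find.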
And the honest closing note, because a distillation should also distil the uncertainty: the engines will keep changing — link counts per answer, crawler names, which surfaces send traffic. Every specific number in the series has a shelf life. The structure does not. Retrieval before generation, extraction over reading, trust as the filter, visibility across three surfaces, measurement before judgment — those held through every update the series lived through, and they are the parts we are confident handing you on one page. The thirty articles behind the links carry the evidence; this page carries the argument. Keeping the loop running underneath it — the audits, the baskets, the refresh queue — is the part Orova automates, so the one-page version of your job stays the interesting part: deciding what your site should be the answer to.