We Rewrote 20 Posts for AI Overviews — 11 Got Cited

Late last year I did something that felt vaguely heretical: I stopped publishing new content for ten weeks. Our blog had a respectable archive, decent rankings, and a traffic chart that looked like a slow leak — impressions climbing, clicks sagging, the unmistakable silhouette of AI Overviews settling over our keyword space. Every instinct said to write more. Instead, we picked twenty existing posts and rewrote them specifically to be cited by AI Overviews, then sat on our hands and measured what happened.

Eleven of the twenty ended up cited. Nine did not. The split turned out to be far more instructive than the headline number, because the posts that failed didn't fail randomly — they failed in patterns we could name, and the posts that succeeded shared traits we hadn't fully appreciated when we started. This is the complete account: how we picked the twenty, exactly what we changed, how we measured citations without any official report to lean on, and what I would do differently. Treat it as one team's n=20, not a law of nature. But if you are staring at the same sagging click curve we were, I suspect you will recognise a lot of this.

We rewrote 20 existing blog posts using an answer-first format — question headings, 40–60 word direct answers, refreshed data, and self-contained passages — and within ten weeks 11 of them appeared as cited sources in Google AI Overviews for queries we tracked. The strongest predictors of success were existing page-one rankings and genuinely committed answers.

Why we rewrote instead of writing new

The decision came from a simple piece of arithmetic. Our Search Console data showed about forty posts with rising impressions and flat-to-declining clicks on question-type queries. That is the classic signature of content that Google considers relevant enough to surface but that AI Overviews were answering on our behalf, a pattern we had been watching since reading "Zero-click search doesn't mean zero value". Those forty posts already had the hard-won assets: age, links, topical fit, page-one or page-two rankings. What they lacked, we suspected, was extractability: clean passages an AI system could lift. Writing new posts meant starting from zero on authority to fix a problem that was fundamentally about format. Rewriting meant attacking the format problem directly on pages that had already cleared the authority bar.
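The impressions-up, clicks-flat signature can be screened for mechanically before anyone opens a post. Below is a minimal sketch, assuming query-level Search Console data for two comparable periods has been exported into plain records. The field names, the question-word list, and the 20% impression-growth threshold are all illustrative assumptions, not values from the experiment.

```python
from dataclasses import dataclass

# Assumption: a crude test for "question-type" queries; tune the list to your niche.
QUESTION_WORDS = ("how", "what", "why", "when", "which", "can", "does", "is", "should")


@dataclass
class QueryStats:
    page: str
    query: str
    impressions_before: int
    impressions_after: int
    clicks_before: int
    clicks_after: int


def is_question(query: str) -> bool:
    """True when the query starts with one of the assumed question words."""
    words = query.lower().split()
    return bool(words) and words[0] in QUESTION_WORDS


def has_leak_signature(row: QueryStats, min_impression_growth: float = 0.20) -> bool:
    """True when impressions rose by at least the threshold while clicks did not rise."""
    if row.impressions_before == 0:
        return False  # no baseline to compare against
    growth = (row.impressions_after - row.impressions_before) / row.impressions_before
    return growth >= min_impression_growth and row.clicks_after <= row.clicks_before


def candidate_pages(rows: list[QueryStats]) -> list[str]:
    """Pages with at least one question query showing the signature, in first-seen order."""
    seen: dict[str, None] = {}
    for row in rows:
        if is_question(row.query) and has_leak_signature(row):
            seen.setdefault(row.page, None)
    return list(seen)
```

A screen like this only produces suspects. Whether the click decline is actually caused by an AI Overview still needs a manual look at the results page.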

There was also a sample-size argument. With new content, you can't separate "the rewrite worked" from "the post eventually ranked." With existing posts that had stable rankings for six-plus months, any change in citation behaviour within weeks of a rewrite is at least plausibly attributable to the rewrite. Not rigorous causality — we ran no control group the first time, a mistake I'll return to — but a much cleaner signal than greenfield publishing.

How we picked the twenty

We scored the forty candidates on three things and took the top twenty. First, query overlap with active AI Overviews: we manually checked whether the post's main queries actually triggered an AI Overview. Around a quarter of our suspects didn't — their click decline had other causes — and rewriting those for AI citation would have been pointless. Second, existing rank: we favoured posts ranking in the top ten for at least one target query, on the logic (borne out later) that AI Overviews draw heavily from sources Google already trusts for the topic. Third, fixability: a quick editorial judgment on whether the post's problem was structural — buried answers, vague headings, stale numbers — rather than the content simply being thin. Three posts got cut at this stage for being unsalvageable without a full rewrite from scratch, which would have polluted the experiment.
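The three criteria translate naturally into a small scoring function. This is a sketch under stated assumptions: the source gives the criteria but not the weights, so the point values, the 0 to 3 fixability scale, and the choice to treat "no AI Overview" as a hard gate are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    url: str
    triggers_overview: bool  # manual check: do the post's main queries show an AI Overview?
    best_rank: int           # best current position across the post's target queries
    fixability: int          # editorial judgment: 0 (thin, unsalvageable) to 3 (purely structural)


def score(c: Candidate) -> int:
    """Illustrative scoring only; the weights are assumptions, not the original rubric."""
    if not c.triggers_overview or c.fixability == 0:
        return 0  # nothing to win, or nothing a format rewrite can fix
    rank_points = 3 if c.best_rank <= 10 else 1 if c.best_rank <= 20 else 0
    return rank_points + c.fixability


def shortlist(candidates: list[Candidate], size: int = 20) -> list[Candidate]:
    """Drop zero-score candidates, then keep the highest-scoring `size` posts."""
    eligible = [c for c in candidates if score(c) > 0]
    return sorted(eligible, key=score, reverse=True)[:size]
```

Gating on the overview check first mirrors the reasoning above: a post whose queries never trigger an AI Overview cannot earn a citation however well it scores on the other two criteria.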

The final twenty broke down roughly into eight how-to guides, six definitional explainers, four comparison posts, and two opinion-ish pieces we included half as controls because we doubted opinion content gets cited much. (Spoiler: we were right to doubt.)

What the rewrite actually involved

We wrote ourselves a checklist and applied it identically to all twenty posts, mostly following the format we later documented in full in our guide to answer-first content. Per post, the rewrite took between three and five hours — real but not heroic effort. The changes:

Question-phrased headings. Every H2 that answered something got rewritten to contain the actual question, phrased the way queries phrase it. "Pricing considerations" became "How much should you budget for X?" We pulled the phrasings from Search Console queries and People Also Ask rather than inventing them.
A 40–60 word direct answer under every question heading. This was the heart of the work. Each answer had to stand alone with no backward references, commit to a position, and front-load the claim. Writing these is harder than it sounds; our first drafts kept regressing to "it depends" hedges, and we rewrote some answer paragraphs four times.
A page-level answer immediately after the intro — same rules, answering the post's main question.
Data refresh. Every number, screenshot, and tool reference got re-verified or replaced. Eleven of the twenty posts contained at least one statistic that was flatly out of date. We also updated the visible "last updated" dates honestly — the content genuinely changed, substantially.
Structural conversions. Processes became numbered lists. Comparisons became tables. Definitions got the "X is a Y that does Z" first sentence. Nothing exotic — just moving information out of prose into shapes that extraction systems handle reliably.
Entity hygiene. We standardised terminology (one name per concept, used consistently), spelled out acronyms on first use per section, and replaced pronoun-heavy passages with named subjects.
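Parts of the checklist above can be linted before a human edit. The sketch below checks a direct-answer paragraph for the 40–60 word range, hedging phrases, and backward references. The phrase lists are assumptions for illustration, and a lint like this cannot judge whether an answer truly commits to a position; it only catches the obvious regressions.

```python
# Assumption: illustrative phrase lists; extend them from your own drafts.
HEDGES = ("it depends", "there is no single answer", "varies widely")
BACKWARD_REFERENCES = ("as mentioned", "as discussed above", "see above")


def check_answer(paragraph: str, min_words: int = 40, max_words: int = 60) -> list[str]:
    """Return a list of problems; an empty list means the paragraph passes these checks."""
    problems = []
    word_count = len(paragraph.split())
    if not min_words <= word_count <= max_words:
        problems.append(f"length: {word_count} words, expected {min_words}-{max_words}")
    lowered = paragraph.lower()
    for phrase in HEDGES:
        if phrase in lowered:
            problems.append(f"hedge: '{phrase}'")
    for phrase in BACKWARD_REFERENCES:
        if phrase in lowered:
            problems.append(f"backward reference: '{phrase}'")
    return problems
```

Running `check_answer` on every paragraph that directly follows a question heading gives an editor a short list of sections to revisit, which is cheaper than rereading the whole post.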

Two things we deliberately did not change: URLs (no slug edits, no redirects, nothing that would muddy the measurement) and link profiles (no new internal links beyond what the rewritten text naturally required, no link building during the window). We wanted the format variable as isolated as we could get it.

[Figure: timeline of the ten-week experiment rewriting 20 blog posts for AI Overviews, showing selection from 40 candidates, the identical rewrite checklist, the weekly citation-tracking panel, and 11 of 20 posts cited by week ten.]

How we measured citations (the annoying part)

Here is the uncomfortable truth anyone running this experiment hits immediately: Google gives you no citation report. Search Console does not break out AI Overview appearances as a dimension. If you want to know whether you're cited, you have to go look, repeatedly, in a controlled way.

Our protocol: for each of the twenty posts, we defined three to five target queries (94 in total), made up of the post's main keyword plus the question variants from its headings. Every Monday, from a clean browser profile with the same US-location setup each time, we ran every query and logged three things: whether an AI Overview appeared, whether our post was among the cited sources, and where in the citation list it sat. One person, the same person every week, spent about ninety minutes on it. We also logged which competitors got cited, which later proved to be the most educational data we collected.
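The weekly log needs very little structure. Here is a minimal sketch of one row per query per week, plus two helpers for reading the log back. The `Check` type and its field names are hypothetical; a spreadsheet with the same columns would do the same job.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Check:
    week: int
    post: str
    query: str
    overview_shown: bool
    cited: bool
    citation_position: Optional[int] = None  # 1 = first-listed source; None when not cited


def first_cited_week(log: list[Check]) -> dict[str, int]:
    """Map each post to the earliest week in which any of its panel queries cited it."""
    first: dict[str, int] = {}
    for check in log:
        if check.cited:
            first[check.post] = min(check.week, first.get(check.post, check.week))
    return first


def cumulative_cited(log: list[Check], week: int) -> int:
    """Number of posts cited at least once up to and including the given week."""
    return sum(1 for w in first_cited_week(log).values() if w <= week)
```

Counting a post as cited when any of its panel queries cites it is a deliberate choice in this sketch: it keeps citations on secondary question variants from being missed.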

Three measurement lessons worth stealing. First, AI Overviews are volatile: whether one appeared at all for a given query fluctuated week to week for about a fifth of our panel, so single-snapshot checks are nearly meaningless and you need a time series. Second, a citation does not have to come from your main keyword: several posts got cited for question variants we considered secondary, while the head query showed no overview at all. If we had only tracked head terms we would have counted at least three successes as failures. Third, personalisation leakage is real: early on, checks from a logged-in profile showed us citations that the clean profile didn't, flattering us into a false-positive week. Clean profile, every time, no exceptions. We've since written up the broader tooling options in our piece on how to actually measure AI visibility.
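Volatility is easy to quantify once you have a time series. The sketch below assumes each query maps to a list of weekly booleans recording whether an AI Overview appeared, and it defines "volatile" as present in some weeks but not all. Both the data shape and that definition are assumptions for illustration.

```python
def is_volatile(weekly_presence: list[bool]) -> bool:
    """True when an AI Overview appeared in some weekly checks but not in others."""
    return any(weekly_presence) and not all(weekly_presence)


def volatile_share(panel: dict[str, list[bool]]) -> float:
    """Fraction of panel queries whose overview presence changed across the window."""
    if not panel:
        return 0.0
    return sum(is_volatile(weeks) for weeks in panel.values()) / len(panel)
```

A query that never shows an overview is stable under this definition, not volatile, so it is worth reporting those separately: they are the queries where no citation is available to win.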

The results, week by week

Nothing happened for two weeks, which felt awful. The first citation appeared in week three — a how-to post, cited for a question variant, sitting fourth in the citation list. By week five we had five posts cited. By week eight, ten. The eleventh and final new citation appeared in week ten, and the picture then stabilised: through the following month, those eleven held citations on at least one panel query in most weekly checks, though individual queries flickered constantly.

The traffic effects were smaller and stranger than the citation effects. Across the eleven cited posts, clicks did not crater further — they roughly flattened, and four posts actually recovered modest click growth, which we attribute to winning citations on queries where we previously had no presence at all. Impressions rose noticeably across the whole rewritten set, including some of the non-cited nine, consistent with the rewrites simply ranking a bit better for more question variants. Average session quality on the cited posts improved — slightly longer engagement, more next-page navigation — supporting the theory that post-AI-Overview clickers are the ones whose question was bigger than the answer box. None of this transformed our business in ten weeks. The citations themselves were the point: presence in the answer layer where our category's buyers now start.

What separated the eleven from the nine

This is the section I actually wanted to write, because the failure patterns were so consistent.

Prior rank was close to destiny. Ten of the eleven cited posts ranked in the top ten for at least one target query before the rewrite. Of the nine failures, six ranked between eleven and twenty-five. The lesson aligns with everything in the complete guide to ranking in AI Overviews and with our earlier observational work in the 300-overview analysis: format makes trusted pages extractable, but it does not manufacture trust. Rewriting a page-three post for extractability is sanding a door that isn't hung yet.

Committed answers won; lukewarm answers lost. Among posts with comparable rankings, the differentiator was how much spine the answer paragraphs had. The cited posts gave numbers, named thresholds, took positions. Two of the failures, when we re-read them honestly, had answer paragraphs that obeyed the 40–60 word format while still essentially hedging — formatted indecision. The machine apparently shares human readers' lack of patience for it.

Both opinion pieces failed, exactly as predicted. AI Overviews on our panel queries were assembled from factual, instructional passages. Opinion content earns links, brand, and human loyalty — different jobs, as we explored in our analysis of the zero-click economy — but on this evidence it is not citation bait, and rewriting it as if it were wastes effort. We should have known better; understanding how Google assembles an AI Overview makes the mismatch obvious in hindsight.

Freshness mattered more than we expected. Three of the eleven winners got cited for queries where the prevailing answer involves figures that change year to year, and our refreshed numbers were simply more current than incumbents'. One failure, conversely, covered a topic where a competitor had published a major original study during our window; they took citations on essentially every related query. Against original primary data, our nicely formatted secondary content didn't compete.

[Figure: comparison of the 11 posts cited by AI Overviews with the 9 not cited. Winners had top-ten rankings, committed answers with numbers, factual instructional content, and fresh data; losers ranked lower, hedged, or were opinion pieces.]

The numbers by content type

Breaking the eleven-of-twenty result down by format makes the pattern even starker. The how-to guides went six for eight — by far the best hit rate, and the two misses were both posts ranking outside the top ten before the rewrite. Process questions seem to be where AI Overviews lean hardest on external sources, presumably because step-by-step instructions are risky to synthesise loosely and safe to lift from a source that has them in clean numbered form. Five of our six cited how-tos had their numbered list extracted nearly verbatim into the overview at least once during the window.

The definitional explainers went four for six. The two failures here were instructive in a different way: both covered terms where an industry body or documentation site owns the canonical definition. For "what is X" queries, the overview consistently cited the source closest to being the term's official home, and no formatting on our side changed that. We were competing for a citation that, realistically, was never available.

The comparison posts went one for four, which surprised me, because I expected tables to be citation magnets. Watching the overviews for those queries suggested why: comparison queries often produced overviews synthesised from many sources, each contributing one attribute, and the citation slots went to sources with original benchmarks or first-party specifications rather than to third-party comparison roundups like ours.

And the opinion pieces, as covered, went zero for two.
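For anyone who wants the breakdown as rates, the arithmetic is below. The counts are the ones reported in this section; nothing else is assumed beyond the variable names.

```python
# (cited, rewritten) per content type, taken from the figures in this post.
results = {
    "how-to": (6, 8),
    "definitional": (4, 6),
    "comparison": (1, 4),
    "opinion": (0, 2),
}

hit_rates = {kind: cited / total for kind, (cited, total) in results.items()}
total_cited = sum(cited for cited, _ in results.values())
total_posts = sum(total for _, total in results.values())

for kind, rate in hit_rates.items():
    print(f"{kind:<13}{rate:.0%}")
print(f"{'overall':<13}{total_cited / total_posts:.0%}")
```

That works out to 75% for how-tos, 67% for definitional explainers, 25% for comparisons, 0% for opinion pieces, and 55% overall. With groups this small, a single post moves a rate by 12 to 50 percentage points, so read the ordering rather than the decimals.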

One more cut of the data: citation position. Of our eleven winners, only three ever appeared as the first-listed source on any panel query. Most lived in the middle of the citation list. We could not detect a meaningful traffic difference between citation positions at our sample size — the differences between being cited anywhere and not being cited at all dwarfed positional effects. I would not spend effort optimising for citation position until the bigger battle is won.

Questions people asked when we shared this internally

Did the rewrites hurt classic rankings? No — and this was the leadership team's first worry. Across the twenty posts, position tracking through the window showed the usual noise but no systematic decline; if anything, the median post gained slightly, consistent with the freshness pass and tighter structure helping conventional ranking too. The rewrite checklist contains nothing that trades classic SEO for citations. It is the same craft, aimed at the passage level.

How did you stop the answer paragraphs from making the posts feel robotic? Partly by keeping the introductions human, since the answer block format governs sections, not the opening. And partly through an editing rule we added in week one: after writing each direct answer, the writer had to read the section aloud and check it still sounded like a person who has done the thing, not a specification sheet. Three posts needed a second pass purely for voice. Readers, for what it's worth, never complained; scores from the on-page feedback widgets actually ticked up on the rewritten set.

Would this work outside English? We only tested English queries from one market, so honestly: unknown. AI Overview coverage varies significantly by market and language, and citation patterns in smaller-language markets appear less crowded — which may mean earlier movers get cited with less authority than our experiment required. Our Vietnamese-language team is running the equivalent experiment now, and the early checks suggest the answer-first format transfers, while the competitive thresholds differ.

What I would do differently

Run a control group. Our biggest methodological hole: with no untouched comparison set, we can't fully rule out that some citations would have arrived anyway as Google expanded overview coverage in our niche. We've since started a second round with twenty rewritten and twenty matched untouched posts. Preliminary signal favours the rewrites clearly, but I publish the first experiment with that caveat attached and you should read it with the caveat in mind.

Skip the low-ranking posts. Knowing the prior-rank pattern, I'd spend those six rewrite slots either on more top-ten pages or on the slower work of improving the underlying authority of the page-two posts first — links, cluster reinforcement, real upgrades — and rewrite them for extraction afterwards. Sequence matters: authority first, format second.

Define the query panel before touching anything. We finalised two posts' query lists after rewriting had started, which is sloppy — the temptation to pick queries you think you'll win is strong, and even honest people contaminate samples. Panel first, frozen, then edits.

Budget for maintenance, not just the rewrite. Citations flicker. Two of our eleven lost their citations for a stretch when competitors updated their pages, and one we recovered only after a further freshness pass. This is not a trophy you win once; it is a position you hold. Whoever owns this in your team needs recurring hours, not a one-off project.

Track competitor citations from day one. The log of who got cited instead of us became our de facto roadmap — it told us which formats won each query type and where original data would be required to compete. If I could keep only one spreadsheet from the whole experiment, it would be that one.

Check ChatGPT Search and Perplexity in the same panel. We tracked Google exclusively, because that is where our decline was. But the rewritten posts were, by construction, optimised for any retrieval-based system, and spot checks near the end of the window found four of them being cited by Perplexity for related questions — visibility we had earned without measuring it. The marginal cost of adding two more engines to a weekly panel you are already running is twenty minutes. Take the free data; the answer engines beyond Google are smaller but growing, and their citation behaviour differs enough to teach you separate lessons.

Should you run this experiment?

If your Search Console shows the impressions-up-clicks-flat signature on question queries, and you have a bench of posts already ranking top ten for those queries — yes, and the maths is friendly. Call it four hours a post: twenty posts is two working weeks of one editor's time. We got eleven citation footholds, a measurement system, and a pile of competitive intelligence for that spend. Against the cost of producing twenty new 3,000-word articles, it is not close.
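The time budget is easy to recompute for your own archive. This small sketch uses the figures from this post; the 40-hour working week is an assumption, and hours per cited post is a derived number rather than one reported above.

```python
HOURS_PER_POST = 4               # the "call it four hours a post" estimate
POSTS = 20
TRACKING_HOURS_PER_WEEK = 1.5    # about ninety minutes every Monday
WEEKS = 10
CITED_POSTS = 11
HOURS_PER_WORKING_WEEK = 40      # assumption: a standard full-time week

rewrite_hours = HOURS_PER_POST * POSTS
tracking_hours = TRACKING_HOURS_PER_WEEK * WEEKS
rewrite_weeks = rewrite_hours / HOURS_PER_WORKING_WEEK
hours_per_cited_post = (rewrite_hours + tracking_hours) / CITED_POSTS

print(f"rewrite: {rewrite_hours} h ({rewrite_weeks:g} working weeks)")
print(f"tracking: {tracking_hours:g} h")
print(f"per cited post: {hours_per_cited_post:.1f} h")
```

Including the weekly tracking time matters because the panel is a recurring cost: the rewrite hours are spent once, while the ninety minutes a week continue for as long as you want to hold the citations.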

If your problem is that you don't rank yet, this is not your experiment. Format work compounds authority; it doesn't substitute for it. And if your archive is small, weigh the rewrite against the opportunity cost of new coverage — extraction formatting is cheap to apply at write time, so newer programs should simply build answer-first from the start rather than retrofitting later.

The whole exercise left me oddly optimistic. The pattern in our results was not "Google rewards tricks." It was "Google's answer layer rewards pages that answer quickly, commit to positions, and keep their facts current" — which is just a description of good content with the vagueness removed. The tedious parts — spotting which posts are bleeding clicks to AI answers, flagging stale claims, tracking the weekly citation panel — are exactly the parts we now let Orova's agent handle, which has turned a ten-week science project into a standing routine. The eleven citations were nice. The repeatable system was the actual prize.