
We Opened Our Site to Every AI Bot for 90 Days

Orova

At the start of this year, our robots.txt looked like most robots.txt files written since 2023: a patchwork. GPTBot was blocked — someone had added it during the first wave of AI anxiety. PerplexityBot was allowed, apparently by accident, because nobody had heard of it when the block list was pasted. Google-Extended was disallowed on the theory that this protected something, although nobody could say what. The file had seven AI-related rules, three authors, and no documented reasoning. It was, in other words, a mood — and we decided to replace it with an experiment.

On the first of March we opened the site completely. Every AI crawler, every search-index bot, every user-triggered fetcher: allowed. No exceptions, no rate limits beyond our standard CDN sanity rules, for ninety days. We logged everything — every bot hit with verified identity, every byte served, every referral from an AI surface, every signup that traced back to one. The question was simple and, we found, surprisingly unanswered anywhere we looked: what actually happens when a mid-sized B2B content site stops fighting the bots? Everyone argues about AI crawlers from principle. We wanted to argue from a server log.

This is the full readout: what crawled us and how hard, what it cost in infrastructure terms, what came back in citations and referrals, what converted, and, just as important, what we still cannot attribute and refuse to pretend we can. One site, ninety days, no control group. Treat it as a data point with caveats attached, not a law of nature. But it is a real data point, and the debate has far more opinions than data points.

Opening our site to all AI bots for 90 days cost almost nothing and returned a small, fast-growing stream of high-intent visitors. AI crawlers grew to roughly a quarter of bot requests at negligible bandwidth cost, and AI referrals reached about 4% of organic sessions by day 90 while converting to signups at more than twice the rate of Google organic traffic.

Setup: what we changed and what we measured

The site in question is this one — a B2B SaaS marketing site with a blog in the low hundreds of articles, documentation, and the usual product pages. Three changes went live on day one. The robots.txt was rewritten to allow every AI-related user-agent we could enumerate: GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot and Anthropic's fetchers, PerplexityBot and Perplexity-User, CCBot, Meta-ExternalAgent, Amazonbot, Bytespider, and the long tail. Our CDN's AI-bot blocking features were switched off, with verification logging kept on so we could distinguish authentic bots from impersonators. And we tagged the analytics to separate referrals from AI surfaces — chatgpt.com, perplexity.ai, gemini.google.com, copilot.microsoft.com and variants — into their own channel.
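If you want to reproduce the separate AI channel, a minimal sketch of the referrer check might look like this. It assumes you can see the raw referrer URL for each session; the hostname set simply mirrors the surfaces listed above, and `referral_channel` and `AI_SURFACE_HOSTS` are illustrative names, not part of any analytics product.

```python
from urllib.parse import urlparse

# Illustrative set of AI-surface hostnames, taken from the surfaces named in
# the setup. Extend it as new referrer variants show up in your analytics.
AI_SURFACE_HOSTS = {
    "chatgpt.com",
    "perplexity.ai",
    "gemini.google.com",
    "copilot.microsoft.com",
}


def referral_channel(referrer: str) -> str:
    """Return "ai_surface" for referrers from a known AI surface, else "other"."""
    host = (urlparse(referrer).hostname or "").lower()
    for base in AI_SURFACE_HOSTS:
        # Accept the bare domain and any subdomain, but not look-alikes
        # such as "notperplexity.ai".
        if host == base or host.endswith("." + base):
            return "ai_surface"
    return "other"
```

Matching on the hostname suffix rather than the full URL is a deliberate choice: it catches variants with a `www.` or other subdomain prefix without matching look-alike domains.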

Measurement ran in four layers. Server and CDN logs gave us crawl volume, verified bot identity, URL coverage, and bandwidth per bot. Analytics gave us referral sessions, engagement, and conversion events by source. A weekly citation audit — the same twenty questions asked of ChatGPT Search, Perplexity, and Google's AI surfaces every Monday, with answers and citations recorded — gave us a visibility series. And our signup attribution, which asks new accounts where they first heard of us alongside the click-path data, gave us the bottom of the funnel. We also froze confounders as best we could: no site migrations, no major content-format changes, and our normal publishing cadence held steady through the window.
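The citation audit is easiest to keep honest if every Monday's results go into one flat table. A minimal sketch, assuming one row per question per engine per week; the engine labels and the `AuditRow` type are hypothetical, and deciding whether an answer cites the site is still a manual or separately scripted step.

```python
from dataclasses import dataclass

# Hypothetical labels for the three engines in the weekly audit.
ENGINES = ("chatgpt_search", "perplexity", "google_ai")


@dataclass(frozen=True)
class AuditRow:
    week: int
    question_id: int
    engine: str
    cites_us: bool  # did this engine's answer cite our domain at least once?


def questions_citing_us(rows: list[AuditRow], week: int) -> int:
    """Count benchmark questions cited by at least one engine in a given week."""
    return len({r.question_id for r in rows if r.week == week and r.cites_us})


def per_engine_counts(rows: list[AuditRow], week: int) -> dict[str, int]:
    """Count, per engine, the questions whose answer cited us in a given week.

    Assumes at most one row per (week, question, engine).
    """
    counts = dict.fromkeys(ENGINES, 0)
    for r in rows:
        if r.week == week and r.cites_us:
            counts[r.engine] = counts.get(r.engine, 0) + 1
    return counts
```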

Three caveats before any numbers, stated plainly because research articles that skip them are advertising. First, there is no control group: we cannot run the same site in the same quarter with bots blocked in a parallel universe. Where we claim causation we will say why, and where we cannot we will say so. Second, this is one site in one niche; B2B SaaS content economics are not publisher economics, as we argued in our piece on blocking as a business decision. Third, ninety days is long enough to see crawler behavior and referral trends but too short to see training effects: anything our content taught a foundation model will surface in model versions shipped long after this experiment closed. That question stays open by design.

What the crawlers did: volume, patterns, and one surprise

The headline infrastructure number first, because it defuses the scariest objection: over ninety days, verified AI bots made about 412,000 requests — rising from roughly 9% of total bot traffic in week one to 26% by week twelve — and served about 38 gigabytes, which on our CDN contract priced out to under fifteen dollars for the entire quarter. Median additional origin load was negligible; our pages are cached, text-heavy, and cheap. For a media-rich or dynamically rendered site the bill would read differently, but for a standard content site, the "AI bots will crush your servers" argument did not survive contact with our invoice.
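The per-request and per-gigabyte figures behind that invoice are simple arithmetic. This sketch plugs in the numbers reported above, treating fifteen dollars as an upper bound on cost and assuming decimal gigabytes; a CDN may bill in GiB or on a tiered schedule, so treat the output as a sanity check rather than a pricing model.

```python
def crawl_cost_summary(requests: int, gigabytes: float, total_cost_usd: float) -> dict:
    """Back-of-envelope figures for a quarter of AI-bot traffic."""
    bytes_served = gigabytes * 1e9  # decimal GB; some CDNs bill in GiB instead
    return {
        "avg_kb_per_request": bytes_served / requests / 1e3,
        "cost_per_gb_usd": total_cost_usd / gigabytes,
        "cost_per_1k_requests_usd": total_cost_usd / requests * 1000,
    }


summary = crawl_cost_summary(requests=412_000, gigabytes=38, total_cost_usd=15.0)
```

With those inputs it works out to roughly 92 KB per request and, since fifteen dollars is a ceiling, just under 40 cents per gigabyte at most.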

The composition of that traffic told a clearer story than the volume. The training crawlers were the bulk: GPTBot alone accounted for roughly 31% of verified AI requests, ClaudeBot about 24%, and CCBot, arriving in slower periodic waves, about 9%. Their pattern was exactly what the category predicts: broad, shallow, archive-deep sweeps. GPTBot reached 94% of our indexable URLs within the first five weeks, including posts from years ago that no human had visited in months. After the initial sweep, GPTBot and ClaudeBot settled into maintenance crawling concentrated on new and updated pages.
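The three categories used in this section map onto user-agent tokens, so shares like the ones above can be computed straight from logs. A rough sketch, assuming one User-Agent string per request; the token-to-category table is a simplification covering only the bots named here, and because user-agent strings are trivially spoofed, it should only run on traffic that has already passed identity verification.

```python
from collections import Counter

# Simplified, illustrative mapping from lowercase user-agent token to category.
CATEGORY_BY_TOKEN = {
    "gptbot": "training",
    "claudebot": "training",
    "ccbot": "training",
    "oai-searchbot": "search_index",
    "perplexitybot": "search_index",
    "chatgpt-user": "user_triggered",
    "perplexity-user": "user_triggered",
}


def categorize(user_agent: str) -> str:
    """Return the crawler category for a User-Agent string, or "other"."""
    ua = user_agent.lower()
    for token, category in CATEGORY_BY_TOKEN.items():
        if token in ua:
            return category
    return "other"


def category_shares(user_agents: list[str]) -> dict[str, float]:
    """Fraction of requests falling into each category."""
    counts = Counter(categorize(ua) for ua in user_agents)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()} if total else {}
```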

The search-index crawlers were smaller and far more pointed. PerplexityBot and OAI-SearchBot together were about 18% of AI requests, but their distribution was the inverse of the trainers': heavily concentrated on fresh URLs — new posts were typically fetched within hours of publication, occasionally within minutes — and on a stable set of about thirty older pages that, we later confirmed, were the ones appearing as citations in answers. Watching which URLs the index bots kept refreshing turned out to be a leading indicator of which pages were earning answer-engine attention, a trick we have since made part of our standard monitoring.
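Turning that leading indicator into a routine report only needs a count of fetches per URL from the index bots. A minimal sketch, assuming verified log entries as `(user_agent, url)` pairs; the `most_refetched` name and the default of thirty URLs are illustrative choices, the latter simply echoing the roughly thirty pages mentioned above.

```python
from collections import Counter

# Lowercase tokens for the search-index crawlers named above.
INDEX_BOT_TOKENS = ("perplexitybot", "oai-searchbot")


def most_refetched(log_entries, top_n: int = 30):
    """Rank URLs by how often search-index bots fetched them.

    log_entries: iterable of (user_agent, url) pairs from verified bot traffic.
    Returns a list of (url, fetch_count), most-fetched first.
    """
    counts = Counter(
        url
        for ua, url in log_entries
        if any(token in ua.lower() for token in INDEX_BOT_TOKENS)
    )
    return counts.most_common(top_n)
```

Comparing this list week over week against the citation audit is one way to check whether the index bots' attention really does run ahead of citations on your own site.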

The surprise was in the third category. User-triggered fetchers — ChatGPT-User and Perplexity-User, the agents that fetch a page live because a human asked about it — started at two or three hits a day and grew to a steady 30 to 40 by the final fortnight, an order-of-magnitude climb that tracked our citation growth almost perfectly. Each of those hits is a person, mid-conversation with an assistant, consuming our content. They landed disproportionately on comparison posts, pricing, and documentation — bottom-of-funnel pages — and they are invisible in ordinary analytics because no JavaScript fires. An entire audience was reading us through machines, and before this experiment we had been turning a fraction of them away.
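Because these fetches never execute JavaScript, the only place to count them is the server or CDN log. A minimal sketch, assuming log entries as `(timestamp, user_agent)` pairs with ISO 8601 timestamps (before Python 3.11, `datetime.fromisoformat` rejects a trailing `Z`); the two tokens are the fetchers named above, and the function name is hypothetical.

```python
from collections import Counter
from datetime import datetime

# Lowercase tokens for the user-triggered fetchers named above.
USER_TRIGGERED_TOKENS = ("chatgpt-user", "perplexity-user")


def daily_user_triggered_hits(log_entries) -> dict[str, int]:
    """Count live, user-triggered fetches per calendar day from server-side logs.

    log_entries: iterable of (iso_timestamp, user_agent) pairs.
    Returns {"YYYY-MM-DD": hits}, sorted by date.
    """
    per_day = Counter()
    for ts, ua in log_entries:
        if any(token in ua.lower() for token in USER_TRIGGERED_TOKENS):
            per_day[datetime.fromisoformat(ts).date().isoformat()] += 1
    return dict(sorted(per_day.items()))
```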

Bar and line chart of 90 days of verified AI bot traffic: GPTBot 31%, ClaudeBot 24%, PerplexityBot and OAI-SearchBot 18%, CCBot 9%, others 18% of requests, with user-triggered fetcher hits growing tenfold from week 1 to week 13

What came back: citations first, then clicks

The weekly citation audit produced the cleanest trend line of the quarter. In week one, six of our twenty benchmark questions drew at least one citation of our site across the three engines. By week six it was eleven; by week thirteen, fifteen. Perplexity was consistently the most generous citer, ChatGPT Search the most volatile week to week, and Google's AI surfaces the slowest to move. That last point matches the architecture: Google was already crawling us via Googlebot, so opening the other bots changed nothing on that front, and Google's AI citation count barely moved. The flat Google line is, incidentally, decent evidence that the gains elsewhere were not just our content getting generically better: the engines whose crawler access we changed moved, and the engine whose crawling we had left untouched did not.

Causation deserves honesty here. We kept publishing during the window, and our content was already structured along the answer-first lines our GEO playbook prescribes — short answerable passages, claims with stated evidence, clean heading hierarchies. Crawl access does not earn citations; it is the precondition for them. The fairest reading of our data is that the content had citation potential that blocked access was wasting, and opening the gates let the existing quality convert. A site with unretrievable, unstructured content should expect the access change alone to do far less — the mechanics of actually earning the citation are covered in our guide to getting cited by ChatGPT, Gemini, and Perplexity.

Referral traffic followed the citations with a lag of two to three weeks, exactly as you would expect if citations drive clicks. AI-surface referrals were 0.8% of organic sessions in week one — mostly the pre-existing Perplexity trickle — and reached 4.1% by the final week, with the curve still bending upward when the measurement window closed. In absolute terms this is modest: hundreds of sessions a week, not thousands. Anyone selling AI referrals as a traffic replacement for classic search is, on our numbers, early by years. The interesting part was never the volume.
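The two-to-three-week lag above was read off the charts. To put a number on it, one common approach is to shift the referral series against the citation series and pick the shift with the highest correlation. A sketch under that assumption, with equal-length weekly series; note that two series that both trend upward correlate strongly at almost any shift, so in practice it is worth differencing them (week-over-week changes) before comparing.

```python
def best_lag(leading: list[float], lagging: list[float], max_lag: int) -> int:
    """Return the shift (in periods) at which `lagging` best tracks `leading`.

    Compares leading[t] with lagging[t + lag] for lag = 0..max_lag using
    Pearson correlation. Both series must have the same length and alignment.
    """

    def pearson(xs: list[float], ys: list[float]) -> float:
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        vx = sum((x - mx) ** 2 for x in xs)
        vy = sum((y - my) ** 2 for y in ys)
        return cov / (vx * vy) ** 0.5 if vx and vy else 0.0

    scores = {}
    for lag in range(max_lag + 1):
        xs = leading[: len(leading) - lag]
        ys = lagging[lag:]
        if len(xs) >= 3:  # too few overlapping points makes correlation meaningless
            scores[lag] = pearson(xs, ys)
    return max(scores, key=scores.get)
```

On a synthetic pair of series in which the second is simply the first delayed by two weeks, the function returns 2; real referral data will be noisier and the peak less sharp.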

The conversion punchline — and what it actually means

The interesting part was what the visitors did. Over the full quarter, sessions from AI surfaces converted to product signups at 2.3 times the rate of our Google organic sessions, and the gap held — fluctuating but never closing — across all thirteen weeks. Perplexity referrals were the strongest cohort, consistent with what we found when we studied who Perplexity actually links to; ChatGPT referrals converted nearly as well; the small Gemini cohort was too thin for confident numbers.
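Ratios from small cohorts deserve an error bar before they deserve a headline. One standard way to see how thin a cohort is, not necessarily the method behind the numbers here, is the Wilson score interval for a conversion rate. A sketch; the 1.96 multiplier gives an approximate 95% interval, and the function names are illustrative.

```python
from math import sqrt


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate confidence interval for a conversion rate (Wilson score)."""
    if trials == 0:
        return (0.0, 1.0)  # no data: the rate could be anything
    p = successes / trials
    denom = 1 + z**2 / trials
    centre = (p + z**2 / (2 * trials)) / denom
    half = z * sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return (centre - half, centre + half)


def relative_conversion(ai_signups: int, ai_sessions: int,
                        organic_signups: int, organic_sessions: int) -> float:
    """AI-referral conversion rate divided by organic conversion rate."""
    return (ai_signups / ai_sessions) / (organic_signups / organic_sessions)
```

With hypothetical numbers, 23 signups from 1,000 AI-referred sessions against 100 from 10,000 organic sessions gives a relative conversion of 2.3. At the same underlying rate, the interval for a cohort of forty sessions comes out roughly three times as wide as for four hundred, which is the kind of spread that justifies setting a cohort aside.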

The mechanism, as far as we can reconstruct it from session behavior, is pre-qualification. A visitor from an AI answer has already had the category explained, the options compared, and our name surfaced as a fit for their stated need — the assistant did the top of the funnel before the click. They arrive deeper: AI-referred sessions landed on pricing, comparisons, and docs at nearly twice the rate of organic sessions, skipped the bounce-prone awareness content, and reached the signup flow in fewer pages. Classic organic traffic includes everyone from idle researchers to lost students; AI referral traffic is filtered by a conversation. Smaller pipe, much cleaner water.

Tally the quarter in business terms and the asymmetry that decided our policy becomes plain. The experiment's total measurable cost was under fifteen dollars of bandwidth and a few hours of setup; its measurable return was a new acquisition channel producing roughly 9% of our organic-attributed signups by the final month, from about 4% of the sessions. Against that, the old patchwork robots.txt had been protecting nothing we could put a value on.
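The two percentages are consistent with the 2.3× conversion multiple reported earlier, and the arithmetic is worth making explicit. If AI sessions are a fraction s of organic sessions and convert at m times the organic rate, AI signups come to roughly s × m of organic signups. A one-line sketch; it ignores that the 4% is an end-of-quarter figure while 2.3× is a full-quarter average, so only approximate agreement should be expected.

```python
def implied_signup_share(session_share: float, conversion_multiple: float) -> float:
    """AI signups as a fraction of organic signups.

    (AI sessions * AI rate) / (organic sessions * organic rate)
      = (AI sessions / organic sessions) * (AI rate / organic rate)
    """
    return session_share * conversion_multiple
```

`implied_signup_share(0.04, 2.3)` gives about 0.092, in line with the roughly 9% reported above.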

Before and after funnel comparison over 90 days: citations on twenty benchmark questions up from 6 to 15, AI referrals up from 0.8% to 4.1% of organic sessions, AI visitors converting at 2.3 times the rate of Google organic traffic

A short diary of the quarter

Numbers aggregate away the texture, and some of the texture is instructive, so here is the quarter as it actually felt from inside the log files.

Weeks one and two were the land rush. Within 48 hours of the robots.txt change, GPTBot's request rate quadrupled — the crawler clearly re-checks permissions frequently and wastes no time acting on them. ClaudeBot followed within the week. The CDN dashboard looked alarming enough that our engineer asked, unprompted, whether we were under a slow scraping attack; this was the fortnight in which Bytespider made its case for future rate-limiting and we learned to read the verification columns before reacting to the volume ones.
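Reading the verification columns first comes down to one question per request: does the source IP fall inside a range the bot's vendor publishes? A minimal sketch with Python's standard `ipaddress` module; the ranges here are RFC 5737 documentation placeholders, not any vendor's real addresses, and in practice you would load each vendor's published list (or use reverse-DNS checks where a vendor documents those instead).

```python
import ipaddress

# PLACEHOLDER ranges (RFC 5737 documentation blocks), not real vendor ranges.
# Replace with the CIDR lists each vendor actually publishes for its crawler.
PUBLISHED_RANGES = {
    "GPTBot": ["192.0.2.0/24"],
    "PerplexityBot": ["198.51.100.0/25"],
}


def is_verified(claimed_bot: str, client_ip: str) -> bool:
    """True only if client_ip lies in a range published for the claimed bot."""
    ip = ipaddress.ip_address(client_ip)
    return any(
        ip in ipaddress.ip_network(cidr)
        for cidr in PUBLISHED_RANGES.get(claimed_bot, [])
    )
```

A request that claims to be a known crawler but fails this check is an impersonator, and belongs in a separate bucket before any volume is attributed to the real bot.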

Weeks three to five were the quiet middle. The training sweeps finished their archaeology — posts from our first year of publishing, fetched for the first time in living memory — and settled into maintenance rhythm. Citation counts barely moved yet, and this is the stretch where an impatient team would have declared the experiment a dud. The index bots were the early tell that something was happening: PerplexityBot's fetch-to-publish lag on new posts dropped from hours to minutes over these weeks, as though the site had been moved to a faster watchlist.

Weeks six to nine were when the lagging indicators caught up. The citation audit crossed half our benchmark questions, the first meaningful ChatGPT Search referral spikes appeared — bursty, tied to individual answers, completely unlike the smooth daily rhythm of Google organic — and the user-triggered fetchers began their climb. One Tuesday in week eight, a single widely shared Perplexity answer drove more pricing-page sessions than our best-performing newsletter of the quarter, which reframed a few internal opinions about where distribution now lives.

Weeks ten to thirteen were consolidation. Trends flattened into trajectories, the conversion gap held steady through enough volume to stop feeling like noise, and the weekly audit became routine enough to hand to a checklist. By the end, the logs read less like a fog and more like a guest list, with regulars.

The objections we fielded along the way

Running the experiment in the open meant fielding pushback in real time, and three objections recurred often enough to answer here. "You're feeding competitors' models for free." Perhaps — but our content was already public, already summarized in secondhand sources the models read anyway, and the choice was never between models knowing and not knowing about us; it was between models learning about us from our pages or from everyone else's. "The referral numbers are too small to matter." At 4% of organic sessions, yes — if volume were the metric. At twice the conversion rate and still compounding when the window closed, the channel's trajectory matters more than its current size; every channel we now depend on looked like this once. "Ninety days proves nothing about training." Correct, and we said so first — but the decision the experiment needed to inform was the access policy, and access economics turned out to be measurable on exactly this timescale.

What we could not measure, and what we got wrong

A results section is only trustworthy next to its failures, so here are ours. We could not measure training effects at all — whether ninety days of open GPTBot access changes what some 2027 model says about our category is unknowable on this timescale, and we flag it as the genuine open question of the open-access position. We could not attribute the dark funnel: signups whose "where did you hear about us" said some variant of "an AI recommended you" but whose click-path showed direct or branded search ran at roughly half the volume of the tracked AI referrals — suggesting our measured numbers understate the channel meaningfully, but suggestion is not measurement. And our citation audit covered twenty questions in one language in one niche; move the benchmark and the curve will move too.

We also made two errors worth confessing. We left Bytespider unrestricted on the principle that "every bot means every bot," and it repaid us by being the single heaviest crawler in week two, with no identifiable product surface that ever cited or referred anything; under any non-experimental policy we would rate-limit it without sentiment. And we did not pre-register our benchmark questions, choosing them in week one after seeing early answers. That introduced a small bias, which we corrected by adding a frozen second question set from week four; it showed the same trend shape.

How to replicate this on your own site

If you want your own version of these numbers, the protocol is deliberately cheap to copy. Snapshot your current robots.txt and CDN bot rules so the experiment is reversible in five minutes. Open access for a fixed window — we suggest a full quarter, since citations lagged access by six weeks for us and a one-month test would have caught almost nothing. Before day one, freeze two things: a benchmark set of fifteen to twenty real buyer questions for the weekly engine audit, written down before you see any answers; and a referral segment in your analytics for the AI surfaces, so the baseline is clean. Log verified bot identity from the start — the impersonator traffic will otherwise pollute every number — and keep your publishing cadence normal, because pausing content to "keep the test clean" actually breaks it: retrieval engines feed on freshness. Then read the results in the order the funnel fills: crawl coverage first, index-bot attention second, citations third, referrals and conversions last. If your quarter ends with flat citations and empty referrals, you have learned something equally valuable — most likely that the bottleneck is your content's answerability rather than its accessibility, which points your next quarter at structure rather than access.
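Writing the benchmark down before seeing any answers is easier to prove if you also record a fingerprint of it on day one. A minimal sketch, offered as a suggestion rather than part of the protocol described above; the normalization (trimmed, sorted questions) is an assumption, chosen so that reordering questions does not change the hash while editing any wording does.

```python
import hashlib
import json


def benchmark_fingerprint(questions: list[str]) -> str:
    """SHA-256 fingerprint of a benchmark question set.

    Record the result on day one (for example in a dated commit or shared doc);
    recomputing it later shows whether the set was edited after answers came in.
    """
    canonical = json.dumps(sorted(q.strip() for q in questions), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```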

What we changed permanently

The experiment ended in June; the policy it produced is now permanent and fits in four lines. Search-index crawlers and user-triggered fetchers: allowed, unconditionally — they are where the measured value lives. Training crawlers from vendors with documented bots and published IP ranges: allowed — the cost is near zero and the possible upside (models that know our domain) is real even if unprovable on our timescale. Bots with no documented purpose, no verification path, or a history of ignoring robots.txt: rate-limited at the CDN, because hygiene is not philosophy. Every rule now carries a dated comment naming its reason — the practice whose absence started this whole story. Our full taxonomy of who these bots are and what each one wants is in the companion field guide to GPTBot, ClaudeBot, and PerplexityBot.
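The four-line policy is compact enough to encode directly, which helps keep CDN rules and robots.txt from drifting apart. A sketch under stated assumptions: the `BotProfile` fields are one reading of the criteria above, bot identity is assumed to have been verified before this function is called, and the output is only an action label for whatever CDN or robots.txt tooling you use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BotProfile:
    category: str             # "search_index", "user_triggered", "training" or "unknown"
    documented: bool          # vendor documents the bot and its purpose
    verifiable: bool          # vendor publishes IP ranges or another verification path
    ignored_robots_txt: bool  # known history of ignoring robots.txt


def access_decision(bot: BotProfile) -> str:
    """Map a (verified) bot to "allow" or "rate_limit" per the policy above."""
    # Search-index crawlers and user-triggered fetchers: allowed unconditionally.
    if bot.category in ("search_index", "user_triggered"):
        return "allow"
    # Training crawlers from documented, verifiable vendors: allowed.
    if (bot.category == "training" and bot.documented
            and bot.verifiable and not bot.ignored_robots_txt):
        return "allow"
    # Everything else (undocumented, unverifiable, or robots.txt-ignoring): rate-limited.
    return "rate_limit"
```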

The broader conclusion we will defend beyond our own sample: for a content-as-marketing business in 2026, the burden of proof has flipped. Open access to AI search surfaces is the rational default, and it is blocking that now needs a named, costed justification. Our numbers are one site's numbers — but the experiment is cheap, the downside is bounded, and every team arguing about AI bots from principle could be arguing from their own logs within a quarter.

The honest postscript is that the experiment never really ends: the bot roster changes monthly, citation patterns drift, and the referral curves are still moving. We keep the weekly citation audit and the bot-verification logging running permanently now — that continuous watch is precisely the sort of work Orova automates for its users, tracking AI-era visibility alongside classic rankings so that decisions like this one stay anchored to current data instead of to a quarter-old snapshot. Run the ninety days on your own site; we suspect your robots.txt is a mood too.
