How to Measure AI Search Readiness for Your Website (Step-by-Step)

10 min read

TL;DR

Measuring AI search readiness requires two layers: technical hygiene and content relevance. Technical hygiene (crawl access, schema, rendering) is necessary, but in our study of 441 domains it correlated with citation rate at r=0.009, which is essentially zero. Content relevance, measured through Query Coverage, Content Depth, and Sub-Intent Coverage, predicts citations with AUC 0.915. The current score has five components: Query Coverage (QC), Content Depth (CD), Sub-Intent Coverage (SI), Citation Reality (CR, paid only), and Technical Health (TH, 15-20% weight). The free automated scan runs the four free components in a few minutes. The paid Starter consultation (€149, one-time) adds Citation Reality via Perplexity and a human expert review.

To measure AI search readiness properly, you have to measure two very different things: whether AI crawlers can physically reach your content, and whether your content actually answers the questions users ask. The first is a hygiene problem. The second is the thing that decides whether you get cited. Our research found that the first one alone correlates with citation rate at r=0.009, essentially zero. The second one, measured through query-content relevance, predicts citations with AUC 0.915.

Content relevance measurement breaks into three signals: Query Coverage (does any page on the site answer each query with real substance), Content Depth (how deeply the best page answers each query), and Sub-Intent Coverage (whether the site addresses the hidden information needs inside each query). The technical layer - crawl access, schema, rendering - sits underneath as Technical Health. On paid scans, a fifth signal called Citation Reality asks Perplexity directly whether your site is already cited for its target queries.

This article walks through how to measure each of those layers yourself, and where an automated scan helps.

Key research finding

Our study across 441 domains and 14,550 domain-query pairs found effectively zero correlation between a pure technical readiness score and actual LLM citation rates (r=0.009). When we rebuilt the classifier around content-query relevance (BM25 plus embedding cosine), it reached AUC 0.915. Content relevance is the signal. Technical measurement still matters because broken plumbing blocks everything else, but it is hygiene, not strategy.
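The relevance signal combined BM25 with embedding cosine similarity; how the two were weighted is not described here. As a rough illustration of the BM25 half only, here is a minimal Okapi BM25 scorer. The parameter values (k1=1.5, b=0.75), the Lucene-style non-negative IDF, and the lowercase tokenizer without stemming are common defaults picked for this sketch, not the settings used in the study.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase and split on anything that is not a letter or digit (no stemming)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query: str, docs: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of every doc for the query, using a non-negative (Lucene-style) IDF."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    if n == 0:
        return []
    avgdl = sum(len(t) for t in tokenized) / n or 1.0  # guard against all-empty docs
    df = Counter(term for toks in tokenized for term in set(toks))  # document frequency
    query_terms = set(tokenize(query))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        length_norm = k1 * (1 - b + b * len(toks) / avgdl)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


pages = [
    "Beginner diving gear checklist: mask, fins, wetsuit.",
    "Our company history and team.",
    "Diving gear for beginners on a budget.",
]
print(bm25_scores("beginner diving gear", pages))
```

Because there is no stemming, "beginners" in the third page does not match "beginner" in the query. The first page ranks highest, and the unrelated second page scores zero.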

Two Kinds of Measurement: Accuracy vs. Predictive Validity

There is a distinction most AI SEO tools gloss over. Measurement accuracy means a tool correctly detects what it claims to detect. Does the site have Schema.org markup? Is robots.txt blocking GPTBot? Is the content server-rendered? These are binary, verifiable facts.

Predictive validity means those measurements actually predict the outcome you care about. In this case: will an LLM cite your site?

Our original 26-check scanner had good measurement accuracy. When the scanner said your JSON-LD was missing, it was missing. When it said GPTBot was blocked, GPTBot was blocked. But the composite technical score had near-zero predictive validity for citations. A site scoring 85 was not more likely to get cited than a site scoring 35. We tested this across 441 domains and 14,550 domain-query pairs. The data was clear.

That finding reshaped the product. The current measurement adds three content-relevance components on top of Technical Health: Query Coverage, Content Depth, and Sub-Intent Coverage. Together they carry 65% of the weight on paid scans and 80% on free scans. Technical Health carries 15% (paid) or 20% (free), and on paid scans the remaining 20% goes to Citation Reality. You fix the plumbing once. You work on content relevance forever.

The Five Signals to Measure

Here is what each signal tells you and how to measure it. The first three come from our content evaluator. The fourth comes from Perplexity. The fifth comes from the legacy 26-check scanner, reframed as Technical Health.

1. Query Coverage (QC)

What it measures: The fraction of the queries your audience asks AI engines that your site answers at all, with real substance. Formally: the share of queries where the best-matching page on the site scores 5 or higher on a 0-10 relevance scale.

How we measure it: Pick 20 queries that real users would type into ChatGPT or Perplexity about your niche. Run each query through an embedding model to get a vector. Embed each page on the site. Take the top three pages by cosine similarity per query. Ask an LLM to rate each page on a 0-10 relevance scale. A page clears the bar only if it actually answers the query, not just mentions the topic. QC is the share of queries where at least one page cleared the bar, scaled to 100.
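A minimal sketch of that pipeline, assuming the query and page embeddings are already computed (toy two-dimensional vectors stand in for a real embedding model) and that `judge(query, url)` is a hypothetical callable standing in for the LLM's 0-10 relevance rating:

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_pages(query_vec: Vector, page_vecs: Dict[str, Vector], k: int = 3) -> List[str]:
    """URLs of the k pages whose embeddings are most similar to the query embedding."""
    return sorted(page_vecs, key=lambda url: cosine(query_vec, page_vecs[url]), reverse=True)[:k]


def query_coverage(query_vecs: Dict[str, Vector],
                   page_vecs: Dict[str, Vector],
                   judge: Callable[[str, str], float],
                   threshold: float = 5.0,
                   k: int = 3) -> float:
    """Share of queries where at least one top-k page is rated >= threshold (0-10), scaled to 0-100."""
    if not query_vecs:
        return 0.0
    covered = 0
    for query, qvec in query_vecs.items():
        candidates = top_k_pages(qvec, page_vecs, k)
        best = max((judge(query, url) for url in candidates), default=0.0)
        if best >= threshold:
            covered += 1
    return 100.0 * covered / len(query_vecs)


# Toy data: 2-D vectors instead of real embeddings, fixed ratings instead of an LLM judge.
queries = {"q1": [1.0, 0.0], "q2": [0.0, 1.0]}
pages = {"/a": [1.0, 0.1], "/b": [0.1, 1.0], "/c": [0.7, 0.7]}
ratings = {("q1", "/a"): 8, ("q2", "/b"): 3, ("q2", "/c"): 4}


def judge(query: str, url: str) -> float:
    """Hypothetical stand-in for the LLM relevance rating."""
    return ratings.get((query, url), 0)


print(query_coverage(queries, pages, judge))  # 50.0
```

Here q1 has a page rated 8 and counts as covered, while the best candidate for q2 is rated 4, so QC comes out at 50. A page that mentions the topic but only earns a 3 or 4 does not clear the bar.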

How to check it manually: Write down the five most common questions your audience would type into ChatGPT about your niche. Open each of your five most important pages. For each query-page pair, ask yourself honestly: “If a stranger read this page, would they walk away with a clear, substantive answer to that query?” Count how many of the 25 pairings are a yes. If fewer than half are, your QC is weak. A real audit runs 20 queries against up to 50 pages and calls an LLM judge, but the manual version catches the biggest gaps in fifteen minutes.

2. Content Depth (CD)

What it measures: How well your best answer actually answers the query. QC is pass/fail. CD is continuous. A site that scratches every query with a 5/10 has the same QC as a site with 9/10 answers, but much lower CD. Depth is what moves a page from “mentioned” to “cited”.

How we measure it: Average the best-page relevance score across all queries, scaled to 100.
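Given the best-page score per query (the same 0-10 ratings QC uses), Content Depth is a plain average. This sketch uses made-up scores to reproduce the contrast described above: two hypothetical sites with identical QC but very different CD.

```python
from typing import Dict


def coverage_from_best(best_scores: Dict[str, float], threshold: float = 5.0) -> float:
    """QC from best-page scores: share of queries at or above threshold, scaled to 0-100."""
    if not best_scores:
        return 0.0
    hits = sum(1 for s in best_scores.values() if s >= threshold)
    return 100.0 * hits / len(best_scores)


def content_depth(best_scores: Dict[str, float]) -> float:
    """CD: average best-page relevance (0-10) across all queries, scaled to 0-100."""
    if not best_scores:
        return 0.0
    return 10.0 * sum(best_scores.values()) / len(best_scores)


# Made-up best-page scores for three queries on two hypothetical sites.
scratches_everything = {"q1": 5, "q2": 5, "q3": 5}
answers_well = {"q1": 9, "q2": 9, "q3": 9}

for name, scores in [("scratches_everything", scratches_everything),
                     ("answers_well", answers_well)]:
    print(name, coverage_from_best(scores), content_depth(scores))
# scratches_everything 100.0 50.0
# answers_well 100.0 90.0
```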

How to check it manually: Take the page-query pairs where you said yes on QC. For each, ask: “Is this answer the best one on the internet, or just acceptable?” Sites that dominate on Depth are the ones that go one level deeper than competitors - more examples, more data, more edge cases, more “but here is when this breaks”.

3. Sub-Intent Coverage (SI) - query fan-out

What it measures: Whether your site addresses the full range of information needs behind each query. Modern search engines decompose queries into sub-intents. “Best diving gear for beginners” fans out to {essential vs optional, safety equipment, budget options, where to buy locally, beginner mistakes}. A page that covers two of five sub-intents loses to a page that covers four.

How we measure it: For each query we ask GPT-4o to list 3-5 sub-intents, then check whether any page on the site covers each one. SI is computed cross-page: a sub-intent is covered if at least one page on the site addresses it. The metric rewards a broad content footprint, not just one strong page.
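A sketch of the cross-page rule, using the diving example above. Two assumptions: the sub-intent lists are given (in the real pipeline an LLM generates them), and `covers` is a hypothetical stand-in for the coverage judgment, reduced here to a naive phrase match. How per-query results are aggregated is not specified; this sketch pools all sub-intents across queries.

```python
from typing import Callable, Dict, List


def sub_intent_coverage(sub_intents: Dict[str, List[str]],
                        pages: Dict[str, str],
                        covers: Callable[[str, str], bool]) -> float:
    """Share of sub-intents (pooled over all queries) addressed by at least one page, 0-100."""
    total = covered = 0
    for intents in sub_intents.values():
        for intent in intents:
            total += 1
            # Cross-page rule: any single page on the site is enough to cover a sub-intent.
            if any(covers(text, intent) for text in pages.values()):
                covered += 1
    return 100.0 * covered / total if total else 0.0


def covers(page_text: str, intent: str) -> bool:
    """Hypothetical, naive coverage test: the sub-intent phrase appears in the page text."""
    return intent in page_text.lower()


sub_intents = {
    "best diving gear for beginners": [
        "essential vs optional", "safety equipment", "budget options",
        "where to buy locally", "beginner mistakes",
    ],
}
pages = {
    "/gear-checklist": "Essential vs optional gear, plus the safety equipment every diver needs.",
    "/budget-gear": "Budget options for a first mask, fins and wetsuit.",
}
print(sub_intent_coverage(sub_intents, pages, covers))  # 60.0
```

Two pages together cover three of the five sub-intents, so SI is 60 even though neither page covers more than two on its own. That is the sense in which the metric rewards a broad content footprint.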

How to check it manually: For each of your top five queries, write down the 3-5 sub-questions a reader is implicitly asking. Then check each page on your site against those sub-questions. Sites that score high on SI usually have a content hub with multiple pages around one topic, each addressing a different angle. Sites that score low usually have one “big page” that tries to cover everything and ends up covering nothing in depth.

For the underlying mechanism see how Google's query fan-out affects AI visibility.

4. Citation Reality (CR) - what actually happens in Perplexity

What it measures: The other four signals measure readiness - what should happen. Citation Reality measures what is happening. Does Perplexity actually cite your site right now for its target queries?

How we measure it: We call the Perplexity API (sonar model) once per monitoring query and record whether the site is cited, at what position, and with what snippet. When a citation is found, we validate the snippet against the actual site content to filter false positives. CR is the share of queries where the site was cited with a verified snippet, scaled to 100. This check is paid-only because Perplexity calls cost money and have to be spaced 1.5 seconds apart to stay within rate limits.
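The scoring half of Citation Reality can be sketched without touching a real API. Below, `ask_engine` is a hypothetical callable that sends one query and returns (cited URL, snippet) pairs; it is not Perplexity's client library, and position tracking is left out. Snippet verification is reduced to a case- and whitespace-insensitive substring check against the site's text, which is an assumption: real snippets are often paraphrased and may need fuzzier matching. Calls are spaced `min_interval` seconds apart to mirror the 1.5-second pacing.

```python
import time
from typing import Callable, List, Optional, Tuple
from urllib.parse import urlparse

Citation = Tuple[str, str]  # (cited URL, snippet shown in the answer)


def on_domain(url: str, domain: str) -> bool:
    """True if url's host is the domain itself or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    domain = domain.lower()
    return host == domain or host.endswith("." + domain)


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences do not block a match."""
    return " ".join(text.lower().split())


def citation_reality(queries: List[str],
                     ask_engine: Callable[[str], List[Citation]],
                     domain: str,
                     site_text: str,
                     min_interval: float = 1.5) -> float:
    """Share of queries where the domain is cited with a snippet found on the site, 0-100."""
    if not queries:
        return 0.0
    corpus = normalize(site_text)
    verified = 0
    last_call: Optional[float] = None
    for query in queries:
        # Pace requests: wait until min_interval seconds have passed since the last call.
        if last_call is not None:
            wait = min_interval - (time.monotonic() - last_call)
            if wait > 0:
                time.sleep(wait)
        last_call = time.monotonic()
        citations = ask_engine(query)
        # Verification filters false positives: the snippet must actually appear on the site.
        if any(on_domain(url, domain) and snippet.strip() and normalize(snippet) in corpus
               for url, snippet in citations):
            verified += 1
    return 100.0 * verified / len(queries)


# Fake engine answers, made up for illustration.
fake_answers = {
    "how to care for dive gear": [("https://www.example.com/care", "Rinse gear in fresh water")],
    "best dive computer": [("https://other.org/review", "Top picks this year")],
    "dive mask fit": [("https://example.com/masks", "A snippet that is not on the site")],
}
site_text = "Always rinse gear in   fresh water after every dive."
cr = citation_reality(list(fake_answers), lambda q: fake_answers[q],
                      "example.com", site_text, min_interval=0)
print(round(cr, 1))  # 33.3
```

The third query is cited on the right domain, but its snippet cannot be found on the site, so it does not count. This is the kind of false positive the validation step is meant to filter.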

How to check it manually: Open Perplexity in an incognito window. Type each of your top ten monitoring queries. For each answer, look at the cited sources. Is your domain anywhere in the list? Repeat the check on a different day - Perplexity is non-deterministic, and one-shot measurements are noisy.

5. Technical Health (TH) - the hygiene layer

What it measures: Whether AI crawlers can physically reach your pages, render them, and parse them. The legacy 26-check scanner covers this layer: robots.txt rules for GPTBot / OAI-SearchBot / PerplexityBot / ClaudeBot, SSL, canonicals, static HTML word count (the JS rendering test), Schema.org presence, FAQ markup, heading hierarchy, content depth, meta description quality, business identity (NAP), authorship signals, customer reviews in AggregateRating schema, Product and Offer data for e-commerce, GTIN/MPN, image alt text, breadcrumbs.

How we measure it: 26 checks, aggregated to a 0-100 subscore, weighted at 15% (paid) or 20% (free) inside the overall Content Relevance Score.
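The weighting is easiest to see in code. The 15%/20% Technical Health weights and the 65%/80% content share come from this article. Two details are assumptions for this sketch: QC, CD, and SI split the content share equally, and the remaining 20% on paid scans goes to Citation Reality.

```python
from typing import Optional


def content_relevance_score(qc: float, cd: float, si: float, th: float,
                            cr: Optional[float] = None) -> float:
    """Weighted 0-100 score; cr=None means a free scan without Citation Reality."""
    content = (qc + cd + si) / 3  # assumption: equal split of the content share
    if cr is None:
        # Free scan: content components 80%, Technical Health 20%.
        return 0.80 * content + 0.20 * th
    # Paid scan: content 65%, Technical Health 15%, remaining 20% (inferred) to Citation Reality.
    return 0.65 * content + 0.15 * th + 0.20 * cr


# A technically perfect site whose content matches none of the queries:
print(round(content_relevance_score(qc=0, cd=0, si=0, th=100), 1))  # 20.0
# Solid content and a clean technical layer, on a paid scan:
print(round(content_relevance_score(qc=60, cd=60, si=60, th=100, cr=50), 1))  # 64.0
```

The first line shows why Technical Health alone cannot carry the score: with perfect hygiene and no relevant content, a free scan stays in the Critical band.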

How to check it manually: See the five-step check below. It is fast and catches the three or four issues that matter most.

The Citation Measurement Problem: a 29.3% Noise Floor

There is another measurement problem that nobody in this space talks about. LLM citations are not deterministic. Ask Perplexity the same question twice and you may get different sources cited. In our research we measured this noise floor: 29.3% of citation results change between identical queries. Almost a third of what you observe is random variation, not signal.

If a citation monitoring tool runs each query once and reports a citation rate, nearly a third of that number is noise. Any tool claiming to track your AI citation rate has to account for this, either by repeating queries multiple times or by smoothing over a window of weeks. That includes our own: we run each query once per scan for cost reasons, so paid reports state the noise floor explicitly.

For tracking purposes, treat single-query results as directional. Treat trends over 3-4 repeated runs as actionable. Treat trends over weeks as reliable.
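One way to apply these rules, sketched in Python. The unit behind the 29.3% figure is not specified here; this sketch assumes a per-query cited/not-cited outcome, measures how often it flips between two identical runs, and smooths the citation rate by averaging over repeated runs. The run data is made up.

```python
from typing import Dict, List

Run = Dict[str, bool]  # query -> was the site cited in this run?


def flip_rate(run_a: Run, run_b: Run) -> float:
    """Fraction of shared queries whose cited/not-cited outcome differs between two runs."""
    shared = run_a.keys() & run_b.keys()
    if not shared:
        return 0.0
    return sum(run_a[q] != run_b[q] for q in shared) / len(shared)


def smoothed_citation_rate(runs: List[Run]) -> float:
    """Mean citation rate (0-100) over repeated runs of the same query set."""
    rates = [100.0 * sum(run.values()) / len(run) for run in runs if run]
    return sum(rates) / len(rates) if rates else 0.0


# Made-up results from three identical runs of four queries.
run_a = {"q1": True, "q2": False, "q3": True, "q4": False}
run_b = {"q1": True, "q2": True, "q3": False, "q4": False}
run_c = {"q1": True, "q2": False, "q3": True, "q4": True}

print(flip_rate(run_a, run_b))  # 0.5
print(round(smoothed_citation_rate([run_a, run_b, run_c]), 1))  # 58.3
```

With only four queries, a single flip moves the rate by 25 points, which is why single runs are best read as directional.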

Quick Manual Check (10 Minutes, No Tools)

This will not replace an automated scan, but it catches the worst blockers in ten minutes. Steps 1-3 cover Technical Health. Steps 4-5 give a rough read on Query Coverage and Citation Reality.

Five-Step Check

  1. Disable JavaScript, reload. Open Chrome DevTools, disable JavaScript, and reload your homepage and a product or article page. If content disappears, AI crawlers that do not execute JavaScript see an empty page. This is a hard blocker, and the usual fix is server-side rendering.
  2. Check robots.txt. Visit yoursite.com/robots.txt and search for GPTBot, OAI-SearchBot, PerplexityBot, ClaudeBot. A Disallow: / under one of those user agents (or under User-agent: * when no group names that bot) means that engine is completely blocked. Many CMS platforms block AI bots by default - this is the single most common reason sites are invisible to AI search.
  3. Look for JSON-LD. View source, search for application/ld+json. If nothing comes up, the site has no structured data. For e-commerce look for Product and Offer. For local business look for LocalBusiness or Organization.
  4. Write down the top five queries your audience asks an AI. Be honest - these are the questions a real user would type into ChatGPT, not keywords from your SEO spreadsheet. If you cannot list five, you have a content strategy problem before you have a readiness problem.
  5. Open each query in Perplexity. Look at the cited sources. Is your domain anywhere on the list? If yes, the site is at least partially cited. If no, you have a content relevance problem - and the automated scan will show you which sub-intents you are missing.
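Steps 1-3 can also be scripted. This sketch works on text you have already downloaded (a robots.txt body and a page's raw HTML, fetched without running JavaScript), so it makes no network calls. It uses Python's standard `urllib.robotparser` and `html.parser`. Note that `robotparser` applies its own matching rules, which may differ in edge cases from how each AI crawler reads robots.txt. The word count (text outside script and style tags) is a rough proxy for the rendering test, and the JSON-LD check reads only top-level `@type` values without expanding `@graph`.

```python
import json
from html.parser import HTMLParser
from typing import List, Tuple
from urllib.robotparser import RobotFileParser

AI_BOTS = ["GPTBot", "OAI-SearchBot", "PerplexityBot", "ClaudeBot"]


def blocked_bots(robots_txt: str, path: str = "/") -> List[str]:
    """Step 2: AI user agents that this robots.txt forbids from fetching path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_BOTS if not parser.can_fetch(bot, path)]


class PageScan(HTMLParser):
    """Steps 1 and 3: count words in static HTML and collect JSON-LD @type values."""

    def __init__(self) -> None:
        super().__init__()
        self.words = 0
        self.jsonld_types: List[str] = []
        self._skip = 0            # > 0 while inside <script> or <style>
        self._in_jsonld = False
        self._buf: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
            if tag == "script" and (dict(attrs).get("type") or "").lower() == "application/ld+json":
                self._in_jsonld = True
                self._buf = []

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
            if self._in_jsonld:
                self._in_jsonld = False
                try:
                    data = json.loads("".join(self._buf))
                except json.JSONDecodeError:
                    return  # malformed JSON-LD is ignored in this sketch
                items = data if isinstance(data, list) else [data]
                # Top-level @type only; @graph is not expanded here.
                self.jsonld_types += [str(i["@type"]) for i in items
                                      if isinstance(i, dict) and "@type" in i]

    def handle_data(self, data):
        if self._in_jsonld:
            self._buf.append(data)
        elif not self._skip:
            self.words += len(data.split())


def scan_html(html: str) -> Tuple[int, List[str]]:
    """Return (word count outside script/style, JSON-LD @type values)."""
    scanner = PageScan()
    scanner.feed(html)
    scanner.close()
    return scanner.words, scanner.jsonld_types


robots = """User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""
spa_page = ('<html><head><script type="application/ld+json">'
            '{"@context": "https://schema.org", "@type": "Product", "name": "Mask"}'
            '</script></head><body><div id="root"></div><script>render()</script></body></html>')
static_page = "<body><h1>Dive masks</h1><p>Pick a low-volume mask.</p></body>"

print(blocked_bots(robots))     # ['GPTBot']
print(scan_html(spa_page))      # (0, ['Product'])
print(scan_html(static_page))   # (6, [])
```

The single-page-app example has structured data but zero words in its static HTML, which is exactly the step 1 failure: the content only exists after JavaScript runs.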

If you fail steps 1 or 2, fix them first. Everything else is academic until AI crawlers can reach your content. If you fail steps 4 or 5, the fix is content work, not technical work, and that is where the real signal lives.

Automated Scanning: What It Gives You (And What It Does Not)

We built an automated scanner at getaisearchscore.com. The free tier runs four of the five Content Relevance Score components: it crawls up to 50 pages, generates 20 monitoring queries, lets you edit or replace any of them before analysis runs, and returns QC, CD, SI, and TH with per-query breakdowns. No login, no credit card, no upgrade wall on core value. Citation Reality is the only component reserved for the paid tier, because Perplexity API calls are the expensive part.

Here is what we can honestly say about it. The content relevance measurement is new - it is built on the same BM25 plus embedding methodology that hit AUC 0.915 in our citation research. The technical checks are the same ones that powered the old 26-check scanner - accurate, actionable, but not, by themselves, predictive of citations.

Here is what we cannot honestly say: that running our scan and fixing everything the report flags will produce a measurable rise in your citation rate. We believe it will, based on the AUC 0.915 finding, but belief is not proof. We are running the intervention experiment on our own site right now. When we have before/after data, positive or null, we will publish it.

Score Bands: What They Mean

The overall Content Relevance Score maps to four bands. Use these to prioritize, not to set expectations for citation volume.

  • 0-29 (Critical): Either AI crawlers cannot reach the site at all, or the content has almost no overlap with the queries the audience is asking. Fix Technical Health first, then run the scan again to see the content relevance picture.
  • 30-59 (Partial): Some queries are covered, major gaps remain. Look at the Sub-Intent Coverage breakdown - this is usually where the largest gains hide.
  • 60-79 (Solid): Content is broadly relevant. Further gains come from Content Depth - making your best answers better than anyone else's - and from Citation Reality, which depends on domain reputation and freshness.
  • 80-100 (Ready): Content relevance is not the bottleneck. If citation rate is still low, the issue is domain reputation, content freshness, or sheer competitive density - not readiness signals.
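If you copy scores from reports into your own tracking script, the band boundaries translate directly. A minimal sketch; treating a fractional score such as 29.5 as belonging to the lower band is an assumption, since the bands are stated in whole numbers.

```python
def score_band(score: float) -> str:
    """Map an overall Content Relevance Score (0-100) to its band name."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score < 30:
        return "Critical"
    if score < 60:
        return "Partial"
    if score < 80:
        return "Solid"
    return "Ready"


for s in (12, 29.5, 45, 79, 80):
    print(s, score_band(s))
```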

Where to Focus After Measuring

Different weak components need different interventions. Here is the triage we use on scans we run for Starter consultations.

If Technical Health is below 50

Fix crawl access first: robots.txt, SSL, server-side rendering, JSON-LD structured data. A site that AI bots cannot read is invisible regardless of content quality. This is the one area where the fix is clearly necessary and non-negotiable.

If Query Coverage is low

Your content is not about what your audience is asking. This is a content strategy problem, not a schema problem. Look at the queries where you scored zero. Do you have pages on those topics at all? If yes, rewrite them. If no, the gap is new content, not optimization.

If Query Coverage is OK but Sub-Intent Coverage is weak

Your pages scratch the surface on their topics but do not go deep into the sub-questions. This is usually a “one big page” problem - one over-ambitious article that tries to cover everything at once. The fix is breaking it into focused pages, each owning one sub-intent.

If Content Depth is low across the board

Your answers exist but they are shallow. Add concrete examples, data points, edge cases, dated references, direct quotes from experts, screenshots of tools. Depth is what moves a page from “mentioned” to “cited”.

If Citation Reality is lower than QC/CD/SI would predict

The content looks right but AI engines are not picking it up. Likely causes: the site is new and has not been re-crawled, content is stale (missing dateModified or dated 2023), or the domain lacks the trust signals Perplexity weights (NAP data, authorship, external citations from credible sources).
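The triage above can be written as a small rule set. Only the Technical Health below 50 rule comes from this article; the cut-off for a "low" QC, SI, or CD (50) and the 20-point gap that flags Citation Reality as lagging are hypothetical values chosen for this sketch.

```python
from typing import List, Optional

TH_FLOOR = 50    # from the article: fix crawl access first below this
LOW = 50         # hypothetical cut-off for a "low" QC, SI or CD
CR_GAP = 20.0    # hypothetical: how far CR may trail the content average before it is flagged


def triage(qc: float, cd: float, si: float, th: float,
           cr: Optional[float] = None) -> List[str]:
    """Return recommended focus areas in the order the triage above lists them."""
    actions = []
    if th < TH_FLOOR:
        actions.append("Technical Health: fix robots.txt, SSL, server-side rendering, JSON-LD")
    if qc < LOW:
        actions.append("Query Coverage: rewrite or create pages for the zero-score queries")
    elif si < LOW:  # QC is OK but sub-intents are thin
        actions.append("Sub-Intent Coverage: split the one big page into focused pages")
    if cd < LOW:
        actions.append("Content Depth: add examples, data, edge cases, dated references")
    if cr is not None and cr < (qc + cd + si) / 3 - CR_GAP:
        actions.append("Citation Reality lags: check re-crawl status, freshness, trust signals")
    return actions


for action in triage(qc=80, cd=75, si=30, th=90, cr=10):
    print(action)
```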

How Often Should You Measure?

Honest advice on diminishing returns:

  • After major changes (CMS migration, robots.txt updates, new content campaign, content consolidation) - re-scan. These are the moments most likely to move the score up or break something.
  • Quarterly is usually enough for routine monitoring. Technical signals do not change fast unless you are changing them. Content relevance can move faster if you are publishing, but not week-to-week.
  • Citation Reality has a 29.3% noise floor. Treat single runs as directional. For real trends, watch the weekly pattern across 3-4 runs.

Most sites score below 50 on their first scan - not because their content is bad, but because they never stress-tested it against the specific queries their audience would ask an AI. That gap is fixable. Closing it is what the Starter consultation is designed for.

The Honest Bottom Line

We built a technical measurement tool. We tested it against real citation outcomes. The tool worked - it measured what it claimed to measure. But the technical measurements by themselves did not predict citations. So we rebuilt the measurement around content-query relevance, which does predict citations with AUC 0.915.

Does that make technical measurement worthless? No. Fixing broken technical signals removes barriers. But it is the content relevance that drives citation selection - and that is the signal the current score weights most heavily.

If another tool tells you they can predict your AI citation rate from a technical audit alone, ask them for the correlation data. We ran the study. The answer, for pure technical audits, is that nobody can.

Run a free Content Relevance audit

The four-component diagnostic - Query Coverage, Content Depth, Sub-Intent Coverage, Technical Health - is free. Up to 50 pages, 20 monitoring queries, editable before analysis, no login required. If you want a human expert to walk through your results and build an implementation plan, book a Starter consultation (€149, limited slots per month).

Run your free scan at getaisearchscore.com or read what the Content Relevance Score actually measures for a deeper walkthrough of each component.

Why Measuring AI Search Readiness Still Matters

Even with the honest caveats above, measurement gives you three practical advantages over guessing:

  1. It finds blockers invisible to traditional SEO tools. Blocked AI crawlers, JS-rendered content LLMs cannot parse, missing entity signals - these do not show up in Google Search Console or Ahrefs. A dedicated measurement catches them.
  2. It gives you a baseline for tracking progress. Without measurement you cannot tell whether your changes are working. The five-component breakdown shows which axis moved and which did not.
  3. It prioritizes effort by impact. Not all gaps are equal. A weak Sub-Intent Coverage with solid QC is a different fix than a weak QC with decent SI. Measurement separates the two so you fix the right thing first.

One caveat: content relevance is now the main thing to measure. The technical layer is the floor, not the ceiling. If the floor is broken, fix it once. Then spend the rest of your effort on content that actually answers the questions your audience is asking an AI.

Frequently Asked Questions

What is the fastest way to check AI search readiness?

Enter your URL at getaisearchscore.com for a free Content Relevance Score across five components - Query Coverage, Content Depth, Sub-Intent Coverage, Technical Health, and (on paid scans) Citation Reality. The scan takes a few minutes and includes per-query breakdowns. For a quick manual check, disable JavaScript and see if your content disappears - that means AI crawlers likely can't see it either.

Can I measure AI search readiness without a tool?

You can check technical hygiene manually: robots.txt for AI bot blocks, page source for JSON-LD schema, the JavaScript rendering test. You can also spot-check content relevance by hand (see the manual Query Coverage check above), but measuring it properly, the dominant citation predictor (AUC 0.915), across 20 queries and up to 50 pages needs query decomposition, embedding similarity, and sub-intent analysis. That requires automated tooling.

How often should I measure AI search readiness?

Re-scan after major changes such as a CMS migration, a redesign, or a large content update. For routine monitoring, quarterly is usually enough; sites that publish actively may benefit from monthly scans, because content relevance shifts as competitors publish new content and AI retrieval pipelines evolve. Technical Health rarely regresses unless you change CMS or redesign. The free tier allows unlimited rescans.

What does each score range mean?

0-29 (Critical): major content or technical gaps; AI engines are unlikely to cite your site. 30-59 (Partial): some coverage but significant query gaps. 60-79 (Solid): broadly relevant content; regular citations are plausible. 80-100 (Ready): strong coverage across target queries. Note: a high score indicates strong content relevance, not guaranteed citations - brand authority and competition also matter.


Alexey Tolmachev

Senior Systems Analyst · AI Search Readiness Researcher

Senior Systems Analyst with 14 years of experience in data architecture, system integration, and technical specification design. Researches how AI search engines process structured data and select citation sources. Creator of the methodology.

