What Is a Content Relevance Score? How It Works and Why It Matters

TL;DR

A Content Relevance Score (0-100) measures how well a website's content answers the questions its audience actually asks AI search engines - ChatGPT, Perplexity, Google AI Overviews. It has five components: Query Coverage (what fraction of queries the site can answer), Content Depth (how deeply pages cover each query), Sub-Intent Coverage (whether the site addresses the full fan-out of information needs behind each query), Citation Reality (whether Perplexity already cites the site), and Technical Health (the 26 legacy technical checks as one subcomponent, 15-20% weight). Content signals carry 80-85% of the weight. We built this after our original 26-check technical score showed r=0.009 correlation with actual citations across 441 domains. The follow-up study found content relevance predicts citations with AUC 0.915.

A Content Relevance Score is a 0-100 diagnostic that measures how well a website's content answers the questions its audience actually asks AI search engines - ChatGPT, Perplexity, Google AI Overviews, Bing Copilot. It works by decomposing real user queries into sub-intents, then checking whether the site's pages cover each sub-intent with enough depth to be cited.

The score has five components: Query Coverage (what fraction of queries any page on the site can answer), Content Depth (how deeply those pages cover each query), Sub-Intent Coverage (whether the site addresses the full range of information needs behind each query), Citation Reality (whether Perplexity already cites the site for those queries), and Technical Health (whether AI crawlers can actually reach and parse the pages). Content signals carry 80-85% of the weight. Technical signals carry 15-20%.

This is our second version. Here is why.

The first version of this score, shipped in February 2026, measured 26 technical signals across four dimensions (schema markup, crawlability, trust, offering data). We believed that technical readiness was the gate to AI citation. We tested that belief on 441 domains and 14,550 domain-query pairs. The result: r=0.009, p=0.849. Statistically zero. Our technical score did not predict which sites get cited.

The same dataset revealed what does predict citations: content relevance. Same-topic pages were cited 62x more often than cross-topic pages (5.17% vs 0.08%). A classifier built on BM25 plus embedding similarity achieved AUC 0.915. That is the signal the current score is built around. The 26 technical checks are still in the pipeline, but they now act as one subcomponent with a 15-20% weight, not the whole story.
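The two measures in that paragraph can be made concrete. The sketch below is a minimal Python illustration of Okapi BM25 lexical scoring and of AUC, computed as the probability that a cited page outranks an uncited one. It is not the study's classifier: how BM25 was combined with embedding similarity is not described here, and the parameters k1=1.5 and b=0.75 and the toy pages are assumptions.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of a tokenized query against each tokenized document."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: how many documents contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Lucene-style IDF; the +1 keeps it non-negative for common terms.
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term-frequency saturation (k1) with document-length normalization (b).
            tf_part = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(d) / avgdl))
            score += idf * tf_part
        scores.append(score)
    return scores


def auc(scores, labels):
    """ROC AUC as the chance a random positive outranks a random negative (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy data: three tokenized pages; label 1 = the page was cited for the query.
docs = [
    "scuba diving gear guide for beginners".split(),
    "best diving masks and fins".split(),
    "quarterly accounting software review".split(),
]
scores = bm25_scores("diving gear".split(), docs)
print(auc(scores, [1, 1, 0]))  # 1.0: both cited pages outscore the off-topic one
```

An AUC of 0.5 means the scores rank cited and uncited pages no better than chance, and 1.0 means every cited page outranks every uncited one. That scale is why 0.915 counts as a strong signal.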

We built LLM SEO Check because we had a theory about what AI engines need to cite a page. The diagnostic we shipped found real technical problems on real sites. The theory turned out to be wrong about citations. This article documents what the score now measures, what the old model got right, and what we learned from scanning a few hundred sites and running our own citation study.

Why AI Search Needs a Different Diagnostic

Traditional SEO metrics - Domain Authority, keyword rankings, organic traffic - were designed for a search model where ten blue links compete for clicks. AI search engines work differently: they retrieve candidate chunks from crawled content, rerank them against the user's query, and synthesize a single answer citing 3-5 sources. A site that ranks #1 for a keyword in Google can be completely absent from the Perplexity answer for the same question.

According to a SparkToro/Datos study, over 58% of Google searches now end without a click. AI search accelerates this by synthesizing answers directly. The decision of which pages to cite happens inside a retrieval-augmented generation pipeline where two things matter most:

  1. Can the engine actually find and read your page? This is technical plumbing - crawler access, JavaScript rendering, indexing. Fail this and nothing else matters. Pass it and you are just in the game, nothing more.
  2. Does your content answer the question the user is actually asking? This is content relevance - whether the language, concepts, and sub-topics on your pages match what the user implicitly wants. This is the dominant signal: in our own citation study, same-topic pages were cited 62x more often than off-topic pages.

Existing SEO tools like Semrush and Ahrefs do not measure either of these things in a way that maps to AI citation. They were built for keyword rankings. Our score was built to fill that gap.

The Five Components of the Score

The current score decomposes into five measurable components. Four of them describe content relevance. One captures technical infrastructure. The weights reflect what our data showed about which signals actually move the needle.

Component | Weight (paid) | What it measures
Query Coverage (QC) | 25% | Share of user queries where any page on the site has a relevance score of at least 5 out of 10.
Content Depth (CD) | 20% | Average of the best-matching-page relevance score across all queries. Captures depth, not just presence.
Sub-Intent Coverage (SI) | 20% | Fraction of sub-intents (the hidden information needs inside each query) that at least one page on the site addresses.
Citation Reality (CR) | 20% | Share of monitoring queries where Perplexity actually cites the site today. Only available on paid scans.
Technical Health (TH) | 15% | Aggregated score from 26 technical checks: crawlability, schema, rendering, trust signals, product data.

The free tier drops Citation Reality, because Perplexity API calls cost money, and spreads its 20% evenly across the other four components (5 points each). The formula becomes 30% QC + 25% CD + 25% SI + 20% TH.
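The weighting can be written out directly. This minimal sketch assumes each component is already normalized to 0-100 and that the final score is a plain weighted sum; any rounding or clamping the product applies is not described here and is left out.

```python
# Component weights in percent, as listed in this article.
PAID_WEIGHTS = {"QC": 25, "CD": 20, "SI": 20, "CR": 20, "TH": 15}
# Free tier: CR is dropped and its 20 points are spread 5/5/5/5.
FREE_WEIGHTS = {"QC": 30, "CD": 25, "SI": 25, "TH": 20}


def content_relevance_score(components, paid):
    """Weighted 0-100 score; each component value is assumed to be on 0-100 already."""
    weights = PAID_WEIGHTS if paid else FREE_WEIGHTS
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100")
    return sum(weights[name] * components[name] for name in weights) / 100


components = {"QC": 60, "CD": 55, "SI": 50, "CR": 10, "TH": 80}
print(content_relevance_score(components, paid=True))   # 50.0
print(content_relevance_score(components, paid=False))  # 60.25 (CR is ignored)
```

With these hypothetical inputs, the low CR pulls the paid score about ten points below the free one. That gap separates readiness from what Perplexity currently does.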

1. Query Coverage (QC) - the first gate

The pipeline starts by generating 20 monitoring queries that real users would ask about the site's niche. You can edit, remove, or add queries before analysis runs - we default to a balanced mix of informational and transactional intents, but the reviewer controls the final list.

For each query, we embed the query and each crawled page with text-embedding-3-small, pick the top three candidate pages by cosine similarity, then have GPT-4o rate each candidate on a 0-10 relevance scale. A page scores 5+ only if it plausibly answers the query with real substance, not just mentions the topic.
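The retrieval step before rating is an ordinary cosine top-k. This minimal sketch assumes the query and page embeddings (for example from text-embedding-3-small) have already been fetched as plain lists of floats. The function names and toy vectors are hypothetical, and the GPT-4o rating of each candidate is not shown.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (assumes neither is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k_candidates(query_vec, page_vecs, k=3):
    """Indices of the k pages most similar to the query, most similar first."""
    ranked = sorted(range(len(page_vecs)),
                    key=lambda i: cosine(query_vec, page_vecs[i]),
                    reverse=True)
    return ranked[:k]


# Toy 3-dimensional vectors standing in for real embeddings.
query = [1.0, 0.0, 0.0]
pages = [
    [0.0, 1.0, 0.0],  # unrelated page
    [0.9, 0.1, 0.0],
    [0.5, 0.5, 0.0],
    [0.6, 0.0, 0.8],
    [1.0, 0.0, 0.1],
]
print(top_k_candidates(query, pages))  # [4, 1, 2]
```

Only the pages this step returns go on to be rated. A relevant page that embeds poorly therefore never gets a relevance score.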

Query Coverage is the share of queries where any page on the site clears the 5/10 bar. A site can answer five of its 20 queries with deep authority and still score only 25% on QC if it ignores the other 15 questions its audience is actually asking.

2. Content Depth (CD) - how well, not just whether

Query Coverage is pass/fail per query. Content Depth is continuous: it averages the best-page relevance score (0-10) across all queries and scales the result to 100. A site that scratches every query with a 5/10 has the same QC as a site with 9/10 answers, but a lower CD. Depth is what moves a page from “mentioned” to “cited”.
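Both metrics derive from the same per-query list: the highest rating any page received for that query. This minimal sketch assumes those best-page ratings are already available on the 0-10 scale. It reproduces the comparison in the paragraph above, where two hypothetical sites tie on QC but differ on CD.

```python
def query_coverage(best_scores, threshold=5):
    """QC on 0-100: share of queries whose best page rates >= threshold (0-10 scale)."""
    hits = sum(1 for s in best_scores if s >= threshold)
    return 100 * hits / len(best_scores)


def content_depth(best_scores):
    """CD on 0-100: mean best-page rating (0-10) scaled by 10."""
    return 10 * sum(best_scores) / len(best_scores)


# Hypothetical sites: one value per query = rating of the best page for that query.
shallow = [5, 5, 5, 5]
deep = [9, 9, 9, 9]
print(query_coverage(shallow), query_coverage(deep))  # 100.0 100.0
print(content_depth(shallow), content_depth(deep))    # 50.0 90.0
```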

3. Sub-Intent Coverage (SI) - query fan-out

Modern search engines decompose queries into sub-intents. “Best diving gear for beginners” fans out to {essential vs optional gear, safety equipment, budget-friendly options, where to buy locally, common beginner mistakes}. A page that covers two of five sub-intents will lose to a page that covers four, even if both have high surface-level relevance.

Our content evaluator asks GPT-4o to list 3-5 sub-intents per query, then checks each candidate page against each sub-intent. SI is computed across pages: a sub-intent counts as covered if any page on the site addresses it. The metric rewards sites with a broad content footprint, not just one strong page.
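Because SI is computed across pages rather than per page, it is effectively a union. This sketch assumes the LLM step has already produced, for each sub-intent, the set of pages judged to address it. The data structure, URLs and page assignments are hypothetical. The article does not say whether queries with different sub-intent counts are weighted equally, so this version simply pools all sub-intents.

```python
def sub_intent_coverage(coverage):
    """SI on 0-100. `coverage` maps (query, sub_intent) -> set of pages addressing it.

    A sub-intent counts as covered if at least one page on the site addresses it,
    so SI is a union across pages rather than a per-page score.
    """
    covered = sum(1 for pages in coverage.values() if pages)
    return 100 * covered / len(coverage)


q = "best diving gear for beginners"
coverage = {
    (q, "essential vs optional gear"): {"/gear-guide"},
    (q, "safety equipment"): {"/gear-guide", "/safety"},
    (q, "budget-friendly options"): {"/budget-kit"},
    (q, "where to buy locally"): set(),
    (q, "common beginner mistakes"): set(),
}
print(sub_intent_coverage(coverage))  # 60.0
```

No single page here covers more than two sub-intents, yet the site scores 60 because its pages complement each other.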

For an introduction to the concept, see how Google's query fan-out affects AI visibility.

4. Citation Reality (CR) - what Perplexity actually does

The other four components measure readiness - what should happen, based on content signals. Citation Reality measures what is happening. We call Perplexity's API (sonar model) with each monitoring query and record whether the site is cited, at what position, and with what snippet. When a citation is found, we validate the snippet against actual site content to filter false positives.

Only paid scans get CR, because Perplexity API calls cost money and are rate-limited to 1.5 seconds between requests. On free scans, CR's 20% weight is redistributed evenly across QC, CD, SI, and TH.
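Once the sonar responses are in, scoring CR reduces to matching cited URLs against the site's domain. This minimal sketch assumes each query's citations have already been collected as an ordered list of URLs. API calls, the 1.5-second throttle and error handling are left out. The subdomain rule and the snippet check are hypothetical stand-ins, since the article does not say exactly how matching and validation work.

```python
import re
from urllib.parse import urlparse


def citation_position(cited_urls, site_domain):
    """1-based position of the first cited URL on site_domain (or a subdomain), else None."""
    for position, url in enumerate(cited_urls, start=1):
        host = (urlparse(url).hostname or "").lower()
        if host == site_domain or host.endswith("." + site_domain):
            return position
    return None


def citation_reality(results, site_domain):
    """CR on 0-100: share of monitoring queries whose citations include the site."""
    cited = sum(1 for urls in results.values()
                if citation_position(urls, site_domain) is not None)
    return 100 * cited / len(results)


def _normalize(text):
    return re.sub(r"\s+", " ", text).strip().lower()


def snippet_found_on_site(snippet, page_text):
    """Hypothetical false-positive filter: whitespace- and case-insensitive substring match."""
    return _normalize(snippet) in _normalize(page_text)


results = {
    "best diving gear for beginners": ["https://divemag.example/gear", "https://shop.example.com/guide"],
    "how to choose a wetsuit": ["https://wiki.example.org/wetsuit"],
    "dive computer vs watch": ["https://example.com/blog/computers"],
    "scuba certification cost": [],
}
print(citation_position(results["best diving gear for beginners"], "example.com"))  # 2
print(citation_reality(results, "example.com"))  # 50.0
```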

5. Technical Health (TH) - the hygiene layer

Technical Health is what remains of the old 26-check model. We kept the checks because they measure real problems - but gave them a 15% weight (20% on free), because our research showed these signals explain near-zero variance in actual citation rates. Think of TH as hygiene, not strategy. If it's broken, fix it. If it's at 90+, stop optimizing it and go fix your content.

The 26 checks cover four old dimensions: Machine Readability (crawlability, rendering, schema), Extractability (FAQ blocks, headings, content depth), Trust (business identity, reviews, authorship), and Offering Readiness (product schema, images, breadcrumbs). The following data comes from our first 100 audits and still describes the state of the web accurately, even though it does not predict citations on its own.

What 100 Audits Showed About Technical Health

These numbers are from the first 100 real audits we ran through LLM SEO Check. They describe the state of technical infrastructure across real websites. They are correlations, not citation predictors.

Machine Readability - average 77% of max

Most sites pass the basics. The outlier is Schema.org: only 65% of sites have any structured data, and average completeness is under 50% of maximum.

Check | Pass Rate | Avg / Max
Language & Mobile Optimization | 100% | 3.84 / 4
SSL / HTTPS | 100% | 2.0 / 2
Page Title & Social Meta Tags | 97% | 9.82 / 11
Indexation (robots.txt for AI bots) | 95.7% | 2.87 / 3
Canonical URL | 95.7% | 1.91 / 2
JS Rendering (AI Crawler View) | 93.3% | 2.80 / 3
Open Graph Completeness | 83.7% | 2.49 / 3
Schema.org Structured Data | 65% | 4.95 / 10

Extractability - average 56% of max

Clean split: structural checks (headings, word count) pass at 95%+. Content quality checks (TL;DR, FAQ richness) fail on more than half of sites (52% and 65% respectively). The second group matters more for AI citation - and it's the one most sites neglect.

Check | Pass Rate | Avg / Max
Content Depth (800+ words) | 95.7% | 4.67 / 5
Heading Hierarchy (H1/H2) | 95.7% | 4.73 / 5
Rich Content & Tables | 92% | 6.45 / 10
Meta Description Quality | 78.3% | 3.03 / 5
Local Market Relevance | 78% | 6.99 / 10
FAQ Content | 61% | 5.34 / 10
Content Clarity (BLUF/TL;DR) | 47.8% | 1.67 / 5
FAQ Content Richness (LLM) | 34.8% | 1.29 / 5

Trust & Entity Identity - average 37% of max (the weakest dimension)

This is where the average site bleeds out. 90% of sites have no customer reviews in AggregateRating schema. 60% have no authorship signals. AI engines that care about source verification see a faceless page.

Check | Pass Rate | Avg / Max
Contact & Privacy Pages | 94.6% | 3.43 / 4
Business Identity (NAP) | 89% | 7.98 / 15
Authorship Signals | 40.2% | 1.41 / 4
GTIN/MPN for Products | 35.9% | 0.91 / 4
Customer Reviews & Ratings | 10% | 0.55 / 10
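On the site side, the fix is markup like the following. This minimal sketch generates Product JSON-LD with an AggregateRating from Python. The product name and numbers are placeholders. The values should come from reviews actually displayed on that page, since Google's review snippet guidelines expect marked-up ratings to be visible to users.

```python
import json


def product_with_rating(name, rating_value, review_count):
    """Schema.org Product JSON-LD (as a dict) with an embedded AggregateRating."""
    return {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "aggregateRating": {
            "@type": "AggregateRating",
            "ratingValue": rating_value,
            "reviewCount": review_count,
            "bestRating": 5,
            "worstRating": 1,
        },
    }


markup = json.dumps(product_with_rating("Example Dive Mask", 4.6, 128), indent=2)
# Embed the result in the same page that shows the reviews to visitors.
print('<script type="application/ld+json">\n' + markup + "\n</script>")
```

This improves the Trust checks inside Technical Health. On its own it is not a citation strategy.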

Offering Readiness - average 46% of max

For e-commerce specifically, only 44.6% of sites include price and currency in Offer schema - the minimum data ChatGPT Shopping needs to surface a product at all.

Check | Pass Rate | Avg / Max
Image Alt Text Coverage | 98.9% | 2.93 / 4
Product/Content Quality | 56.5% | 8.05 / 20
Category Breadcrumbs | 54.3% | 1.77 / 4
Price & Currency in Offer | 44.6% | 1.78 / 4
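A rough version of the Price & Currency check needs only the standard library. This sketch assumes server-rendered HTML (no JavaScript execution). It extracts every application/ld+json block and walks the parsed JSON for Offer nodes. It is a simplified stand-in rather than the tool's actual check, and it ignores Microdata and RDFa.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and (dict(attrs).get("type") or "").lower() == "application/ld+json":
            self._inside = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.blocks[-1] += data


def offers(html):
    """Return (price, priceCurrency) for every Offer node found in the page's JSON-LD."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    found = []
    for raw in extractor.blocks:
        try:
            stack = [json.loads(raw)]
        except json.JSONDecodeError:
            continue  # skip blocks that are not valid JSON
        while stack:  # walk nested dicts and lists, including @graph arrays
            node = stack.pop()
            if isinstance(node, list):
                stack.extend(node)
            elif isinstance(node, dict):
                types = node.get("@type")
                if types == "Offer" or (isinstance(types, list) and "Offer" in types):
                    found.append((node.get("price"), node.get("priceCurrency")))
                stack.extend(node.values())
    return found


def has_price_and_currency(html):
    return any(price is not None and currency for price, currency in offers(html))


page = """<html><head><script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product", "name": "Dive Mask",
 "offers": {"@type": "Offer", "price": "49.00", "priceCurrency": "EUR"}}
</script></head><body></body></html>"""
print(offers(page))                  # [('49.00', 'EUR')]
print(has_price_and_currency(page))  # True
```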

How AI Engines Actually Process Your Content

Understanding this pipeline explains why content relevance dominates and technical hygiene is necessary but not sufficient. Most AI search engines run some variant of Retrieval-Augmented Generation (RAG), with platform-specific ranking signals layered on top.

AI Citation Funnel

Crawlability   → Can the engine reach and render your pages?
     ↓
Chunk Retrieval → Does your content match the query embedding?
     ↓
Reranking      → Is your chunk more relevant than competitors?
     ↓
Source Trust   → Can the engine verify who you are?
     ↓
Citation       → Your page appears as a cited source

The critical insight: AI engines do not retrieve whole pages. They retrieve chunks. A typical chunk is 300-800 tokens (roughly one H2 section). Each chunk is independently embedded and searchable. A 3,000-word page produces 4-8 independent citation candidates.
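One common way to produce chunks in this range is to split at H2 headings with a size cap. This minimal sketch assumes the page text uses Markdown-style "## " headings and approximates tokens by whitespace-separated words. The 800 cap mirrors the top of the range above. It does not reproduce any particular engine's chunker.

```python
import re


def chunk_by_h2(text, max_tokens=800):
    """Split a page at H2 headings ("## " lines), then cap each chunk at max_tokens.

    Tokens are approximated by whitespace-separated words; a model tokenizer
    would usually count more tokens for the same text.
    """
    chunks = []
    for section in re.split(r"(?m)^(?=## )", text):
        words = section.split()
        for start in range(0, len(words), max_tokens):
            chunks.append(" ".join(words[start:start + max_tokens]))
    return chunks


page = "## Gear basics\n" + "word " * 500 + "\n## Safety\n" + "word " * 1000
chunks = chunk_by_h2(page)
print(len(chunks), [len(c.split()) for c in chunks])  # 3 [503, 800, 202]
```

Each resulting chunk would be embedded separately. That independence is why one long page can compete as several citation candidates.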

Chunk Retrieval is where content relevance dominates. Our research found that topic match - whether your content is actually about what the user asked - is the gate that determines citation, not structural readiness. But if your content is relevant and your site fails at the first step (crawlability), you are invisible anyway. This is why Technical Health stays in the score, just with a smaller weight than we originally thought.

Platform differences: Google AI Overviews relies on Google's search index and Knowledge Graph combined with grounding retrieval. ChatGPT Search uses the Bing index plus its own crawlers and internal embeddings. Perplexity uses its own crawl combined with vector retrieval and live browsing. The funnel above captures the shared logic. Exact ranking signals differ.

Which Signals Correlate With Higher Technical Health Scores

From the same dataset of 100 audits, these signals show the largest gaps between sites that have them and sites that don't. These are correlations, not causation - and they describe correlation with Technical Health, not with actual citation rates. Our larger study of 441 domains showed that TH itself does not predict citations. But the gaps below still tell you something useful about what a well-maintained site looks like.

Signal | Present | Absent | Difference
Schema.org structured data | 66.7 avg (n=66) | 28.7 avg (n=35) | +38.0 pts
FAQ content | 66.7 avg (n=61) | 33.5 avg (n=40) | +33.2 pts
TL;DR / BLUF summary | 68.4 avg (n=45) | 45.1 avg (n=48) | +23.3 pts
JS renders correctly | 59.8 avg (n=71) | 20.4 avg (n=5) | +39.4 pts
Customer reviews in schema | 57.1 avg (n=10) | 53.1 avg (n=91) | +4.0 pts

JS rendering failure shows the largest gap: sites where AI crawlers see a blank page average 20.4/100 on Technical Health, although only five sites fell into that group. Among the content-related signals, Schema.org and FAQ presence correlate most strongly with higher TH. Customer reviews show a weak correlation with TH (+4 points), though the small sample of sites with reviews (n=10) limits that conclusion.

Read the table as “what makes a site structurally sound,” not “what gets you cited.” Being sound is still useful - it just is not a citation strategy.

Does Improving Your Score Actually Increase AI Citations?

This is the question that made us redesign the product. Everyone asks it. We tested it properly on our original 26-check model, and the honest answer was: no, not by itself. Our study across 441 domains found r=0.009 between technical readiness score and citation rate. Essentially zero. We tested multiple alternative hypotheses - threshold effects, necessary conditions, within-topic correlations. All null.

What does predict citations is content relevance. A page about diving equipment gets cited when someone asks about diving equipment - regardless of its technical readiness score. The 62x difference between same-topic and cross-topic citation rates dwarfs any signal from structural readiness. That finding is why Query Coverage, Content Depth, and Sub-Intent Coverage now carry 65% of the total score on paid scans (80% on free scans), and TH only 15% (20% on free).

The current score is our best attempt at measuring the thing that actually moves the needle. We have not yet proven that following our recommendations raises a real site's citation rate - we are running that experiment on our own site right now. When we have before/after data, we will publish it, positive or null.

In the meantime, use the score the way we use it internally:

  • A TH below 50 means AI engines may physically fail to read you. Fix that first. Nothing else matters until the plumbing works.
  • Low Query Coverage means your content is not about what your audience is asking. That is a content strategy problem, not a schema problem.
  • Low Sub-Intent Coverage with acceptable QC means your pages scratch the surface but don't go deep. Add the missing angles.
  • Low Citation Reality with high QC/CD/SI means the content is right but AI engines don't know about it yet - usually a trust, freshness, or indexing problem.
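These rules of thumb translate into a small decision function. This is a sketch, not the product's logic. The TH floor of 50 comes from the list above. The thresholds for "low" and "high" on the other components are hypothetical, because the article does not fix them.

```python
TH_FLOOR = 50  # from the article: below this, fix the plumbing first


def triage(scores, low=50, high=70):
    """Turn component scores (0-100) into prioritized advice, following the rules above.

    `low` and `high` are hypothetical cut-offs; the article only fixes TH < 50.
    CR is optional because free scans do not include it.
    """
    advice = []
    if scores["TH"] < TH_FLOOR:
        advice.append("plumbing: AI engines may fail to read the site")
    if scores["QC"] < low:
        advice.append("content strategy: pages miss what the audience asks")
    elif scores["SI"] < low:
        advice.append("depth: add the missing sub-intent angles")
    cr = scores.get("CR")
    if cr is not None and cr < low and min(scores["QC"], scores["CD"], scores["SI"]) >= high:
        advice.append("visibility: check trust, freshness and indexing")
    return advice


print(triage({"TH": 40, "QC": 30, "CD": 40, "SI": 20}))
print(triage({"TH": 85, "QC": 80, "CD": 75, "SI": 72, "CR": 10}))
```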

We know this is an uncomfortable framing for anyone who built a technical SEO tool and expected the score to be the answer. We built that tool. We got the answer wrong. We would rather tell you what we actually found than sell you a story. The full study is published - you can check the methodology yourself.

Score Distribution: What 100 Audits Revealed

This distribution is from the original 26-check model (now exposed as Technical Health). It still describes the real technical state of the web. A similar distribution for the full five-component Content Relevance Score will come after our self-test and pilot cases are published.

TH Range | Sites | Avg Score | Status
0-29 | 12 (12%) | 17.6 | Critical - likely invisible to AI crawlers
30-59 | 48 (48%) | 40.5 | Average - some signals present, major gaps
60-79 | 14 (14%) | 68.7 | Good - solid foundation
80-100 | 26 (26%) | 85.7 | Excellent - technical barriers removed

Average TH: 53.5/100. Median: 51.0. 48% of sites fall in the 30-59 range - technically accessible but missing content and trust signals.

For the full dataset analysis see the study of 100 website audits. For how we measured correlation with actual citation rates, see the null-finding paper.

What We Actually Think About This Score

AI Search Readiness is fundamentally a data quality problem plus a content relevance problem. The websites that present their data in structured, machine-parseable formats and answer the questions their audience actually asks are easier for any automated system to consume - AI search engines, price comparison bots, supply chain integrations, any of them. The diagnostic is not the value. The prioritization of what to fix next is.

Schema.org JSON-LD is effectively the API contract between your website and AI engines. Query coverage is the contract between your content strategy and your audience. If either is broken, you are invisible. If both are sound, you have a chance.

The good news: both problems are solvable and measurable. Your robots.txt either allows GPTBot or it doesn't. Your top pages either cover the sub-intents real users ask about or they don't. The score turns these into a prioritized action list where the high-weight components (QC, CD, SI) point at content work and the low-weight component (TH) points at plumbing work.
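Robots.txt access is the kind of binary check that Python's standard library can answer directly. This minimal sketch runs urllib.robotparser on a robots.txt body you have already fetched. The list of AI user-agent tokens is illustrative rather than complete. An allow rule only grants permission; it does not show that a crawler actually visits.

```python
from urllib.robotparser import RobotFileParser

# Illustrative, not exhaustive, list of AI-related robots.txt user-agent tokens.
AI_AGENTS = ["GPTBot", "OAI-SearchBot", "PerplexityBot", "ClaudeBot", "Google-Extended"]


def ai_crawler_access(robots_txt, url, agents=AI_AGENTS):
    """Map each user-agent token to whether robots.txt allows it to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""
print(ai_crawler_access(robots_txt, "https://example.com/blog/post"))
# {'GPTBot': False, 'OAI-SearchBot': True, 'PerplexityBot': True, 'ClaudeBot': True, 'Google-Extended': True}
```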

The uncomfortable part: fixing all of this does not guarantee citations. Domain reputation, freshness, and factors we do not yet fully understand still play a role. What we can honestly claim is that the score surfaces the signals with the strongest empirical evidence so far. That is not the same as a promise. We publish both our positive and null findings so you can audit our reasoning.

How to Check Your Score

  1. Manual check (5 minutes): Disable JavaScript in your browser, visit your robots.txt, search your page source for application/ld+json, and ask yourself whether your top pages actually answer the top five questions your audience would type into ChatGPT. This covers the most critical signals. The measurement guide walks through all the manual steps.
  2. Free automated audit: Run your URL through LLM SEO Check - the free tier runs the Content Relevance Score without Citation Reality (QC, CD, SI, TH), generates 20 monitoring queries, lets you review and edit them before analysis, and returns a structured report with prioritized gaps. No credit card, no upgrade wall on core value.
  3. Starter consultation (€149 one-time): If you want a human expert to walk through your results, build an implementation plan, and deliver 20-40 prioritized rewrite tasks, book a Starter consultation. We run a limited number of slots per month so we can review each site properly. The paid scan adds Citation Reality (Perplexity citation monitoring) to the analysis.

The first scan typically reveals 3-5 content gaps invisible to traditional SEO audits. For a detailed comparison of available tools, see the recommended LLM SEO check tools guide. For a full comparison of AI search readiness and traditional SEO, read AI Search Readiness vs Traditional SEO.

Frequently Asked Questions

What is a Content Relevance Score?

A Content Relevance Score (0-100) measures whether a website's content actually answers the queries its audience asks AI search engines like ChatGPT, Perplexity, and Google AI Overviews. It evaluates five components: Query Coverage, Content Depth, Sub-Intent Coverage, Citation Reality (paid scans), and Technical Health. Content signals carry 80-85% of the weight.

How is a Content Relevance Score different from a traditional SEO score?

Traditional SEO scores measure ranking signals: backlinks, keyword density, page speed. A Content Relevance Score measures citation signals: does your content answer the specific queries users ask AI engines, and does it cover the sub-intents behind those queries? A site can rank well in Google but score poorly on content relevance if it doesn't address the questions AI users actually ask.

What are the five components of the score?

The five components are: (1) Query Coverage - what fraction of target queries any page on the site can answer; (2) Content Depth - how deeply the best page answers each query; (3) Sub-Intent Coverage - whether the site addresses the full range of information needs behind each query; (4) Citation Reality - whether Perplexity already cites the site (paid scans only); (5) Technical Health - the 26 legacy technical checks as one subcomponent with 15-20% weight.

Why did you rebuild the score from a 26-check model?

We tested our original 26-check technical readiness score against actual AI citation rates across 441 domains and 14,550 domain-query pairs. The result: r=0.009, p=0.849 - statistically zero. The technical score did not predict which sites get cited. The follow-up study showed content relevance (measured via BM25 plus embedding similarity) predicted citations with AUC 0.915. We rebuilt the product around that finding.

Why do 90% of sites fail the customer reviews check?

Most businesses collect reviews on third-party platforms (Google, Trustpilot, Yelp) but don't embed them on their own website with AggregateRating schema. AI engines that parse structured data can use this review data as a trust signal. The fix requires displaying reviews on your site with Schema.org markup. This is part of the Technical Health subcomponent.

Does improving the Content Relevance Score actually increase citations?

Content relevance is the strongest empirical predictor of AI citations we have found (AUC 0.915). However, we have not yet published a before/after case study proving that applying our specific recommendations raises a real site's citation rate - that experiment is running now. The score measures the signal with the best research backing. That is not the same as a guarantee.

How often should I check my Content Relevance Score?

Re-scan after making content changes to verify improvement. Content relevance shifts as competitor content evolves and AI engines update their retrieval pipelines. Monthly scans catch regressions. The free tier allows unlimited rescans.

What happened to the old four-dimension model (MR, EX, TR, OR)?

The 26 technical checks from the old model are still in the pipeline as the Technical Health subcomponent (15-20% weight). They check real things - schema markup, crawl access, content structure, trust signals - but our research showed they don't predict citations on their own. They are now treated as hygiene, not strategy.

Alexey Tolmachev

Senior Systems Analyst · AI Search Readiness Researcher

Senior Systems Analyst with 14 years of experience in data architecture, system integration, and technical specification design. Researches how AI search engines process structured data and select citation sources. Creator of the methodology.

Check Your AI Search Readiness

Get your free AI Search Readiness Score in under 2 minutes. See exactly what to fix so ChatGPT, Perplexity, and Google AI Overviews can find and cite your content.

