The 26 Factors That Determine Your AI Search Readiness Score

14 min read

TL;DR

Your AI Search Readiness Score is calculated from 26 specific checks grouped into 4 weighted baskets: Machine Readability (25 pts), Extractability (30 pts), Trust & Entity (25 pts), and Organic Reach (20 pts). The formula is Score = 0.25×MR + 0.30×EX + 0.25×TR + 0.20×OR. A JS dependency penalty halves MR if your page renders 50 or fewer words without JavaScript. Fix Extractability first — it carries the most weight and typically has the largest gaps.

I designed these 26 checks based on a theory about what AI engines need to cite a page. The theory turned out to be partly wrong — the composite score does not predict citations. But the individual checks still find real technical problems that are worth fixing on their own merits.

When I built the scoring model for getaisearchscore.com, I had a clear hypothesis: sites that expose more structured data, clearer content, and stronger trust signals should get cited more often by ChatGPT, Perplexity, and Google AI Overviews. I chose 26 specific checks, grouped them into four weighted baskets, and assigned point values based on what I believed mattered most.

Then I ran an empirical study on 441 domains and 14,550 domain-query pairs. The correlation between the composite score and actual citation rate was r=0.009 (p=0.849). Essentially zero. The score does not predict whether AI engines will cite you.

This article documents all 26 checks honestly: why I chose each one, what it actually measures, and what I now think about its relationship to AI citations. If you are new to the concept, start with What Is an AI Search Readiness Score? for the overview.

What the Research Actually Found

Honest disclosure

I tested whether a higher AI Search Readiness Score leads to more AI citations. It does not. The composite score has zero predictive power (r=0.009, n=441 domains).

What does predict citations? Content relevance. Pages that match the topic of a query get cited at 5.17% vs 0.08% for off-topic pages — a 62x difference. The score cannot measure relevance because relevance is query-dependent, not a property of the page itself.

So why keep the 26 checks? Because they measure real technical quality problems. A missing robots.txt rule, broken schema markup, or JS-only rendering are genuine issues regardless of whether they predict citations. Think of the score as a technical health audit, not a citation predictor.

The Scoring Formula

AI Search Readiness Score (0–100)

Score = 0.25 × Machine Readability + 0.30 × Extractability + 0.25 × Trust & Entity + 0.20 × Organic Reach

I gave Extractability the highest weight (30%) because I believed content that is easy to pull quotes from would get cited more. That reasoning sounded good in theory. In practice, the weights do not matter for citation prediction because the entire composite has no correlation with citations.

The weights do reflect a reasonable priority order for technical quality, though. Content clarity and extractability are harder to fix than adding a meta tag, so giving them more weight makes the score more useful as a diagnostic tool.
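The weighted sum is simple enough to sketch in a few lines. This is a hypothetical helper, not the tool's actual implementation; it assumes each basket subscore has already been normalized to a 0–100 scale before weighting:

```python
def composite_score(mr: float, ex: float, tr: float, or_: float) -> float:
    """Weighted composite of the four basket subscores (each 0-100)."""
    return 0.25 * mr + 0.30 * ex + 0.25 * tr + 0.20 * or_

# A site strong on Machine Readability but weak elsewhere:
print(round(composite_score(80, 60, 70, 50), 2))  # 65.5
```

Because the weights sum to 1.0, a site that scores 100 in every basket lands exactly at 100.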

The Complete 26-Check Table

Below is every check in the scoring engine. Core checks are visible on the free scan. Premium checks unlock with a paid plan. LLM checks use GPT-4o to evaluate content quality signals that rule-based checks cannot capture.

ID | Check Name | Basket | Max Pts | Tier
MR2 | Language & Mobile Optimization | MR | 4 | Core
MR3 | Schema.org Structured Data | MR | 10 | Core
MR14 | Page Title & Social Meta Tags | MR | 11 | Core
EX3 | FAQ Content | EX | 10 | Core
EX12 | Rich Content & Comparison Tables | EX | 10 | Core
EX5 | Local Market Relevance | EX | 10 | Core
TR1 | Business Identity (NAP) | TR | 15 | Core
TR3 | Customer Reviews & Ratings | TR | 10 | Core
OR1 | Product/Content Quality | OR | 20 | Core
MR1 | Indexation (robots.txt) | MR | 3 | Premium
MR4 | SSL / HTTPS | MR | 2 | Premium
MR5 | Open Graph Completeness | MR | 3 | Premium
MR7 | JS Rendering (AI Crawler View) | MR | 3 | Premium
MR6 | Canonical URL | MR | 2 | Premium
EX1 | Meta Description Quality | EX | 5 | Premium
EX6 | Heading Hierarchy (H1/H2) | EX | 5 | Premium
EX7 | Content Depth | EX | 5 | Premium
TR5 | Authorship Signals | TR | 4 | Premium
TR6 | GTIN/MPN for Products | TR | 4 | Premium
TR7 | Contact & Privacy Pages | TR | 4 | Premium
OR4 | Image Alt Text Coverage | OR | 4 | Premium
OR5 | Price & Currency in Offer | OR | 4 | Premium
OR6 | Category Breadcrumbs | OR | 4 | Premium
EX_LLM_BLUF | Content Clarity (BLUF/TL;DR) | EX | 5 | LLM
EX_LLM_FAQ | FAQ Content Richness (LLM) | EX | 5 | LLM
EX_LLM_LOCAL | Local Relevance (LLM) | EX | 5 | LLM
EX_LLM_STRUCTURE | Content Structure (LLM) | EX | 5 | LLM

Machine Readability (MR) — 25 Points, 8 Checks

When I designed this basket, my reasoning was simple: if AI crawlers cannot access and parse your pages, nothing else matters. That logic still holds — a blocked robots.txt is a real problem regardless of citation prediction. The MR basket contains 8 checks with a raw maximum of 38 points, normalized to 25 in the final score.

MR1 — Indexation (robots.txt) • 3 pts • Premium

I chose this as the first check because it is a binary gate. This check verifies that your robots.txt does not block AI-specific crawlers: GPTBot, OAI-SearchBot, PerplexityBot, ClaudeBot, and Google-Extended. A single Disallow rule targeting any of these user agents means that crawler will never see your content.

This is one check where the logic is airtight even without empirical validation. If a crawler is blocked, it cannot cite you. That is not a correlation — it is a prerequisite.

Fix: Audit your robots.txt. Explicitly allow all AI crawlers or remove blanket disallow rules. Often a 2-minute fix.
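The core of this check can be approximated with Python's standard library robots.txt parser. The `blocked_ai_bots` helper below is a hypothetical sketch, not the scanner's real code:

```python
from urllib.robotparser import RobotFileParser

# The AI crawler user agents the MR1 check looks for.
AI_BOTS = ["GPTBot", "OAI-SearchBot", "PerplexityBot", "ClaudeBot", "Google-Extended"]

def blocked_ai_bots(robots_txt: str, url: str = "https://example.com/") -> list[str]:
    """Return the AI crawlers that this robots.txt disallows for `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_BOTS if not parser.can_fetch(bot, url)]

sample = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""
# GPTBot has its own Disallow group; the other bots fall through to '*'.
print(blocked_ai_bots(sample))
```

Running this against the sample file reports `GPTBot` as blocked even though the wildcard group allows everything — exactly the kind of silent block the check is designed to surface.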

MR2 — Language & Mobile Optimization • 4 pts • Core

I included this check because AI engines need to match content to query language. The check looks for a declared lang attribute on the HTML element and a mobile-friendly viewport meta tag.

In retrospect, the viewport tag portion is questionable for AI citation purposes. AI crawlers do not care about mobile rendering. But the language declaration is genuinely useful for content matching.

Fix: Add <html lang="en"> (or your target language) and <meta name="viewport" content="width=device-width, initial-scale=1"> to your document head.

MR3 — Schema.org Structured Data • 10 pts • Core

This is the single highest-value technical check in the MR basket. I gave it 10 points because structured data is how machines understand what a page is about. The check looks for JSON-LD — specifically Product, FAQPage, Organization, LocalBusiness, and BreadcrumbList schemas. Each schema type found adds points.

I still think structured data matters for AI discoverability in principle. The problem is that my study could not isolate its effect — the composite score washes it out. For a deep dive, see Schema Markup for AI Search.

Fix: Add JSON-LD blocks in your page's <head> or before the closing </body>. Start with Product (e-commerce) or Organization (services), then add FAQPage and BreadcrumbList.
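As a starting point, a minimal Organization block can be generated like this. The business details are invented for illustration; swap in Product or FAQPage types as your pages require:

```python
import json

# A minimal Organization JSON-LD payload (all values are hypothetical).
organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Dive Shop",
    "url": "https://example.com",
    "logo": "https://example.com/logo.png",
}

# Emit the script tag to paste into <head> or before </body>.
snippet = f'<script type="application/ld+json">{json.dumps(organization)}</script>'
print(snippet)
```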

MR4 — SSL / HTTPS • 2 pts • Premium

I included SSL because it seemed like a baseline trust signal. In practice, nearly every site has SSL in 2026, so this check rarely differentiates. I gave it only 2 points for that reason.

Fix: Enable SSL through your hosting provider or a reverse proxy like Cloudflare. Most modern hosts offer free SSL via Let's Encrypt.

MR5 — Open Graph Completeness • 3 pts • Premium

Checks for og:title, og:description, og:image, and og:url. I included Open Graph because AI engines might use these tags as a secondary signal for page identity. Honestly, I am not confident this matters for AI citation specifically. It is good practice for social sharing regardless.

Fix: Add the four core OG tags to every page. Most CMS platforms and frameworks have plugins or built-in support for this.

MR6 — Canonical URL • 2 pts • Premium

Verifies that the page declares a canonical URL via <link rel="canonical">. I included this because duplicate content can confuse AI engines when they encounter the same page through different URL parameters or www/non-www variants.

Fix: Add a canonical link element pointing to the preferred URL of each page. Most frameworks support this natively (Next.js alternates.canonical, WordPress Yoast, etc.).

MR7 — JS Rendering (AI Crawler View) • 3 pts • Premium

This check simulates what an AI crawler sees by comparing server-rendered HTML against the fully rendered page. I chose this check because many modern sites render content entirely via JavaScript, and most AI crawlers do not execute JS. If your content is invisible without JavaScript, it is invisible to these crawlers.

This is another check where the logic is mechanistically sound even without citation correlation data. If a crawler literally cannot see your content, it cannot cite it. For more details, see Why AI Crawlers Hate Your JavaScript.

Fix: Switch from client-side rendering (CSR) to server-side rendering (SSR) or static site generation (SSG). Ensure JSON-LD structured data is present in the initial HTML response, not injected by JavaScript.
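A rough approximation of the "AI crawler view" looks only at the raw HTML, before any JavaScript executes. The `StaticTextExtractor` below is a hypothetical sketch built on the standard library's HTMLParser, not the tool's actual renderer:

```python
from html.parser import HTMLParser

class StaticTextExtractor(HTMLParser):
    """Collects the visible text a non-JS crawler would see in raw HTML."""

    def __init__(self):
        super().__init__()
        self.words: list[str] = []
        self._skip_depth = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.words.extend(data.split())

def static_word_count(html: str) -> int:
    extractor = StaticTextExtractor()
    extractor.feed(html)
    return len(extractor.words)

# A typical client-side-rendered shell: an empty root div plus a script.
csr_shell = "<html><body><div id='root'></div><script>renderApp()</script></body></html>"
print(static_word_count(csr_shell))  # 0 - nothing for a non-JS crawler to read
```

A count of zero on a content page is a strong signal that the page depends entirely on JavaScript for its content.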

MR14 — Page Title & Social Meta Tags • 11 pts • Core

The highest-scoring individual MR check. I gave it 11 points because titles and descriptions are the primary way AI engines identify what a page is about before reading the full content. The check evaluates <title>, meta description, og:title, and twitter:card.

Fix: Write unique, descriptive titles (50–60 characters) and meta descriptions (120–160 characters) for every page. Include your primary entity or product name in the title.

Extractability (EX) — 30 Points, 10 Checks

I designed this as the highest-weighted basket because I believed the ability to pull a direct, citable answer from content was the most important factor for AI citation. The basket includes 6 rule-based checks and 4 LLM-evaluated checks.

My research showed this assumption was wrong at the composite level. But the individual checks here still identify genuine content quality issues — thin pages, missing FAQ sections, poor heading structure. These are problems worth fixing for user experience alone.

EX1 — Meta Description Quality • 5 pts • Premium

Evaluates the meta description for length (25–160 characters), uniqueness, and whether it contains a factual summary rather than generic marketing copy. I included this because I expected AI engines to use meta descriptions as candidate snippets.

Fix: Write meta descriptions that answer the page's primary question in one sentence. Avoid calls to action like "Click here to learn more" — AI engines cannot click.

EX3 — FAQ Content • 10 pts • Core

Checks for FAQ-style content: question-and-answer blocks, FAQPage schema, or content formatted as Q&A pairs. I gave this 10 points because FAQs match the question-answer pattern that AI engines use to generate responses. The format seemed like a natural fit.

Whether FAQs actually increase citation rates is unproven by my data. But they do make content more useful for human visitors, which is reason enough.

Fix: Add a FAQ section with 4–8 questions to your key landing pages, product pages, and service pages. Pair the content with FAQPage JSON-LD schema.
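The FAQPage markup that pairs with such a section can be generated like this. The question and answer text are invented for illustration:

```python
import json

# A minimal FAQPage JSON-LD payload with one hypothetical Q&A pair.
faq_schema = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [
        {
            "@type": "Question",
            "name": "Do I need a license to scuba dive?",
            "acceptedAnswer": {
                "@type": "Answer",
                "text": "Yes. Most dive shops require an entry-level "
                        "certification such as PADI Open Water.",
            },
        },
    ],
}
print(json.dumps(faq_schema, indent=2))
```

Each question in your visible FAQ section gets one entry in `mainEntity`, keeping the markup in sync with the on-page content.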

EX5 — Local Market Relevance • 10 pts • Core

Evaluates whether your content contains local market signals: city names, regional references, local phone formats, currency symbols, and area-specific language. I chose this check because many AI queries are location-specific ("best diving shop in Lisbon").

This check is closest to measuring content relevance — which my research showed is the real driver of citations. But it only measures local relevance signals, not topic relevance to a specific query.

Fix: Mention your city and region explicitly in content, include a local phone number, reference local landmarks or geography, and use the local currency throughout pricing pages.

EX6 — Heading Hierarchy (H1/H2) • 5 pts • Premium

Checks that the page has exactly one H1 tag and multiple H2 tags forming a logical content hierarchy. I included this because heading structure helps AI engines navigate to section-level answers without reading the entire page.

Fix: Use a single H1 for the page title, H2 for major sections, and H3 for subsections. Avoid skipping heading levels or using headings for styling purposes.
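The rule itself is easy to express in code. This is a hypothetical sketch; the "at least two H2s" threshold is my reading of "multiple H2 tags," and the real check may differ:

```python
from html.parser import HTMLParser

class HeadingCounter(HTMLParser):
    """Counts h1-h6 occurrences in an HTML document."""

    def __init__(self):
        super().__init__()
        self.counts = {f"h{i}": 0 for i in range(1, 7)}

    def handle_starttag(self, tag, attrs):
        if tag in self.counts:
            self.counts[tag] += 1

def heading_hierarchy_ok(html: str) -> bool:
    """EX6-style rule: exactly one H1 and at least two H2s (assumed threshold)."""
    counter = HeadingCounter()
    counter.feed(html)
    return counter.counts["h1"] == 1 and counter.counts["h2"] >= 2

page = "<h1>Dive Courses</h1><h2>Beginner</h2><h2>Advanced</h2>"
print(heading_hierarchy_ok(page))  # True
```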

EX7 — Content Depth • 5 pts • Premium

Measures total word count of substantive text. Pages with fewer than 300 words score zero; pages with 800+ words get full points. I chose this threshold somewhat arbitrarily — it felt like a reasonable minimum for a page to be considered substantive.

Fix: Expand thin pages to at least 800 words of substantive content. Focus on answering questions your target audience actually asks, not padding with filler.
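The scoring curve implied by those thresholds might look like the sketch below. The article only defines the endpoints (zero below 300 words, full points at 800+); the linear ramp in between is my assumption, not the tool's documented behavior:

```python
def content_depth_score(word_count: int, max_pts: float = 5.0) -> float:
    """EX7 sketch: 0 pts under 300 words, full pts at 800+.

    The linear interpolation between 300 and 800 is an assumption.
    """
    if word_count < 300:
        return 0.0
    if word_count >= 800:
        return max_pts
    return max_pts * (word_count - 300) / 500

print(content_depth_score(250))   # 0.0 - below the 300-word floor
print(content_depth_score(1200))  # 5.0 - past the 800-word ceiling
```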

EX12 — Rich Content & Comparison Tables • 10 pts • Core

Looks for structured content elements: comparison tables, lists, bullet points, and data-rich blocks. I gave this 10 points because I believed AI engines prefer pages that organize information in easily extractable formats rather than long prose.

Fix: Add HTML tables comparing products, features, or options. Use ordered and unordered lists for steps and specifications. Structure data visually so it can be parsed programmatically.

EX_LLM_BLUF — Content Clarity (BLUF/TL;DR) • 5 pts • LLM

Uses GPT-4o to evaluate whether the page leads with a Bottom Line Up Front (BLUF) or TL;DR summary. I chose this because AI engines extract answers from the top of content first. Pages that bury the answer below background context seemed like they would score lower in AI retrieval.

Fix: Start every key page with a 1–2 sentence summary that directly answers the page's primary question. Place it before any background context or introduction.

EX_LLM_FAQ — FAQ Content Richness (LLM) • 5 pts • LLM

An LLM-evaluated complement to the rule-based EX3 check. Where EX3 checks for the presence of FAQ content, this check evaluates the quality of the answers: are they specific, factual, and complete enough to be cited?

Fix: Write FAQ answers that are self-contained, factual, and specific. Avoid vague answers like "It depends" — instead, provide concrete data points, price ranges, or step-by-step instructions.

EX_LLM_LOCAL — Local Relevance (LLM) • 5 pts • LLM

An LLM-evaluated complement to EX5. While EX5 checks for local keywords using rules, this check evaluates whether the content genuinely demonstrates local expertise — knowledge of local regulations, cultural context, or area-specific advice that only a local business would know.

Fix: Go beyond mentioning your city name. Reference local events, regulations, seasonal patterns, or neighborhood details that demonstrate genuine local presence.

EX_LLM_STRUCTURE — Content Structure (LLM) • 5 pts • LLM

Evaluates the overall information architecture of the page: is content logically organized, are sections clearly delineated, and can a reader (or AI engine) navigate to specific answers without reading the entire page?

Fix: Organize content into clearly labeled sections with descriptive headings. Use a table of contents for long pages. Ensure each section can stand alone as a citable unit.

Trust & Entity (TR) — 25 Points, 5 Checks

I designed this basket around the idea that AI engines assess source trustworthiness before citing. A real business with verifiable identity should outperform anonymous content. That reasoning felt solid. My data could not confirm or deny it at the individual check level — only the composite score was tested.

TR1 — Business Identity (NAP) • 15 pts • Core

The highest-scoring check in the Trust basket, and second only to OR1 (20 pts) across the whole model. I gave it 15 points because I believed Name, Address, and Phone (NAP) data in LocalBusiness or Organization schema was the foundation of entity trust. Without it, AI engines treat your site as an unverified source.

At 15 points, this check has outsized influence on the composite score. If I were redesigning the model, I might distribute these points more evenly. But as a diagnostic tool, it works — missing NAP is a real issue for any business site.

Fix: Add LocalBusiness schema with your complete business name, street address, phone number, and opening hours. Ensure this data matches your Google Business Profile exactly.
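A minimal LocalBusiness payload with complete NAP data could look like this. All business details are invented for illustration:

```python
import json

# Hypothetical LocalBusiness JSON-LD with full NAP (Name, Address, Phone).
local_business = {
    "@context": "https://schema.org",
    "@type": "LocalBusiness",
    "name": "Example Dive Shop",
    "address": {
        "@type": "PostalAddress",
        "streetAddress": "Rua do Mar 12",
        "addressLocality": "Lisbon",
        "postalCode": "1100-000",
        "addressCountry": "PT",
    },
    "telephone": "+351 21 000 0000",
    "openingHours": "Mo-Sa 09:00-18:00",
}
print(json.dumps(local_business, ensure_ascii=False, indent=2))
```

Keep every field here byte-for-byte identical to your Google Business Profile; mismatched NAP data is a common source of entity confusion.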

TR3 — Customer Reviews & Ratings • 10 pts • Core

Checks for AggregateRating or Review schema on the page. I included this because AI engines frequently include star ratings in their responses, especially for "best of" queries. Structured review data seemed like it would make a site more citable.

Fix: Add AggregateRating schema to product pages. If you use a third-party review platform (Trustpilot, Google Reviews), ensure their widget outputs structured data or add it manually.

TR5 — Authorship Signals • 4 pts • Premium

Checks for author meta tags, Person schema, or visible author bylines. I included this because Google's E-E-A-T framework emphasizes authorship, and I assumed AI engines would follow similar logic. Whether they actually do is unclear from my data.

Fix: Add an author byline to every article and key page. Include Person schema with the author's name and a link to their professional profile (LinkedIn, company bio page).

TR6 — GTIN/MPN for Products • 4 pts • Premium

For e-commerce sites, this check verifies that Product schema includes global trade identifiers: GTIN (barcode), MPN (manufacturer part number), or ISBN (books). I chose this because these identifiers let AI engines unambiguously identify which product you sell, enabling accurate comparison in shopping queries.

Fix: Add gtin, mpn, or isbn fields to your Product JSON-LD. Most e-commerce platforms store these values — they just need to be exposed in structured data.

TR7 — Contact & Privacy Pages • 4 pts • Premium

Checks for dedicated contact and privacy policy pages linked from the main navigation or footer. I included this as a baseline legitimacy signal — real businesses have these pages, spam sites typically do not.

Fix: Create dedicated /contact and /privacy pages with real information. Link them from your site footer on every page.

Organic Reach (OR) — 20 Points, 4 Checks

I designed this basket to evaluate how well specific product or service data is structured for AI extraction. It carries the lowest weight (20%) because I considered it secondary to content quality. For e-commerce sites, these checks matter more than for content sites.

OR1 — Product/Content Quality • 20 pts • Core

The highest single-check value in the OR basket. This check evaluates Product schema completeness: name, description, image, offers, price, currency, and availability. For non-e-commerce sites, it evaluates overall content quality signals.

At 20 points, this is the most influential individual check in the model, ahead of TR1's 15. I gave it this weight because complete product data seemed essential for AI shopping engines. That logic is sound for ChatGPT Shopping specifically, even if the composite score does not predict general AI citations.

Fix: For e-commerce, ensure every Product schema includes name, description, image, and a complete Offer (price, priceCurrency, availability). For content sites, add structured data that represents your primary entity or service.

OR4 — Image Alt Text Coverage • 4 pts • Premium

Measures the percentage of images with descriptive alt text. AI engines cannot see images directly — they rely on alt text to understand visual content. I included this because missing alt text means missing context for the AI.

Fix: Add descriptive alt text to every product and content image. Describe what the image shows factually ("Red dive mask, front view"), not generically ("product image").

OR5 — Price & Currency in Offer • 4 pts • Premium

Checks that Offer schema includes explicit price and priceCurrency fields. I chose this because AI shopping engines (ChatGPT Shopping, Perplexity Shopping) require machine-readable pricing to include products in comparison responses. Prices visible only as text on the page are often missed.

Fix: Add price and priceCurrency to every Offer in your Product schema. Include availability (InStock, OutOfStock) for complete coverage.
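Put together, a Product with a machine-readable Offer looks like the sketch below. The product name and price are invented for illustration:

```python
import json

# Hypothetical Product JSON-LD with a complete, machine-readable Offer.
product = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "3mm Wetsuit",
    "offers": {
        "@type": "Offer",
        "price": "149.00",
        "priceCurrency": "EUR",
        "availability": "https://schema.org/InStock",
    },
}
print(json.dumps(product, indent=2))
```

Note that `price` and `priceCurrency` are separate fields; a combined string like "€149" in the visible page text is exactly what AI shopping engines tend to miss.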

OR6 — Category Breadcrumbs • 4 pts • Premium

Checks for BreadcrumbList schema that reflects the page's position in your site hierarchy. I included this because breadcrumbs help AI engines understand the relationship between a product and its category, which matters for queries like "best wetsuits under $200."

Fix: Add BreadcrumbList JSON-LD to every product and category page. Match the breadcrumb trail to your visible navigation path (Home → Category → Product).

The JS Dependency Penalty

I added a special penalty outside the normal check system: if the static HTML of your page (before JavaScript execution) contains 50 or fewer words, the entire Machine Readability subscore is multiplied by 0.5. I chose this penalty because AI crawlers like GPTBot and PerplexityBot often cannot execute JavaScript. If your content is invisible without JS, it is invisible to AI search.

The penalty is binary: either your static HTML has more than 50 words (no penalty) or it does not (MR score halved). There is no partial penalty. A site that scores 80/100 on MR checks but triggers the JS penalty effectively drops to 40/100 on Machine Readability.

This is one of the design decisions I am most confident about. The mechanistic logic is clear: no content in static HTML means no content for non-JS crawlers. That is not a statistical claim — it is a technical fact.
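The penalty itself reduces to a one-line condition. This is a sketch of the rule as described above, not the scanner's actual code:

```python
def apply_js_penalty(mr_score: float, static_word_count: int) -> float:
    """Halve the Machine Readability subscore when the static HTML
    (before JavaScript execution) carries 50 or fewer words."""
    return mr_score * 0.5 if static_word_count <= 50 else mr_score

print(apply_js_penalty(80.0, 12))   # 40.0 - matches the 80 -> 40 example above
print(apply_js_penalty(80.0, 500))  # 80.0 - no penalty
```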

For a complete guide to fixing JavaScript rendering issues, see Why AI Crawlers Hate Your JavaScript (And How to Fix It).

What I Would Change Today

If I were redesigning the scoring model with the benefit of my research findings, I would make several changes.

First, I would not present the composite score as a citation predictor. The data is clear: it is not. I would frame it as a technical readiness audit — which is what it actually measures well.

Second, I would add a content relevance dimension. My study found that same-topic pages get cited 62x more than off-topic pages. No amount of schema markup or structured data compensates for writing about the wrong topic. But relevance is query-dependent, so it cannot be a static page-level score.

Third, I would redistribute the point weights more evenly. TR1 at 15 points and OR1 at 20 points have disproportionate influence on the composite. A flatter distribution would make the score less sensitive to individual checks.

Which Factors to Fix First

Even though the composite score does not predict citations, fixing these issues improves your site's technical quality. Here is my recommended fix order, based on effort vs diagnostic value:

1. Check MR1 (robots.txt) first. If AI crawlers are blocked, they literally cannot see your site. This is a binary gate with clear mechanistic logic. A 2-minute edit to robots.txt.
2. Fix JS rendering issues. If your content only appears after JavaScript execution, it is invisible to most AI crawlers. This is another mechanistic issue, not a statistical one.
3. Add structured data (MR3, OR1). Schema markup is the standard way machines understand page content. Whether it directly causes citations is unproven, but it is good engineering practice.
4. Improve content extractability (EX basket). Add FAQ sections, TL;DR summaries, comparison tables. These make your content more useful for humans too, which is reason enough.
5. Focus on content relevance. This is not something the score measures, but it is the only factor my research confirmed as a real citation driver. Write content that directly answers the queries your audience is asking AI engines.

Want to see which of the 26 checks your site passes and which it fails? Run a free scan with our AI Search Readiness audit tool. It evaluates all 26 checks in under 2 minutes. Just remember: the score measures technical readiness, not citation likelihood. For that, you need content relevance — and no tool can automate that for you.

Frequently Asked Questions

How many factors affect an AI Search Readiness Score?

There are 26 individual checks grouped into 4 baskets: Machine Readability (25% weight), Extractability (30% weight), Trust & Entity (25% weight), and Organic Reach (20% weight). Each check has a maximum point value, and the basket scores are weighted to produce a final score out of 100.

Which factor has the biggest impact on the score?

Product/Content Quality (OR1) carries the highest single-check value at 20 points, followed by Business Identity (NAP) in the Trust basket at 15 points. However, the Extractability basket as a whole has the most impact at 30% of the total score. If you can only fix one thing, start with FAQ sections and structured answer blocks — they boost both EX3 (FAQ Content, 10 pts) and EX12 (Rich Content, 10 pts).

What is the JS dependency penalty?

If your page renders 50 or fewer words of visible text without JavaScript enabled, the entire Machine Readability subscore is multiplied by 0.5. This penalty exists because AI crawlers like GPTBot and PerplexityBot often cannot execute JavaScript, so content hidden behind client-side rendering is effectively invisible.

Can I improve my score without technical changes?

Some checks are content-only: adding FAQ sections (EX3), improving meta descriptions (EX1), increasing content depth past 800 words (EX7), and adding author bylines (TR5). These do not require developer involvement. However, the highest-impact changes — structured data (MR3), robots.txt (MR1), and JS rendering (MR7) — typically need a developer.

Alexey Tolmachev

Senior Systems Analyst · AI Search Readiness Researcher

Senior Systems Analyst with 14 years of experience in data architecture, system integration, and technical specification design. Researches how AI search engines process structured data and select citation sources. Creator of the methodology.

Check Your AI Search Readiness

Get your free AI Search Readiness Score in under 2 minutes. See exactly what to fix so ChatGPT, Perplexity, and Google AI Overviews can find and cite your content.

Scan My Site — Free

No credit card required.
