Frequently Asked Questions

Is the Content Relevance Score audit free?

The free tier runs the full four-component audit - Query Coverage, Content Depth, Sub-Intent Coverage, and Technical Health - with per-query breakdowns and sub-intent gap analysis. No login, no credit card. The paid Starter consultation (€149 one-time) adds Citation Reality via Perplexity and a human expert review with 20-40 prioritized rewrite tasks.

How accurate is the Content Relevance Score?

The content relevance components (QC, CD, SI) use GPT-4o evaluation plus BM25 and embedding similarity - the same signals that achieved AUC 0.915 in our citation prediction study. The Technical Health subcomponent uses Playwright-based crawling that simulates how AI bots see your page. We have not yet published a before/after case study proving our recommendations raise citation rates - that experiment is running now.

What are the main limitations?

No continuous monitoring (it is a one-off diagnostic, not a dashboard). No traditional keyword tracking. Not designed for 10K+ page enterprise sites. The tool diagnoses content and technical gaps and tells you what to fix, but you or your team must implement the changes.

How does it compare to Conductor?

Conductor is an enterprise brand monitoring platform with continuous AI citation tracking. LLM SEO Check is a one-off content relevance diagnostic plus optional human consultation. Conductor gives you dashboards and trends over time. We give you a deep per-query, per-sub-intent diagnosis and a fix list. Different tools for different problems. See our detailed comparison article.

LLM SEO Check Review: Content Relevance Score - What It Does, Costs, and Delivers

Disclosure

This is a review of our own tool, written by the founder, Alexey Tolmachev. That makes us the least objective reviewers possible. We compensate by being transparent about what the tool does well, what it does not, and what our own research showed about its original design.

LLM SEO Check runs getaisearchscore.com, a Content Relevance Score audit for websites that want to be cited in AI search engines like ChatGPT, Perplexity, Google AI Overviews, and Bing Copilot. The current version of the score has five components: Query Coverage, Content Depth, Sub-Intent Coverage, Citation Reality (paid), and Technical Health. This is our second version. The first version measured 26 technical checks across four dimensions and produced a 0-100 technical readiness score.

We redesigned the product after our own research showed that the original score did not predict actual AI citations. The story of why we rebuilt it is the most useful part of this review. Here is what the tool does now, what it did before, what we learned, and who should use each version for what.

What It Does Now (Version 2, post-pivot)

You enter a URL, your email, and your business niche. The system crawls up to 50 pages with Playwright (simulating how AI bots see the site, including JavaScript rendering), runs the 26 technical checks as a Technical Health subcomponent, then hands off to the content evaluator.

The content evaluator generates 20 monitoring queries that a real user would ask an AI engine about the site's niche. You can edit, remove, or add queries on a dedicated review page before analysis runs. Once you trigger analysis, each query is embedded, the top three relevant pages per query are retrieved, and GPT-4o rates each page on a 0-10 relevance scale plus sub-intent coverage. On paid scans, Perplexity is queried with each monitoring query to measure Citation Reality - whether the site is already cited today.
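The retrieval step described above (embed the query, then shortlist the three most relevant pages) can be sketched as a cosine-similarity ranking. This is a minimal illustration, not the production pipeline: the three-dimensional vectors, the page URLs, and the function names `cosine` and `top_pages` are all made up for the example, and a real embedding model would return much longer vectors.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_pages(query_vec, page_vecs, k=3):
    """Return the k page URLs whose embeddings are closest to the query."""
    scored = [(cosine(query_vec, vec), url) for url, vec in page_vecs.items()]
    scored.sort(reverse=True)
    return [url for _, url in scored[:k]]


# Toy embeddings (hypothetical values, chosen only to make the ranking visible).
pages = {
    "/guides/beginner-gear": [0.9, 0.1, 0.0],
    "/blog/company-news": [0.0, 0.2, 0.9],
    "/products/masks": [0.7, 0.6, 0.1],
    "/about": [0.1, 0.1, 0.8],
}
query = [1.0, 0.2, 0.0]

shortlist = top_pages(query, pages)
print(shortlist)
```

Only the shortlisted pages would then go to the more expensive LLM rating step, which keeps the number of model calls per query fixed regardless of site size.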

The five components aggregate to an overall Content Relevance Score. The free tier includes QC, CD, SI, and TH. The paid consultation adds CR and a human expert walking through the results. No account creation for the free scan. No credit card for the free tier.

Our pricing is deliberately simple: the free audit runs the full diagnostic (we do not gate core value), and a €149 one-time Starter consultation adds Citation Reality, a 24-48 hour human-verified report with 20-40 prioritized rewrite tasks, platform playbooks for Shopify / WooCommerce / PrestaShop, and a follow-up call.

What It Does Well

The free tier gives real value. It runs the full four-component audit (QC, CD, SI, TH), not a teaser. You get the score, per-query breakdown, sub-intent gaps, and top recommendations on the free plan. The paid tier adds Citation Reality and a human expert review; it is not an “unlock everything” upsell.

Query fan-out analysis. This is the part that maps directly to what AI engines actually do. Decomposing “best diving gear for beginners” into 3-5 sub-intents and checking each of your pages against each sub-intent is the closest thing we have found to measuring “does this site answer the real question”.
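A sub-intent gap check of this kind can be sketched as a small function over a ratings table. The sketch assumes ratings are stored as 0-10 scores per page per sub-intent and treats a sub-intent as uncovered when no page scores above 5; the sub-intent labels, URLs, scores, and the name `uncovered_sub_intents` are hypothetical.

```python
def uncovered_sub_intents(ratings, threshold=5):
    """Return (sub_intent, best_score) pairs where no page scores above threshold.

    ratings maps each sub-intent to a dict of {page_url: relevance 0-10}.
    """
    gaps = []
    for sub_intent, by_page in ratings.items():
        best = max(by_page.values(), default=0)
        if best <= threshold:
            gaps.append((sub_intent, best))
    return gaps


# Hypothetical fan-out of "best diving gear for beginners".
ratings = {
    "which gear is essential": {"/guides/beginner-gear": 8, "/products/masks": 6},
    "how much it costs": {"/guides/beginner-gear": 3, "/products/masks": 5},
    "rent or buy": {"/guides/beginner-gear": 2, "/products/masks": 0},
}

gaps = uncovered_sub_intents(ratings)
for sub_intent, best in gaps:
    print(f"gap: {sub_intent!r} (best page scores {best}/10)")
```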

Specific, actionable findings. If your JSON-LD is missing a price field, the tool tells you which field and shows an example. If a monitoring query maps to zero pages with relevance above 5/10, you see exactly which query and which sub-intents are not covered. Each finding comes with concrete fix guidance, not vague advice.
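To make the JSON-LD example concrete, here is a minimal sketch of a missing-field check on a schema.org Product snippet. The list of fields treated as required (`price`, `priceCurrency`) and the function name `missing_offer_fields` are assumptions for illustration; they are not the tool's actual rule set.

```python
import json


def missing_offer_fields(jsonld_text, required=("price", "priceCurrency")):
    """List the required Offer fields absent from a Product JSON-LD snippet."""
    data = json.loads(jsonld_text)
    offers = data.get("offers") or {}
    if isinstance(offers, list):  # "offers" may be a single object or a list
        offers = offers[0] if offers else {}
    return [field for field in required if field not in offers]


# Hypothetical product markup with the price left out.
snippet = """
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Beginner snorkel set",
  "offers": {"@type": "Offer", "priceCurrency": "EUR"}
}
"""

missing = missing_offer_fields(snippet)
print(missing)
```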

JavaScript rendering detection. The Playwright-based crawl catches a real and common problem: sites that serve blank pages to bots that do not execute JavaScript. Many React and SPA sites have this issue and do not know it. This check alone has been the single most practically useful finding for our users on the Technical Health layer.
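One way to detect this problem is to compare the visible text in the raw HTML response with the visible text after a headless browser has rendered the page. The sketch below covers only the comparison and takes both HTML strings as inputs (in practice the rendered version would come from a browser such as Playwright). The 0.2 ratio cutoff and the names `visible_text` and `looks_js_dependent` are assumptions, not the tool's actual values.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping script, style and noscript content."""

    SKIPPED = ("script", "style", "noscript")

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def looks_js_dependent(raw_html, rendered_html, min_ratio=0.2):
    """True when the raw response carries far less text than the rendered page."""
    raw_len = len(visible_text(raw_html))
    rendered_len = len(visible_text(rendered_html))
    if rendered_len == 0:
        return False  # nothing to compare against
    return raw_len / rendered_len < min_ratio


raw = '<html><body><div id="root"></div><script>render()</script></body></html>'
rendered = (
    '<html><body><div id="root"><h1>Diving gear</h1>'
    "<p>Masks, fins and snorkels for beginners.</p></div></body></html>"
)
print(looks_js_dependent(raw, rendered))
```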

Honest research basis. The scoring weights on the current version are not first-principles guesses. They reflect what our own content relevance study found (AUC 0.915 on 13,140 domain-query pairs). When the data disagreed with our first design, we rebuilt the product instead of hiding the finding.

Fast. The free scan returns the core score in about two minutes; content evaluation takes another 2-5 minutes depending on site size. No onboarding call needed.

The Hard Truth About the First Version

Research finding

Our study across 441 domains and 14,550 domain-query pairs measured the correlation between the original 26-check AI Search Readiness Score and actual AI citation rates. Result: r = 0.009, p = 0.849. Statistically zero. The technical score did not predict whether AI engines would cite a site.
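As a rough sanity check on those numbers, the p-value for a Pearson correlation can be approximated from r and the sample size. The sketch assumes the correlation was computed at the domain level (n = 441), which the text above does not state explicitly, and it uses a normal approximation to the t distribution, which is reasonable at this sample size.

```python
import math


def two_sided_p(r, n):
    """Approximate two-sided p-value for a Pearson r with n observations.

    Uses t = r * sqrt((n - 2) / (1 - r^2)) and a normal approximation
    to the t distribution (fine for n in the hundreds or more).
    """
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return math.erfc(abs(t) / math.sqrt(2))


p_domains = two_sided_p(0.009, 441)    # assumption: one observation per domain
p_pairs = two_sided_p(0.009, 14_550)   # alternative: one per domain-query pair

print(f"n=441:    p ~ {p_domains:.2f}")
print(f"n=14,550: p ~ {p_pairs:.2f}")
```

With n = 441 the approximation gives a p-value of about 0.85, in line with the reported 0.849 once rounding of r is taken into account. With n = 14,550 it gives about 0.28. Neither comes close to conventional significance, so the conclusion does not depend on which sample size was used.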

This is the most important thing any reviewer (us or anyone else) can say about the first version of this tool. We built a scoring model based on reasonable assumptions about what AI engines value - schema markup, crawlability, trust signals, product data. We then tested those assumptions empirically. The data said: no correlation.

We tried several follow-up theories. Threshold effects (maybe the score only matters above a certain level) - null. Necessary condition analysis (maybe a high score is required but not sufficient) - null. Within-topic analysis (maybe the score matters when comparing sites in the same niche) - still null (r = -0.010).

The signal that did work was content relevance. Sites were cited 5.17% of the time for same-topic queries vs 0.08% for cross-topic queries - a 62x difference. The follow-up classifier on BM25 plus embedding similarity reached AUC 0.915. Having content that matches what someone is asking about matters enormously. Having clean structured data and good meta tags matters far less than we assumed.
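For readers unfamiliar with BM25, the sketch below implements the standard Okapi BM25 formula over a toy corpus. It is a textbook version with whitespace tokenization and common default parameters (k1 = 1.5, b = 0.75), not the classifier from the study; the documents and query are made up.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(doc) for doc in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in tokenized for term in set(doc))

    scores = []
    for doc in tokenized:
        term_freq = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / (term_freq[term] + length_norm)
        scores.append(score)
    return scores


docs = [
    "beginner diving gear guide masks fins snorkels",
    "quarterly company news and hiring update",
    "diving masks for sale",
]
scores = bm25_scores("diving gear beginner", docs)
for doc, score in zip(docs, scores):
    print(f"{score:.3f}  {doc}")
```

The on-topic guide scores highest, the loosely related product page scores lower, and the off-topic page scores zero, which is the same-topic versus cross-topic gap in miniature.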

We published this because founders should share null results, not bury them. We then rebuilt the score around the signal that actually worked. The 26 technical checks still run, but they now sit under a single Technical Health subcomponent with a 15-20% weight instead of being the whole score. The full methodology is in the null-finding paper.
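To show what a 15-20% Technical Health weight means in practice, here is a sketch of a weighted aggregate. Only the Technical Health range comes from the text above; every other weight, the renormalization over available components on the free tier, and the name `overall_score` are hypothetical and chosen purely for illustration.

```python
# Hypothetical weights. Only the Technical Health share (15-20%) is stated
# in the article; 0.15 is used here and the rest is invented for the example.
WEIGHTS = {"QC": 0.25, "CD": 0.20, "SI": 0.20, "CR": 0.20, "TH": 0.15}


def overall_score(components, weights=WEIGHTS):
    """Weighted mean of the available 0-100 component scores.

    Components set to None (e.g. Citation Reality on a free scan) are
    dropped and the remaining weights are renormalized. This handling
    is an assumption, not documented behavior.
    """
    available = {k: v for k, v in components.items() if v is not None}
    total_weight = sum(weights[k] for k in available)
    return sum(weights[k] * v for k, v in available.items()) / total_weight


free_scan = {"QC": 60, "CD": 50, "SI": 40, "CR": None, "TH": 90}
tech_only = {"QC": 0, "CD": 0, "SI": 0, "CR": None, "TH": 100}

print(overall_score(free_scan))
print(overall_score(tech_only))
```

Under these assumed weights, a site with perfect technical health and no relevant content would score under 20, which illustrates why technical fixes alone cannot carry the overall number.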

So What Is the Score Good For?

Technical Health subcomponent: catches the structural elements that make AI citation mechanically possible. Think of it as a pre-flight checklist. If your robots.txt blocks AI crawlers, the tool catches that. If your structured data is malformed, you will know. If your pages render blank without JavaScript, you will see it. These are real problems with real fixes, and finding them is useful. Just do not expect fixing them alone to produce citations.
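The robots.txt part of such a checklist can be reproduced with Python's standard library. The sketch below parses robots.txt text and reports which AI crawler user agents are blocked for a given URL. The agent names listed are examples of publicly documented crawler tokens and are an assumption here; they are not the list the tool checks.

```python
from urllib.robotparser import RobotFileParser

# Example crawler tokens (assumed for illustration, not the tool's list).
AI_AGENTS = ("GPTBot", "PerplexityBot", "ClaudeBot")


def blocked_ai_agents(robots_txt, url, agents=AI_AGENTS):
    """Return the agents that robots.txt forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

blocked = blocked_ai_agents(robots_txt, "https://example.com/products/masks")
print(blocked)
```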

Query Coverage / Content Depth / Sub-Intent Coverage: this is the new core. It tells you whether your content actually answers the questions your audience would ask an AI engine - which is the gate that decides citations. It is also the component where most sites have the biggest gaps, because nobody was measuring it before.

Citation Reality (paid): ground truth. Perplexity API calls for each monitoring query tell you what is happening today, not what should happen. If this number is lower than QC / CD / SI would predict, the gap is usually domain reputation, freshness, or indexing - things outside the pipeline we control directly.

Real Weaknesses

No proven before/after case yet. The content relevance signal has strong observational support (AUC 0.915), but we have not yet published a case where a site applied our recommendations and saw citation rate rise. We are running that experiment on our own site right now. We will publish the result regardless of which way it goes.

One-time scan, not monitoring. You get a snapshot. There is no dashboard that tracks your score over time, alerts you when something breaks, or shows trends. You can rescan manually. If you need continuous monitoring, this is not the tool.

Limited page coverage. Free scans cover up to 50 pages. If you have thousands of product pages, you are seeing a sample, not a complete picture. Query-aware page expansion helps by pulling in additional pages that match the monitoring queries, but there is still a ceiling.

No content strategy guidance in the free tier. The tool tells you which sub-intents are missing. It does not write the content for you. The Starter consultation exists specifically to bridge that gap with a human expert, which is why it is a high-touch offer and not a self-serve automatic report.

No access behind login walls. The crawler cannot authenticate. Paywalled or gated content cannot be scanned.

Small team. We are currently founder-led, so feature development is slower than at a VC-funded competitor. The upside is that we can publish null findings without marketing pressure forcing us to hide them.

Citation Reality has a noise floor. LLM citations are not deterministic - the same query run twice can return different sources ~29% of the time. Our paid scan runs each query once for cost reasons. Treat single-run CR scores as directional, not precise.
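One way to see why a single-run score is only directional is to put a confidence interval around it. The sketch below computes a Wilson score interval for a citation rate measured over 20 queries. It covers only the sampling uncertainty from the small number of queries; the run-to-run variation described above comes on top of that. The counts are made up.

```python
import math


def wilson_interval(cited, total, z=1.96):
    """Wilson score interval for a proportion (default z gives about 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = cited / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical scan: the site was cited for 5 of 20 monitoring queries.
low, high = wilson_interval(5, 20)
print(f"observed 25%, plausible range {low:.0%} to {high:.0%}")
```

With only 20 queries the interval is wide, so a change of a few percentage points between two scans should not be read as a real movement.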

Who Should Use It

You suspect your site has content gaps relative to what your audience asks AI engines (missing topics, weak depth, unanswered sub-intents)
You suspect technical issues that prevent AI engines from reading your site (JavaScript rendering, blocked crawlers, missing structured data)
You want a fast, specific diagnostic rather than a multi-week manual audit
You need a baseline report to share with developers or stakeholders
You are an SEO professional who wants a content-relevance-first check to complement traditional rank tracking
You want to hire a human expert for a one-off deep audit plus implementation plan (Starter)

Who Should Not Use It

You want a guarantee that a high score equals AI citations (nobody can honestly promise that)
You need ongoing monitoring with alerts, trend dashboards, and team collaboration features
You have an enterprise site with 10,000+ pages that needs a full crawl of every URL
You want a done-for-you managed GEO service - we do diagnostic plus implementation guidance, not retainer-based execution

Pricing

Feature	Free	Starter (€149 one-time)
Content Relevance Score (QC / CD / SI / TH)	Yes	Yes
Query review flow (edit monitoring queries)	Yes	Yes
Per-query breakdown + sub-intent gaps	Yes	Yes
Citation Reality (Perplexity monitoring)	No	Yes
Human expert review (24-48h)	No	Yes
20-40 prioritized rewrite tasks	No	Yes
Implementation call + follow-up rescan	No	Yes
Pages crawled	Up to 50	Up to 200

One-time pricing, not subscription. No recurring charges, no annual contracts. Starter runs on a limited number of slots per month so we can review each site properly.

The Bottom Line

The current version of this tool is good at measuring content-query relevance, which is the signal our own research identified as the dominant predictor of AI citations (AUC 0.915). It is also good at finding technical problems that prevent AI engines from reading a site - that is what the old 26-check scanner did, and those checks are still there under the Technical Health layer.

We do not yet have a published case of a real site applying our recommendations and seeing its citation rate rise. We are running that experiment on our own site now. Until then, what we can honestly claim is that the score measures the signals with the strongest empirical support so far, not that applying it will guarantee citations.

Use it as a content relevance diagnostic. Do not use it as a crystal ball.

If you want to try it, the free scan is at getaisearchscore.com. You will have results in a few minutes and can judge for yourself whether the findings are useful. If you want a human expert to walk through the results and build a plan, book a Starter consultation (€149, limited slots per month).

Related Articles

The 26 Factors That Determine Your AI Search Readiness Score

LLM SEO Check Pricing Guide: Free Audit and Starter Consultation

LLM SEO Check vs Conductor: Content Relevance Diagnostic vs Enterprise Monitoring