# Data Dictionary

## dataset_domain.csv

Domain-level aggregated dataset. One row per unique domain.

| Column | Type | Description |
|--------|------|-------------|
| domain | str | Registrable domain (e.g. `example.com`) |
| citation_rate | float | Fraction of queries where domain was cited (0.0–1.0) |
| queries_cited | int | Number of queries where domain was cited (majority vote) |
| total_queries | int | Total number of queries in the study |
| mean_consistency | float | Mean citation consistency across queries (0.0–1.0) |
| mean_google_rank | float | Mean best Google rank across queries (NaN if not ranked) |
| best_google_rank | float | Best (minimum) Google rank across all queries |
| total_score | float | AI Search Readiness Score (0–100) |
| machine_readability | float | MR sub-score (0–25) |
| extractability | float | EX sub-score (0–30) |
| trust_entity | float | TE sub-score (0–25) |
| offering_readiness | float | OR sub-score (0–20) |
| moz_da | float | Moz Domain Authority (0–100) |
| domain_age_years | float | Domain age in years (from WHOIS) |
| wikipedia_presence | int | 1 if domain has a Wikipedia article, 0 otherwise |

## dataset_domain_query.csv

Domain × query level dataset. One row per (domain, query) pair.

| Column | Type | Description |
|--------|------|-------------|
| domain | str | Registrable domain (e.g. `example.com`) |
| query_id | int | Query identifier (1–30) |
| topic_cluster | str | Query vertical: `saas`, `ecom`, or `services` |
| times_cited | int | Number of replicates that cited this domain for this query |
| total_replicates | int | Total replicates per query (3) |
| citation_consistency | float | Fraction of replicates citing the domain (0.0–1.0) |
| is_cited | int | 1 if majority of replicates cited the domain, 0 otherwise |
| google_rank | float | Best Google rank for this domain on this query (NaN if unranked) |
| total_score | float | AI Search Readiness Score (0–100) |
| machine_readability | float | MR sub-score (0–25) |
| extractability | float | EX sub-score (0–30) |
| trust_entity | float | TE sub-score (0–25) |
| offering_readiness | float | OR sub-score (0–20) |
| moz_da | float | Moz Domain Authority (0–100) |
| domain_age_years | float | Domain age in years (from WHOIS) |
| wikipedia_presence | int | 1 if domain has a Wikipedia article, 0 otherwise |

## Notes

- **is_cited** uses majority vote: cited in ≥50% of replicates (at least 2 of 3) → 1
- **citation_rate** at domain level = mean(is_cited) across all 30 queries
- **total_score** = machine_readability + extractability + trust_entity + offering_readiness
- Score formula weights are frozen per pre-registration (MR 25%, EX 30%, TE 25%, OR 20%); these match the sub-score maxima of 25, 30, 25, and 20 points
- Domains appear in the dataset if they were cited by any LLM response OR appeared in any search result
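
The derived columns described in the notes can be recomputed from the raw ones as a sanity check. The following is a minimal sketch in plain Python, not the study's actual pipeline. The function names, the `TOTAL_QUERIES` constant, and the sample rows are hypothetical. It also assumes that a (domain, query) pair with no row counts as not cited, so the denominator is always 30.

```python
from collections import defaultdict

# Assumption: every domain is scored against all 30 queries (per the notes).
TOTAL_QUERIES = 30


def is_cited(times_cited: int, total_replicates: int = 3) -> int:
    """Majority vote: 1 if cited in at least 50% of replicates, else 0."""
    return 1 if times_cited * 2 >= total_replicates else 0


def total_score(mr: float, ex: float, te: float, or_: float) -> float:
    """Sum of the four sub-scores (maxima 25 + 30 + 25 + 20 = 100)."""
    return mr + ex + te + or_


def citation_rates(rows):
    """Aggregate domain x query rows into a per-domain citation_rate.

    rows: iterable of dicts with keys domain, times_cited, total_replicates.
    Pairs with no row are treated as not cited (an assumption of this sketch).
    """
    cited = defaultdict(int)
    for row in rows:
        cited[row["domain"]] += is_cited(row["times_cited"], row["total_replicates"])
    return {domain: n / TOTAL_QUERIES for domain, n in cited.items()}


# Hypothetical rows for illustration only; not taken from the dataset.
sample_rows = [
    {"domain": "example.com", "query_id": 1, "times_cited": 3, "total_replicates": 3},
    {"domain": "example.com", "query_id": 2, "times_cited": 2, "total_replicates": 3},
    {"domain": "example.com", "query_id": 3, "times_cited": 1, "total_replicates": 3},
    {"domain": "example.org", "query_id": 1, "times_cited": 0, "total_replicates": 3},
]

rates = citation_rates(sample_rows)
```

In this sample, `example.com` is cited by majority vote on two queries, so its rate is 2/30, while `example.org` has a rate of 0. Comparing such recomputed values against `citation_rate` and `total_score` in the CSV files is one way to confirm the files are internally consistent.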