What drives LLM citation rank — and where the engines diverge
Observational, confound-controlled study: 138,186 source citations, 4 LLMs, 3 company datasets, full page re-audit + query-relevance + external authority. Associational, not causal.
How to read this report
The data. We ran 459 real business questions (e.g. "best accountants for contractors in Leeds", "public liability insurance for electricians") through 4 LLMs — ChatGPT (OpenAI), Claude (Anthropic), Google (Gemini), Perplexity — across 3 client niches (accounting, business insurance, AI tooling). For every answer we recorded which companies each LLM named and in what order, and which web pages it cited as sources — 138,186 citations in all. We then re-audited every cited page (content, structure, schema, speed) and audited random slices of the wider web for comparison. Brand-naming queries were removed, so a company can't win simply by being named in the question.
Plain terms: "rank" = where a company appears in an answer · "top-3" = named in the first three · "cited page" = a URL an LLM used as a source · "visibility" = a 0–100 score for how high a company tends to rank.
Key terms — in plain English
- Odds-ratio (OR): How much a page feature multiplies the odds of landing in the top 3. 1.0 = no effect; 1.2 = 20% better odds; 0.8 = 20% worse. Nearly everything here is 0.9–1.1 (small).
- Lift: The plain-language counterpart of OR. It compares probabilities rather than odds: how many times more likely a page is to support a top-3 company with a feature than without it.
- AUC: How well a model predicts top-3 ranking: 0.5 = a coin flip (useless), 1.0 = perfect. Our page-feature models sit at 0.49–0.60, so features barely predict rank.
- Significance (q<.05, ✱): The result is unlikely to be chance, after correcting for testing many features at once. More stars = more confidence.
- Content vs boilerplate schema: Structured-data tags in a page's code. Boilerplate = auto-generated tags that most CMSs emit (WebSite, Organization). Content = meaningful tags describing the page (FAQ, Article, LocalBusiness, Review).
- Answerability: Whether a page directly answers the question (definitions, examples, a clear conclusion) rather than offering only marketing copy.
- Brand / company effect: The part of ranking explained by which company it is, independent of its pages: a company's stable tendency to rank high once page features are accounted for.
- Popularity (proxy): How many distinct questions a company shows up for at all; a rough, in-dataset stand-in for prominence.
- External authority: Classic SEO authority, i.e. how many other sites link to a domain (Majestic Million rank). An independent signal, unlike "popularity".
- Eligibility vs rank: Eligibility = being the kind of page that can be cited at all; rank = how high you place once eligible. Most signals affect eligibility, not rank.
- Cohorts: Cited = pages LLMs cited · sibling = other (uncited) pages on those same sites · Majestic / CC-web = random established / typical web pages, for comparison.
- Cross-engine overlap (Jaccard): How much two engines' top-3 lists share: 0 = nothing in common, 1 = identical. Ours ≈ 0.09.
- Consensus core: The handful of brands that all four engines put in the top 3 for the same question; the thin "head" they agree on despite disagreeing on everything below it.
- Transfer: If a company is top-3 on one engine, how often it's also top-3 on another. Low transfer ⇒ winning on ChatGPT doesn't get you onto Perplexity.
- Localized vs national query: Localized = the prompt names a UK city ("…in Manchester"). National = it names no city (e.g. "…in the UK").
- National incumbent: For the localization section, a company cited across ≥3 different cities and among the 15 most-cited such firms (national scale + footprint). Excludes the client itself and single-city local brokers.
- Order of magnitude: "≈10× or more." When we say the company effect is an order of magnitude larger than page features, one is in the tens-of-percent range (page features) and the other in the multiples range (the company effect).
- Cluster-robust CI / Bayesian HDI: Two alternative ways to draw the error bars. The cluster-robust CI corrects for the same company appearing many times; the Bayesian HDI (highest-density interval) re-fits the model in a Bayesian framework and reports the narrowest interval holding 94% of the posterior. Both agree with the standard result here.
- Correlation (Spearman ρ): Strength and direction of a monotonic (rank-based) relationship, −1 to +1; 0 = no relationship.
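Because top-3 rates here sit around 40–60%, odds ratios and lifts can diverge noticeably. A minimal Python sketch, assuming a simple unadjusted 2×2 comparison (unlike the model-adjusted ORs in §1); the helper name is hypothetical:

```python
def lift_and_odds_ratio(rate_with: float, rate_without: float) -> tuple[float, float]:
    """Return (lift, odds ratio) for a binary feature, given the top-3 rate
    among pages with the feature and among pages without it."""
    if not (0 < rate_with < 1 and 0 < rate_without < 1):
        raise ValueError("rates must be strictly between 0 and 1")
    lift = rate_with / rate_without                # ratio of probabilities
    odds_with = rate_with / (1 - rate_with)        # p / (1 - p)
    odds_without = rate_without / (1 - rate_without)
    return lift, odds_with / odds_without          # ratio of odds


# Illustrative inputs: the rounded OpenAI FAQ rates from §4 (58% with, 56% without).
lift, odds_ratio = lift_and_odds_ratio(0.58, 0.56)
print(f"lift {lift:.3f}x, relative diff {lift - 1:+.1%}, OR {odds_ratio:.3f}x")
```

With the rounded §4 rates this gives a lift of about 1.036× (+3.6%, vs the +3.36% reported from unrounded rates) but an OR of about 1.085×: the same gap looks more than twice as large on the odds scale.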
Executive summary
What matters most: 11 stats for ranking on LLMs
- Brand identity is ~everything. A company's stable identity outweighs every page feature by more than an order of magnitude (hierarchical model: company SD ~3 vs feature coefs <0.1) — and that identity is not domain authority, on-page craft, or query-relevance.
- Backlinks / domain authority do NOT move LLM rank. External authority (Majestic) predicts top-3 at AUC ≈ 0.50 (coin flip). You cannot link-build your way to citations.
- Engines diverge on the tail, agree on the head. Only ~9% full top-3 overlap, but the same brand is #1 on ≥2 engines 37% of the time — a thin consensus core of big brands, divergent everywhere below.
- Localization helps locals only in some cities — not as a rule. Naming a city doesn't shrink national-incumbent share on average (17% localized vs 12% national). But it's sharply city-specific: incumbents hold 0% in Glasgow and under 15% in 4 of 6 cities, vs ~27% in Leeds.
- Eligibility = answerability + content schema, not boilerplate. Cited pages are far more answer-shaped than the typical web (74% vs 18% of CC-web, ~4×) and more likely to carry content schema (46% vs 19%, ~2.4×): LocalBusiness/FAQ/Article/Review. But boilerplate schema (WebSite/Org) is everywhere (~40–60%) and doesn't discriminate, and none of it moves rank once cited.
- Page speed is a near-non-factor (OR ≈ 1.0; small per-engine differences only).
- Stats beside claims don't help citation here: cited pages had lower stat-density than the random web, contrary to Princeton and Citation-Absorption.
- Q&A/FAQ helps a little, on every engine (+1–4%) — neither the +25% nor the −5.7% the literature claims.
- No freshness cliff: the top-3 support rate is flat (45–50%) from <1mo to >12mo-old content.
- Readability is the clearest on-page positive: robustly positive on OpenAI and Perplexity, though slightly negative on Anthropic.
- Don't keyword-stuff — title↔body keyword matching is negative on Gemini (0.73×), echoing Princeton.
The playbook: grounded in the findings below
Per-engine fingerprints: what matters on each
Tilts are small (±~10%) — brand prominence dominates all four — but these are the levers that move the margins once you're in the consideration set.
ChatGPT (OpenAI)
- Most feature-responsive (AUC 0.60): on-page optimization actually moves rank here, unlike on the others.
- Rewards completeness (1.08×), readability (robustly positive, cluster-robust CI [1.01, 1.09]) and depth/definitions.
- Cites your own pages ~46% of the time, plus how-to/docs: owning strong educational content pays off.
Claude (Anthropic)
- On-page craft is close to useless (AUC 0.49, below random: features add noise).
- Leans on third-party review (31%) and comparison (20%) more than any engine: get reviewed and compared, don't just polish your pages.
- Uniquely, readability is slightly negative (don't over-polish); hardest top-3 to crack (42% base rate).
Google (Gemini)
- Punishes keyword over-optimization (0.73×, the single strongest negative in the study). Do not keyword-stuff for Google.
- Likes FAQ (+3.9%, biggest Q&A boost) and depth; schema doesn't help (only engine with lift <1×).
- Behaves most like Claude (ρ=0.51); likely scores its (JS-rendered) search index, so it may weight rendered content the others never fetch.
Perplexity
- Rewards quotable content (quotations 1.12×, the highest of any engine).
- Cites your own pages the most (self 54%): first-party content pays off here.
- Most divergent from ChatGPT (only 9% top-3 overlap): a top-3 spot on ChatGPT is least likely to carry over here.
1 · Per-LLM driver matrix (logistic, top-3, FDR-corrected)
Takeaway: Read this as "how much does each page feature change the odds of ranking top-3, per engine." Everything sits near 1.0 (±10%) — page features barely move rank. The few real signals: completeness & readability help (esp. ChatGPT); keyword-stuffing hurts (Gemini).
Odds ratio per +1 SD (continuous features) or for presence (binary features), controlling for company effect, site and niche. *q<.05 **q<.01 ***q<.001. Effects are small (±10%): read directions, not dials. Two results are artifacts, not signals. The anthropic stat_density cell is driven by a heavy-tailed outlier (robust lift ≈ 1.0×). The presence/completeness split reflects collinearity (r=0.835): read their combined effect (schema-rich lift ≈ 1.0, i.e. neutral for rank), not the individual signs. Schema is an eligibility signal (§8), not a rank lever.
| feature | openai | anthropic | gemini | perplexity |
|---|---|---|---|---|
| presence_score | 0.94× | 1.04× | 1.01× | 0.96× |
| completeness_score | 1.08×* | 0.99× | 0.97× | 1.06×* |
| llm_readability_score | 1.07×** | 1.02× | 0.99× | 1.07×*** |
| semantics_score | 0.97× | 0.96×* | 1.02× | 0.96×* |
| word_count_log | 1.02× | 0.96×* | 1.03× | 0.96× |
| stat_density | 1.02× | 1.70× | 1.03× | 0.99× |
| external_links_log | 0.97× | 0.99× | 1.02× | 1.02× |
| schema_types_n | 0.95×* | 0.99× | 0.97× | 0.98× |
| psi_score | 1.02× | 1.00× | 1.00× | 0.97× |
| query_coverage_body | 1.03× | 1.11×*** | 1.04× | 1.03× |
| query_coverage_title | 0.96× | 1.04×* | 0.98× | 1.05×* |
| has_definitions | 1.05× | 1.02× | 1.02× | 0.94× |
| has_faq_markup | 0.99× | 0.99× | 1.01× | 1.00× |
| has_keyword_consistency | 0.94× | 1.00× | 0.77×** | 1.09× |
| has_quotation | 1.00× | 1.02× | 1.01× | 1.10×** |
| has_topic_sentences | 0.99× | 0.94× | 1.04× | 1.00× |
| has_examples | 0.98× | 0.94× | 1.02× | 0.92×* |
Model fit: openai AUC 0.601 · anthropic AUC 0.491 · gemini AUC 0.539 · perplexity AUC 0.534 — all near 0.5–0.6 (features weakly predictive).
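AUC has a direct probabilistic reading: the chance that a randomly chosen top-3 item scores higher than a randomly chosen non-top-3 item. A minimal sketch of that definition (O(n²), fine for illustration; libraries use an equivalent rank-sum formula):

```python
def auc(scores: list[float], labels: list[int]) -> float:
    """AUC as the probability that a random positive (label 1, top-3) item
    outscores a random negative (label 0) item; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # separates 3 of 4 pairs
print(auc([0.5, 0.5, 0.5, 0.5], [0, 1, 0, 1]))    # no separation: a coin flip
```

On this reading, a model at 0.49–0.60 orders a top-3 page above a non-top-3 page only slightly more often than chance.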
(Chart not reproduced: per-feature lift by engine, the "1.3× more likely" view.)
2 · Prominence, authority & relevance (marginal AUC)
Takeaway: Nothing page-level beats simply knowing the company. Adding page features, query-relevance or backlink authority to a company's popularity doesn't predict rank any better than popularity alone, and backlinks on their own are a coin flip.
Does anything page-level beat just knowing the company? Cross-validated AUC by signal set:
| engine | popularity only | + audit features | + relevance | all combined |
|---|---|---|---|---|
| openai | 0.606 | 0.604 | 0.603 | 0.601 |
| anthropic | 0.539 | 0.512 | 0.494 | 0.491 |
| gemini | 0.567 | 0.544 | 0.557 | 0.539 |
| perplexity | 0.557 | 0.545 | 0.544 | 0.534 |
On every engine, popularity alone (the proxy for the company effect) scores at least as high as everything combined; relevance ORs are 1.02–1.10 (negligible).
Is it real authority, or circular? — external check
Our "popularity" measure is an in-dataset appearance count, so it is partly circular. The stricter test joins cited domains to the Majestic Million (backlink/referring-subnet rank), a genuinely external authority signal. 77% of cited domains are in the top 1M.
| engine | endogenous "popularity" | EXTERNAL authority (Majestic) | both |
|---|---|---|---|
| openai | 0.607 | 0.506 | 0.608 |
| anthropic | 0.539 | 0.511 | 0.537 |
| gemini | 0.569 | 0.492 | 0.561 |
| perplexity | 0.557 | 0.478 | 0.549 |
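A minimal sketch of the external-authority join, assuming the Majestic Million is available as a CSV with `GlobalRank` and `Domain` columns (as in the public download; check the header of the file you use). The function names and toy domains are hypothetical, and hostname handling is deliberately simple:

```python
import csv
import io
from typing import Optional
from urllib.parse import urlparse


def host_of(url: str) -> str:
    """Lower-cased hostname with a leading 'www.' stripped. Simplification:
    other subdomains (e.g. 'blog.') are kept, not collapsed to the registrable domain."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def load_majestic_ranks(csv_text: str) -> dict[str, int]:
    """Map Domain -> GlobalRank (column names assumed from the Majestic Million CSV)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["Domain"].strip().lower(): int(row["GlobalRank"]) for row in reader}


def join_authority(cited_urls: list[str], ranks: dict[str, int]) -> dict[str, Optional[int]]:
    """Attach a global rank to each distinct cited host (None = outside the list)."""
    return {host_of(u): ranks.get(host_of(u)) for u in cited_urls}


# Toy inputs with placeholder domains, not study data.
ranks = load_majestic_ranks("GlobalRank,Domain\n1,example.com\n2,example.org\n")
joined = join_authority(
    ["https://www.example.com/pricing", "https://example.org/faq", "https://blog.example.net/post"],
    ranks,
)
coverage = sum(r is not None for r in joined.values()) / len(joined)
print(joined, f"coverage {coverage:.0%}")
```

To use the joined rank as an AUC score, remember that a lower GlobalRank means more authority: one option is to score by the negative rank and give unmatched domains the worst score.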
"But big brands buy CDNs — wouldn't speed skew this?"
- No raw correlation to absorb. Even without controlling for brand, speed barely moves with rank — Spearman ρ = 0.009 vs top-3. It's not that "brand soaks up the speed signal"; there's no signal to begin with.
- Brands aren't actually faster here. The most-cited "big-brand" pages and the long tail have the same median speed score (PSI 61 vs 61). Speed doesn't track popularity (ρ = 0.004) or backlink authority (ρ = −0.087; if anything, high-authority domains are slightly slower, plausibly because their content pages carry more marketing/analytics tech). A CDN fixes TTFB, which is already near-zero for almost everyone: table stakes that can't differentiate. PSI/LCP is dominated by page weight, where big brands have no edge.
- LLMs cite slow pages freely. Only 10% of cited pages are genuinely fast (LCP < 2.5s); 74% are slow (LCP > 4s; median 6.6s). If speed were a gate, cited pages would be fast — they're decidedly not.
3 · Cross-LLM divergence
Takeaway: The four engines mostly disagree on the full top-3 (~9% overlap) — but they do share a thin head of recognized brands (consensus core below). Optimise per engine; expect agreement only on the biggest names.
Do drivers differ by engine? (interaction tests)
| feature | LR χ² | p | differs by LLM |
|---|---|---|---|
| presence_score | 15.7 | 0.001 | yes |
| completeness_score | 20.4 | <0.001 | yes |
| semantics_score | 7.0 | 0.071 | no |
| psi_score | 14.7 | 0.002 | yes |
| has_faq_markup | 7.9 | 0.049 | yes |
| has_definitions | 4.8 | 0.186 | no |
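The interaction test can be read as a likelihood-ratio test between a model where each feature has one slope shared by all engines and one where it gets a slope per engine. With four engines that adds 3 parameters, so the statistic is compared against χ² with 3 degrees of freedom; that df is our assumption, but the p-values in the table are consistent with it. A dependency-free sketch using the closed-form χ²(3) tail:

```python
import math


def lr_statistic(loglik_restricted: float, loglik_full: float) -> float:
    """Likelihood-ratio statistic for nested models: 2 * (ll_full - ll_restricted)."""
    return 2.0 * (loglik_full - loglik_restricted)


def chi2_sf_df3(x: float) -> float:
    """Upper-tail p-value of a chi-square variable with 3 degrees of freedom.
    Closed form for df = 3: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)."""
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


# Recompute p-values from the LR chi-square values in the table above.
for feature, stat in [("semantics_score", 7.0), ("has_faq_markup", 7.9), ("has_definitions", 4.8)]:
    print(f"{feature}: chi2={stat}, p={chi2_sf_df3(stat):.3f}")
```

The recomputed p-values match the table to within the rounding of the χ² values.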
Source ecology — what each engine cites (role share)
| source role | openai | anthropic | gemini | perplexity |
|---|---|---|---|---|
| self | 46% | 39% | — | 55% |
| review | 12% | 23% | — | 19% |
| comparison | 9% | 27% | — | 15% |
| how-to | 8% | 8% | — | 3% |
| docs | 6% | 1% | 11% | 2% |
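Self = the company's own page; "—" = not classified (Gemini's source-role labels have gaps, §12). Perplexity and OpenAI lean on own pages (about half); Anthropic leans on third-party review and comparison. Only ~9% top-3 overlap between engines.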
…but there IS a consensus core — engines agree on the head
"No single AI ranking" is about the full top-3 list. Stratified, the engines do converge on a thin head of recognized brands:
- The same company is rank-1 on ≥2 engines in 37% of queries (≥3 in 9%, all-four in just 1%).
- If a company is top-3 on one engine, it's top-3 on a given other engine only 16% of the time — low transfer beyond the head.
- Consensus core (top-3 on all four across multiple queries): hiscox (9), the hartford (6), simply business (4), an anonymized UK accounting firm (4), mwa accounting (4), flux (3).
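The overlap, transfer and rank-1 agreement measures can be sketched for a single query as below. Company names are placeholders, and how the report pools across queries (here: averaging per-query Jaccard over engine pairs, and summing placements for transfer) is an assumption:

```python
from collections import Counter
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """|A ∩ B| / |A ∪ B|: 0 = nothing shared, 1 = identical lists."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mean_pairwise_jaccard(top3: dict[str, list[str]]) -> float:
    """Average Jaccard over every pair of engines for one query."""
    pairs = list(combinations(top3.values(), 2))
    return sum(jaccard(set(x), set(y)) for x, y in pairs) / len(pairs)


def transfer_rate(queries: list[dict[str, list[str]]], src: str, dst: str) -> float:
    """Of all (query, company) top-3 placements on `src`, the share also top-3 on `dst`."""
    hits = total = 0
    for top3 in queries:
        src_set, dst_set = set(top3[src]), set(top3[dst])
        total += len(src_set)
        hits += len(src_set & dst_set)
    return hits / total if total else 0.0


def rank1_agreement(top3: dict[str, list[str]]) -> int:
    """Largest number of engines that put the same company at rank 1."""
    firsts = Counter(ranked[0] for ranked in top3.values() if ranked)
    return max(firsts.values(), default=0)


# One hypothetical query; company names are placeholders.
query = {
    "openai": ["a", "b", "c"],
    "anthropic": ["a", "d", "e"],
    "gemini": ["a", "b", "f"],
    "perplexity": ["g", "h", "i"],
}
print(mean_pairwise_jaccard(query), transfer_rate([query], "openai", "gemini"), rank1_agreement(query))
```

Here three engines agree on "a" at rank 1 even though pairwise overlap averages only 0.15: the "shared head, divergent tail" pattern in miniature.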
Concrete cases where one company was top-3 on all four engines (ranks o/a/g/p):
| query | company | o/a/g/p |
|---|---|---|
| Top insurance for designers in Leeds | policybee | 3/3/2/2 |
| Top 5 business insurance providers for contractors in London | hiscox | 1/3/1/1 |
| Business insurance reviews for landscapers | nationwide | 2/1/2/1 |
| Best alternatives to Hiscox for builders | the hartford | 3/1/1/1 |
| Best Couriers in Cardiff | citysprint | 1/1/2/1 |
| Best insurance for tradespeople in London | simply business | 2/2/1/1 |
| Top insurance for designers in London and why | policybee | 2/1/3/3 |
| What is the best insurance for a courier in Manchester? | admiral | 1/1/1/1 |
So the honest statement: engines share a brand-dominated head (Hiscox, The Hartford, Simply Business…) but diverge on everything below it. [source: rankmatrix.jsonl, 487 multi-engine queries]
4 · The Q&A contradiction, adjudicated
Takeaway: FAQ/Q&A formatting nudges citation up a little on every engine (+1–4%) — neither the +25% nor the −5.7% the published studies claim.
Semrush +25% vs Citation-Absorption −5.74%. Our per-LLM test:
| engine | top-3 w/ FAQ | w/o | relative diff |
|---|---|---|---|
| openai | 58% | 56% | +3.36% |
| anthropic | 42% | 42% | +0.94% |
| gemini | 48% | 46% | +3.58% |
| perplexity | 47% | 45% | +3.8% |
Modestly positive on all four (+1–4%) — does not flip sign; far smaller than either camp claims.
5 · Company breadth
Takeaway: More pages looks like it helps — but only because prominent brands happen to have more pages. Control for prominence and the effect reverses. Publishing more pages is not a lever.
Raw ρ = 0.08; controlling for the company effect, partial ρ = -0.36 — apparent breadth benefit is a prominence proxy.
| distinct cited pages | mean visibility | median best rank | companies |
|---|---|---|---|
| 2 | 34.7 | 7 | 18 |
| 3 | 30.2 | 6 | 17 |
| 4-5 | 48.6 | 4 | 122 |
| 6-10 | 52.0 | 4 | 2771 |
| 11+ | 53.2 | 2 | 2018 |
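One standard way to control a correlation for a single third variable is the first-order partial correlation; applied to Spearman coefficients it gives a partial Spearman ρ. The report doesn't say whether it used this formula or a regression-residual approach, so treat this as a sketch with hypothetical inputs, not the study's values:

```python
import math


def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Correlation of x and y with z held fixed:
    (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))."""
    denom = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denom == 0:
        raise ValueError("z is perfectly correlated with x or y")
    return (r_xy - r_xz * r_yz) / denom


# Hypothetical: weak raw breadth-visibility correlation (x, y),
# with both variables tied to prominence (z).
print(round(partial_corr(0.08, 0.6, 0.5), 3))  # -0.318
```

A small positive raw correlation turns clearly negative once both variables' shared dependence on prominence is removed, the same qualitative pattern as the breadth result above.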
6 · Per-niche variation
Takeaway: The picture barely changes by niche — schema helps trades a touch more than accounting/crypto, but brand prominence dominates in every vertical.
Top-3 rate and on-page lift by niche. Variation is modest; schema helps trades (Landscapers/Carpenters/Contractors ~1.07–1.09×) slightly more than accounting/crypto. The brand-dominated story holds everywhere.
| niche | top-3 rate | schema-rich lift | answerable lift | n |
|---|---|---|---|---|
| Accountant for Therapists | 63% | 1.04× | 1.05× | 1671 |
| Accountant for Contractors | 60% | 1.06× | 1.03× | 1750 |
| Accountant for Crypto Returns | 59% | 0.96× | 1.05× | 1659 |
| Consultants | 57% | 1.07× | 1.02× | 4390 |
| Electricians | 57% | 1.04× | 1.02× | 4895 |
| Accountant for Lawyers | 57% | 1.00× | 1.04× | 2154 |
| Builders | 57% | 1.00× | 1.02× | 4716 |
| Accountant for Crypto | 56% | 0.91× | 0.99× | 2086 |
| Accountant for Designers | 56% | 0.95× | 1.04× | 2109 |
| Accountant for Ecommerce | 55% | 0.98× | 0.99× | 6443 |
| Decorators | 55% | 1.01× | 1.03× | 4829 |
| Landscapers | 55% | 1.09× | 1.04× | 5156 |
| Crypto Tax Returns | 55% | 1.03× | 1.02× | 2001 |
| Accountant for Recruiting agencies | 54% | 0.96× | 0.98× | 1988 |
| Carpenters | 54% | 1.08× | 0.99× | 4770 |
| Designers | 54% | 0.98× | 1.03× | 4659 |
| Accountant For Contractors | 53% | 1.03× | 1.00× | 2075 |
| Accountant for Doctors | 53% | 0.96× | 0.98× | 4688 |
| Contractors | 53% | 1.07× | 1.09× | 5111 |
| Accountant for Landlords | 53% | 1.01× | 0.99× | 5852 |
| Couriers | 53% | 1.05× | 1.02× | 5170 |
| Contractor Accountant | 52% | 0.94× | 1.00× | 1906 |
| Accountant for Dentist | 51% | 0.98× | 1.07× | 2171 |
| Tradespeople | 51% | 1.04× | 1.03× | 5336 |
| Self employed accountant | 50% | 1.01× | 0.90× | 1989 |
| Limited Company Accounting | 47% | 0.99× | 0.95× | 2591 |
7 · Per-location variation & localization
Top-3 rate by the city named in the prompt. The variation plausibly tracks competitive density (more competitors, lower top-3 rate). Cities are resolved from the prompt text, not the stored location tag (see the data-quality note below for why).
| prompt names… | top-3 rate | schema-rich lift | cited-domain authority | citations (n) |
|---|---|---|---|---|
| Glasgow | 59% | 0.98× | 0.75 | 3,926 |
| Birmingham | 51% | 1.02× | 0.83 | 12,728 |
| Manchester | 48% | 0.99× | 0.88 | 19,062 |
| Cardiff | 48% | 1.04× | 0.80 | 11,635 |
| London | 42% | 1.03× | 0.84 | 30,076 |
| Leeds | 41% | 0.95× | 0.80 | 24,218 |
| National (no city) | 55% | 1.04× | 1.03 | 24,627 |
Coverage: not every prompt names a place. 81% of insurance + accounting citations (101,645) come from prompts that name a UK city; the other 19% (24,627) are national, shown as the National (no city) baseline row. So the city rows are not the whole dataset, only the located 81%. Schema-rich lift ≈ 1.0× in every location ⇒ on-page schema doesn't change top-3 odds anywhere. The PCB/electronics dataset is excluded entirely (no geographic prompts).
Localization — do local players beat the big national brands?
Takeaway (the hypothesis was only half right). Naming a city does not uniformly hand the top 3 to small players: across all localized prompts, national incumbents actually hold a slightly higher share than in national prompts (17% vs 12%). The real effect is city-specific. In 4 of 6 cities (Glasgow, Birmingham, Manchester and Cardiff) incumbents hold under 15% of the top 3 (0% in Glasgow), while in London and Leeds they hold ~22–27%. Where local players win, they win decisively: each low-incumbent city has between 174 and 529 distinct local/regional firms taking those slots.
| UK city named in prompt | incumbent share of top-3 | distinct local players in top-3 | top-3 slots (n) |
|---|---|---|---|
| Glasgow | 0% | 174 | 340 |
| Birmingham | 10% | 412 | 822 |
| Manchester | 13% | 529 | 1095 |
| Cardiff | 13% | 308 | 625 |
| London | 22% | 521 | 1540 |
| Leeds | 27% | 318 | 1024 |
The gradient is the finding: incumbent share runs from 0% in Glasgow (local brokers own it outright) and rises steadily to 27% in Leeds. The localized-vs-national average washes this out because London and Leeds, the two high-incumbent cities, carry nearly half of the localized top-3 slots (2,564 of 5,446). The 15 incumbents under test: hiscox, simply business, axa, direct line for business, the hartford, axa uk, markel direct, aviva…
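The washing-out is just a slot-weighted mean. Using the rounded per-city shares and slot counts from the table above, this sketch recovers the pooled localized share and the volume carried by London and Leeds (helper names are hypothetical):

```python
# Per-city incumbent share of the top-3 and top-3 slot counts, from the table above.
cities = {
    "Glasgow": (0.00, 340),
    "Birmingham": (0.10, 822),
    "Manchester": (0.13, 1095),
    "Cardiff": (0.13, 625),
    "London": (0.22, 1540),
    "Leeds": (0.27, 1024),
}


def pooled_share(rows: dict[str, tuple[float, int]]) -> float:
    """Slot-weighted mean share: sum(share * slots) / sum(slots)."""
    total = sum(slots for _, slots in rows.values())
    return sum(share * slots for share, slots in rows.values()) / total


def volume_share(rows: dict[str, tuple[float, int]], names: list[str]) -> float:
    """Fraction of all slots that fall in the named cities."""
    total = sum(slots for _, slots in rows.values())
    return sum(rows[name][1] for name in names) / total


print(f"pooled incumbent share: {pooled_share(cities):.1%}")
print(f"London + Leeds share of slots: {volume_share(cities, ['London', 'Leeds']):.1%}")
```

The pooled figure comes out at about 16.9%, matching the 17% localized share, while London and Leeds alone hold about 47% of the slots.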
location_slug is unreliable as a per-prompt geo label: only 48.7% of rows agree with the city named in the prompt; 17.7% point to the wrong city (a mis-aligned production backfill, e.g. a Cardiff tag on a Leeds prompt); 18.3% carry a city tag although the prompt names no city; and 4.9% name a city but were left untagged. London was also stored under two forms (geoname-2643743 and london-uk), splitting it in two. We therefore derive the city from the prompt text and merge the duplicates, so every figure above is slug-independent. (An earlier draft of this section, built on the raw tags, reported a false "incumbents ~0% in cities" effect; that was the bug.) [source: rankings.json (de-branded, test/probe prompts removed) × query-text city × top-15 multi-city incumbent set. Insurance + accounting sites only (5,446 localized + 1,699 national top-3 slots); the PCB/electronics dataset has no geographic prompts.]
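Deriving the city from the prompt text, rather than trusting location_slug, can be sketched as a word-boundary match against a known city list plus an alias map for duplicate slugs. The lookup tables and bucket names below are hypothetical, and the bucket for rows tagged with a city the prompt never names reflects one reading of the note above:

```python
import re
from typing import Optional

# Hypothetical lookup tables: the six cities in this section, plus the two slug forms
# named in the note. Other slugs are assumed to be plain lower-case city names already.
CITIES = ("glasgow", "birmingham", "manchester", "cardiff", "london", "leeds")
SLUG_ALIASES = {"geoname-2643743": "london", "london-uk": "london"}
CITY_RE = re.compile(r"\b(" + "|".join(CITIES) + r")\b", re.IGNORECASE)


def city_from_prompt(prompt: str) -> Optional[str]:
    """First known city named in the prompt, or None for a national prompt."""
    match = CITY_RE.search(prompt)
    return match.group(1).lower() if match else None


def canonical_slug(slug: Optional[str]) -> Optional[str]:
    """Merge duplicate slug forms; pass other slugs through unchanged."""
    return SLUG_ALIASES.get(slug, slug) if slug else None


def audit_row(prompt: str, slug: Optional[str]) -> str:
    """Bucket one row by how its stored tag compares with the prompt's city."""
    city, tag = city_from_prompt(prompt), canonical_slug(slug)
    if city and tag:
        return "agree" if city == tag else "wrong_city"
    if city:
        return "untagged"
    if tag:
        return "tag_without_city_in_prompt"
    return "national"


print(audit_row("Top insurance for designers in London", "geoname-2643743"))
print(audit_row("Best Couriers in Cardiff", "leeds"))
```

Taking the first match is a simplification; prompts that name two cities would need a rule of their own.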
Who wins locations, per engine
National-incumbent share of the top-3, by engine × city (cities ordered most→least local-friendly).
| engine | Glasgow | Birmingham | Manchester | Cardiff | London | Leeds | all localized |
|---|---|---|---|---|---|---|---|
| perplexity | 0% | 6% | 10% | 12% | 19% | 18% | 13% |
| openai | 0% | 13% | 11% | 18% | 18% | 32% | 17% |
| gemini | 0% | 10% | 14% | 7% | 25% | 27% | 18% |
| anthropic | 0% | 11% | 18% | 15% | 26% | 33% | 20% |
perplexity is the most open to local/niche players (incumbents hold only 13% of its localized top-3); anthropic leans hardest on the established national brands (20%). The gradient has the same broad shape on all four engines: 0% in Glasgow everywhere, climbing to 18–33% in London and Leeds.
Your thesis: is niche / regional easier to rank than global?
- Smaller regional markets are wide open. In secondary cities the established national insurers largely disappear from the top-3 (0% Glasgow, 10% Birmingham, 13% Manchester, 13% Cardiff) vs ~22–27% in the most-contested markets in our data (London and Leeds, which have the most top-3 slots). Outside those two, the incumbent wall is low, and the pattern holds on all four engines.
- Traditional SEO authority does NOT predict LLM rank. Among cited pages, a domain's backlink authority predicts top-3 at a coin flip (AUC openai 0.51 · anthropic 0.51 · gemini 0.49 · perplexity 0.48, all ≈0.50, §2). You don't have to out-rank decades-old brands in classic SEO to place in the top 3; the "years of link-building" moat simply doesn't transfer to LLM answers.
- The global "head" is where the old brands live. The only place the engines agree (the cross-engine consensus core, §3) is a thin head of household names: Hiscox, The Hartford, Simply Business. Broad/global prompts surface that head; niche + regional prompts surface a long, fragmented tail (174–529 distinct local players per city, 2,262 summed across the six cities) that a new entrant can break into.
Honest caveat: this is one point in time, so it shows the niche/regional opportunity (a low incumbent wall, no SEO-authority gate) but cannot measure ranking speed. "We rank in days; SEO takes months" is fully consistent with this data but isn't proven by it; that needs a time series. The raw localized-vs-national averages are also confounded by vertical (insurance vs accounting); the per-city gradient is not, which is why it is the headline.
8 · The eligibility floor — cited vs wild
Takeaway: To be citable at all, be answer-shaped (definitions, examples — 74% of cited pages vs ~18% of the web) and carry real content schema (FAQ/Article/LocalBusiness). Boilerplate schema is everywhere and doesn't distinguish you. This earns eligibility, not rank.
% with each signal across four cohorts: cited pages · uncited siblings (same domains) · a Majestic top-1M draw · a uniform Common Crawl web sample (n=106, the true long tail Majestic excludes). Key read: the two random baselines agree, and schema is widespread (roughly half the live web has it, because CMSs auto-emit it), so schema presence alone is a weak discriminator. The real gap is answerability/depth (answer-shaped content, definitions, examples, length): 74% of cited pages are answerable vs 31% of the Majestic draw and 18% of CC-web (~2.4× and ~4×). The schema decomposition below sharpens this: boilerplate schema (WebSite/Org/Breadcrumb) is everywhere (~40–60% across all cohorts, a weak signal), while content schema (LocalBusiness/FAQ/Article/Review) is the differentiator (cited 46% vs 18–19% of the web, ~2.5×). So eligibility = substantive answer-shaped content + real content schema, not boilerplate markup. Neither moves rank once you're cited.
| % present | cited | sibling | Majestic 1M | CC web |
|---|---|---|---|---|
| has_any_schema | 68% | 64% | 43% | 52% |
| boilerplate_schema | 57% | 61% | 38% | 41% |
| content_schema | 46% | 39% | 18% | 19% |
| has_definitions | 71% | 64% | 27% | 18% |
| has_examples | 57% | 47% | 17% | 11% |
| has_conclusion | 16% | 12% | 3% | 4% |
| has_topic_sentences | 72% | 68% | 60% | 59% |
| has_faq_markup | 20% | 15% | 6% | 0% |
| answerability | 74% | 65% | 31% | 18% |
| has_aggregate_rating | 7% | 8% | 2% | 1% |
| has_author_schema | 13% | 23% | 2% | 1% |
| has_quotation | 24% | 34% | 20% | 18% |
Medians
| median | cited | sibling | Majestic 1M | CC web |
|---|---|---|---|---|
| word_count | 1266.00 | 1437.00 | 726.50 | 731.50 |
| schema_count | 7.00 | 5.00 | 0.00 | 1.00 |
| schema_types_n | 2.00 | 2.00 | 0.00 | 0.00 |
| stat_density | 25.46 | 31.24 | 46.38 | 66.90 |
| presence_score | 25.00 | 25.00 | 0.00 | 25.00 |
| completeness_score | 21.00 | 18.00 | 0.00 | 0.00 |
| psi_score | 61.00 | — | — | — |
CC web n=106 (uniform Common Crawl draw, the genuine long tail), so its rates carry a 95% margin of error of up to about ±10 percentage points. It tracks Majestic on schema (≈ half the web) but confirms the big answerability/depth gap, validating the eligibility-floor finding against a true-web baseline.
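The ±~10% figure for the CC-web cohort is what a normal-approximation 95% interval gives at n=106 in the worst case (a 50% rate). A quick check, with a hypothetical helper name:

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation (Wald) 95% interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)


for p in (0.5, 0.18):
    print(f"p={p:.0%}, n=106: ±{margin_of_error(p, 106):.1%}")
```

At a 50% rate the margin is about ±9.5 points; at the 18% answerability rate it narrows to about ±7.3. Near 0% (e.g. has_faq_markup in the CC-web column) the normal approximation breaks down and returns a zero-width interval; a Wilson interval is the usual safer choice there.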
9 · Recency
| content age | top-3 support rate | n |
|---|---|---|
| <1mo | 45% | 34125 |
| 1-3mo | 46% | 11394 |
| 3-6mo | 46% | 10322 |
| 6-12mo | 46% | 8216 |
| >12mo | 50% | 14131 |
Roughly flat (45–50%), with the oldest content slightly higher: no "3-month freshness cliff" in our data.
10 · This study vs prior work
| finding | prior work | our result | verdict |
|---|---|---|---|
| Backlinks / domain authority | Semrush: "weak". Classic SEO: strong. | AUC ≈ 0.50 (no effect) | agree w/ Semrush, refute classic SEO |
| Cross-engine overlap | ~11% (ChatGPT∩Perplexity) | ~9% mean pairwise | agree |
| Q&A / FAQ formatting | Semrush +25% / Citation-Absorption −5.7% | +1 to +4% (all engines) | splits the difference; small |
| Statistics in content | Princeton +30–40% (causal) | no association (cited < random) | diverge (theirs causal, ours observational) |
| Semantic fit > length | Citation-Absorption: yes | both weak; relevance adds ≈0 AUC (§2) | partial: both small here |
| Freshness / recency cliff | ~3-month cliff | flat across age | refute |
| Keyword stuffing | Princeton: negative | negative on Gemini (0.73×) | agree |
Princeton GEO is causal (edit one page, re-query); ours is observational across a brand-dominated competitive field — both can hold. The headline divergence: authority/backlinks and stats are levers in the SEO/GEO literature but non-levers for LLM citation here.
11 · Statistical robustness: frequentist (naive and cluster-robust) vs Bayesian (MCMC)
Takeaway: We re-checked the findings three ways (standard, cluster-robust, and Bayesian) — they all agree. The conclusions don't depend on the statistical method; the company effect dwarfs every page feature in all of them.
schema presence
| engine | freq OR | naive 95% CI | cluster-robust CI | Bayes OR | 94% HDI | P(OR>1) |
|---|---|---|---|---|---|---|
| openai | 0.94 | [0.897, 0.985] | [0.882, 1.002] | 0.94 | [0.899, 0.983] | 0.009 |
| anthropic | 1.021 | [0.987, 1.057] | [0.973, 1.072] | 1.021 | [0.988, 1.054] | 0.881 |
| gemini | 1.015 | [0.967, 1.066] | [0.956, 1.078] | 1.016 | [0.972, 1.062] | 0.736 |
| perplexity | 0.992 | [0.955, 1.031] | [0.944, 1.042] | 0.993 | [0.958, 1.027] | 0.349 |
readability
| engine | freq OR | naive 95% CI | cluster-robust CI | Bayes OR | 94% HDI | P(OR>1) |
|---|---|---|---|---|---|---|
| openai | 1.051 | [1.022, 1.082] | [1.01, 1.094] | 1.051 | [1.022, 1.079] | 1 |
| anthropic | 0.951 | [0.93, 0.971] | [0.924, 0.978] | 0.951 | [0.931, 0.97] | 0 |
| gemini | 0.957 | [0.93, 0.986] | [0.919, 0.997] | 0.957 | [0.931, 0.983] | 0.001 |
| perplexity | 1.019 | [0.994, 1.044] | [0.983, 1.057] | 1.019 | [0.996, 1.043] | 0.93 |
Point estimates agree (Bayes ≈ frequentist at this n). Cluster-robust CIs (clustered by company) are wider; they are the honest fix for non-independence. Bayesian fixed-effects models were fit on the full per-engine data; the hierarchical model was fit on a 12,000-row subsample and did not fully converge (R-hat > 1.01), so its magnitudes are approximate, though the conclusion is robust.
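The Bayesian columns summarize posterior draws: the 94% HDI is the narrowest interval containing 94% of them, and P(OR>1) is the share of draws above 1. A minimal sketch on simulated draws (not the study's posterior; the normal log-odds posterior and helper names are assumptions for illustration):

```python
import math
import random


def hdi(samples: list[float], prob: float = 0.94) -> tuple[float, float]:
    """Narrowest interval containing at least `prob` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(prob * n - 1e-9))  # points the window must cover (guard float error)
    best = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[best], xs[best + k - 1]


def prob_greater(samples: list[float], threshold: float = 1.0) -> float:
    """Share of draws strictly above the threshold, e.g. P(OR > 1)."""
    return sum(s > threshold for s in samples) / len(samples)


# Illustrative posterior: log-odds draws ~ Normal(log 1.05, 0.015), mapped to the OR scale.
random.seed(0)
or_draws = [math.exp(random.gauss(math.log(1.05), 0.015)) for _ in range(4000)]
low, high = hdi(or_draws)
print(f"OR 94% HDI [{low:.3f}, {high:.3f}], P(OR>1) = {prob_greater(or_draws):.3f}")
```

The HDI is not invariant under exp(), so computing it on the OR scale, as here, can differ slightly from exponentiating an HDI computed on log-odds.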
12 · Method & limits
Models. Per-LLM logistic (top-3 ~ standardized features + company effect + site/niche FE), self-citations excluded; FDR across the feature×engine grid; bootstrap lift CIs; 5-fold CV AUC; external authority via Majestic Million join; Bayesian + cluster-robust robustness (§11).
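The FDR correction runs across the feature × engine grid (17 features × 4 engines in §1), but the procedure isn't named. Assuming the common Benjamini–Hochberg method, q-values can be computed like this (the p-values in the example are made up):

```python
def bh_qvalues(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    qvalues = [0.0] * m
    running_min = 1.0  # q-values are capped at 1
    # Walk from the largest p-value down so q-values never increase with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Made-up p-values for four feature x engine cells.
p = [0.01, 0.04, 0.03, 0.005]
for pv, qv in zip(p, bh_qvalues(p)):
    print(f"p={pv:.3f}  q={qv:.3f}")
```

Under this reading, a cell earns a star at q<.05 when its q-value, not its raw p-value, clears the threshold.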
Limits. Selection / survivorship: the cohort is companies LLMs already cite — we never observe niche players that were never mentioned, so the study explains rank among the considered set, not how to enter it from zero. "Eligibility" (cited-vs-wild) is therefore a crude proxy — random web isn't "uncited competitors"; the cleanest within-niche control is the sibling arm (same domains, uncited), where gaps are small. Observational (associational, not causal — distinct from Princeton's intervention); the dominant per-company effect is unexplained by measured signals (likely brand salience, unmeasured); the endogenous "popularity" proxy is partly circular (hence the external Majestic check); single snapshot; wild-random skewed to popular sites (conservative); PSI-lab is the speed signal; source-role Haiku-classified with gaps (Gemini); hierarchical MCMC on a subsample with imperfect convergence.