What drives LLM citation rank — and where the engines diverge

Observational, confound-controlled study: 138,186 source citations, 4 LLMs, 3 company datasets, full page re-audit + query-relevance + external authority. Associational, not causal.

The one-line finding. LLM citation rank is driven by a large, stable per-company effect, more than an order of magnitude larger than any page feature. That effect is not explained by domain authority (backlinks predict at a coin flip), on-page craft, or query-relevance; it is most consistent with brand familiarity in the model's training. Page structure and answerability earn a place in the eligible set, not rank. In this data you can't optimize or link-build your way in; you earn it as a recognized brand.
What moves LLM visibility
>10× · Being a recognised brand: a company's identity outweighs every on-page change by more than an order of magnitude.
74% / 18% · Answer-shaped content: cited pages directly answer the question (definitions, examples) far more often than a typical web page.
~9% · Per-engine tailoring: the four engines' top-3 lists overlap only ~9% (mean pairwise Jaccard); optimise per engine, not "for AI".
What doesn't: the GEO myths
AUC 0.50 · Backlinks / domain authority: predicts citation no better than a coin flip.
≈1.0× · Page speed: essentially no effect on whether you're cited.
≈ none · Stats, freshness, boilerplate schema, Q&A-stuffing: no meaningful citation lift (FAQ markup adds only +1–4%).

How to read this report

The data. We ran 459 real business questions (e.g. "best accountants for contractors in Leeds", "public liability insurance for electricians") through 4 LLMs, ChatGPT (OpenAI), Claude (Anthropic), Google (Gemini) and Perplexity, across 3 client niches (accounting, business insurance, AI tooling). For every answer we recorded which companies each LLM named and in what order, and which web pages it cited as sources: 138,186 citations in all. We then re-audited every cited page (content, structure, schema, speed) and audited random slices of the wider web for comparison. Brand-naming queries were removed, so a company can't win simply by being named in the question.

Plain terms: "rank" = where a company appears in an answer · "top-3" = named in the first three · "cited page" = a URL an LLM used as a source · "visibility" = a 0–100 score for how high a company tends to rank.

Key terms — in plain English

Odds-ratio (OR)
How much a page feature multiplies the odds of landing in the top 3. 1.0 = no effect; 1.2 = 20% better odds; 0.8 = 20% worse. Nearly everything here is 0.9–1.1 (small).
Lift
The plain-language cousin of OR, but a ratio of rates rather than odds: how many times more likely a page is to support a top-3 company with a feature than without it.
AUC
How well a model predicts top-3 ranking: 0.5 = a coin flip (useless), 1.0 = perfect. Our page-feature models sit at 0.49–0.60, so features barely predict rank.
Significance (q<.05, ✱)
The result is unlikely to be chance, after correcting for testing many features at once. More stars = more confidence.
Content vs boilerplate schema
Structured-data tags in a page's code. Boilerplate = auto-generated tags every CMS emits (WebSite, Organization). Content = meaningful tags describing the page (FAQ, Article, LocalBusiness, Review).
Answerability
Whether a page directly answers the question — definitions, examples, a clear conclusion — rather than just marketing copy.
Brand / company effect
The part of ranking explained by which company it is, independent of its pages — a company's stable tendency to be cited after page features are removed.
Popularity (proxy)
How many distinct questions a company shows up for at all — a rough, in-dataset stand-in for prominence.
External authority
Classic SEO authority — how many other sites link to a domain (Majestic Million rank). An independent signal, unlike "popularity".
Eligibility vs rank
Eligibility = being the kind of page that can be cited at all; rank = how high you place once eligible. Most signals affect eligibility, not rank.
Cohorts
Cited = pages LLMs cited · sibling = other (uncited) pages on those same sites · Majestic / CC-web = random established / typical web pages, for comparison.
Cross-engine overlap (Jaccard)
How much two engines' top-3 lists share: 0 = nothing in common, 1 = identical. Ours ≈ 0.09.
Consensus core
The handful of brands that all four engines put in the top 3 for the same question — the thin "head" they agree on despite disagreeing on everything below it.
Transfer
If a company is top-3 on one engine, how often it's also top-3 on another. Low transfer ⇒ winning on ChatGPT doesn't get you onto Perplexity.
Localized vs national query
Localized = the prompt names a UK city ("…in Manchester"). National = it names no place ("…in the UK").
National incumbent
For the localization section: a company cited across ≥3 different cities and among the 15 most-cited such firms (national scale + footprint) — excludes the client itself and single-city local brokers.
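Order of magnitude
"≈10× or more." When we say the company effect is an order of magnitude larger than page features, we mean that on the model's logit scale the company-level SD (~3.1) is more than 10× any standardized page-feature coefficient (<0.1). In odds terms, a page feature shifts the odds by at most about 10%, while the company effect shifts them by multiples.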
Cluster-robust CI / Bayesian HDI
Two stricter ways to draw the error bars: one corrects for the same company appearing many times (cluster-robust), the other re-estimates the model in a Bayesian framework (HDI = highest-density interval). Both agree with the standard result here.
Correlation (Spearman ρ)
Strength & direction of a relationship, −1 to +1; 0 = no relationship.
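The difference between an odds-ratio and a lift is easiest to see on one pair of rates. A minimal sketch, using the rounded §4 OpenAI FAQ rates (58% vs 56% top-3) purely as illustrative inputs; the function names are our own, not the report's code:

```python
def odds(p: float) -> float:
    """Convert a probability into odds, p / (1 - p)."""
    return p / (1.0 - p)


def odds_ratio(p_with: float, p_without: float) -> float:
    """How much a feature multiplies the odds of a top-3 outcome."""
    return odds(p_with) / odds(p_without)


def lift(p_with: float, p_without: float) -> float:
    """Ratio of the raw rates (probabilities), not of the odds."""
    return p_with / p_without


# Illustrative inputs: rounded OpenAI top-3 rates with / without FAQ markup (§4).
p_faq, p_no_faq = 0.58, 0.56
print(f"OR   = {odds_ratio(p_faq, p_no_faq):.3f}")              # 1.085
print(f"lift = {lift(p_faq, p_no_faq):.3f}")                    # 1.036
print(f"relative diff = {lift(p_faq, p_no_faq) - 1:+.2%}")      # +3.57%
```

At rates near 50% the OR reads noticeably larger than the lift (1.085 vs 1.036 here). The §4 table reports +3.36% rather than +3.57%, presumably because it works from unrounded rates.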

Executive summary

138,186
citations modeled
4,961
companies
0.51
authority→rank AUC (≈ coin flip)
~9%
cross-engine top-3 overlap

What matters most: 11 stats for ranking on LLMs

  1. Brand identity is ~everything. A company's stable identity outweighs every page feature by more than an order of magnitude (hierarchical model: company SD ~3 vs feature coefs <0.1) — and that identity is not domain authority, on-page craft, or query-relevance.
  2. Backlinks / domain authority do NOT move LLM rank. External authority (Majestic) predicts top-3 at AUC ≈ 0.50 (coin flip). You cannot link-build your way to citations.
  3. Engines diverge on the tail, agree on the head. Mean pairwise top-3 overlap is only ~9% (Jaccard), but the same brand is #1 on ≥2 engines 37% of the time: a thin consensus core of big brands, divergent everywhere below.
  4. Localization helps locals only in some cities — not as a rule. Naming a city doesn't shrink national-incumbent share on average (17% localized vs 12% national). But it's sharply city-specific: incumbents hold 0% in Glasgow and under 15% in 4 of 6 cities, vs ~27% in Leeds.
  5. Eligibility = answerability + content schema, not boilerplate. Cited pages are roughly 2.4–4× the random-web baselines on answerability (74% vs 31% Majestic and 18% CC-web) and about 2.5× on content schema (46% vs 18–19%; LocalBusiness/FAQ/Article/Review). But boilerplate schema (WebSite/Org) is everywhere (~40–60%) and doesn't discriminate, and none of it moves rank once cited.
  6. Page speed is a near-non-factor (OR ≈ 1.0; small per-engine differences only).
  7. Stats beside claims don't help citation here: cited pages had lower stat density than the random web, contrary to Princeton/Citation-Absorption.
  8. Q&A/FAQ helps a little, on every engine (+1–4%) — neither the +25% nor the −5.7% the literature claims.
  9. No freshness cliff — citation rate is flat from <1mo to >12mo old content.
  10. Readability is the closest thing to a consistent on-page positive: robustly positive on OpenAI and positive on Perplexity, but negative on Anthropic (and on Gemini in the §11 robustness check).
  11. Don't keyword-stuff — title↔body keyword matching is negative on Gemini (0.73×), echoing Princeton.

The playbook, grounded in the findings below

Publishing cadence: coverage > frequency
Evidence: Citation rate is flat across content age (no freshness cliff); page features add ~0 AUC over the company effect.
Don't run a treadmill for "freshness." Publish to cover query intents — one strong answer-shaped page per intent — then stop. Cadence tracks new intents, not a weekly quota.
Internal build vs. external: salience, not link-building
Evidence: Prominence dominates rank, but external domain authority (backlinks) predicts citation at a coin-flip (AUC ≈ 0.50). Self-citation share: Perplexity 55%, OpenAI 46%, Anthropic 39% (leans on third-party review 23%).
Build one excellent internal page per intent (eligibility floor). For "external," chase brand mentions & category presence that build salience — being named/reviewed/compared broadly — not link-building for domain authority, which has ~zero effect on LLM citation. Anthropic especially rewards being reviewed/compared on third-party sites.
How to structure language: answer-first, definitional, not keyword-stuffed
Evidence: Cited pages are answer-shaped far more than random: answerability 74% vs 31%, definitions 71% vs 27%. Readability is the only consistent positive; keyword over-optimization is negative on Gemini.
Lead with a direct definitional answer, then concrete examples, clear topic sentences, a short conclusion. Mirror query language naturally — do not keyword-stuff. Completeness over length.
Stats beside each claim: credibility, not a rank lever
Evidence: Stat density did not predict citation (cited 25.46 < random 46.38 per 1k words; lift ~1.0×). Contradicts Princeton's causal +30–40%.
Add stats for human trust — but in a brand-dominated field they are not a citation lever. Spend the effort on coverage + brand salience instead.

Per-engine fingerprints: what matters on each

Tilts are small (±~10%) — brand prominence dominates all four — but these are the levers that move the margins once you're in the consideration set.

ChatGPT (OpenAI) — the one engine where craft pays
  • Most feature-responsive (AUC 0.60): on-page signals show their clearest, if still small, effects here, unlike the others.
  • Rewards completeness (1.08×) + readability (robustly positive, CI [1.01,1.09]) + depth/definitions.
  • Cites your own pages ~46% of the time, plus how-to/docs — owning strong educational content pays off.
Claude (Anthropic) — third-party-driven
  • On-page craft is ~useless (AUC 0.49 — below random; features add noise).
  • Leans on third-party review (23%) + comparison (27%) more than any engine: get reviewed/compared, don't just polish your pages.
  • Readability is slightly negative here (don't over-polish); hardest top-3 to crack (42% base rate).
Google (Gemini) — the anti-spam engine
  • Punishes keyword over-optimization (0.73× — the single strongest negative in the study). Do not keyword-stuff for Google.
  • Likes FAQ (+3.6%, second only to Perplexity's +3.8% Q&A boost) and depth; schema doesn't help (only engine with lift <1×).
  • Behaves most like Claude (ρ=0.51); likely scores its (JS-rendered) search index — may weight rendered content the others never fetch.
Perplexity — the quote/citation engine
  • Rewards quotable content (quotations 1.12× — highest of any engine).
  • Cites your own pages the most (self ~55%): first-party content pays off here.
  • Most divergent from ChatGPT (only ~9% top-3 overlap): ChatGPT wins transfer here least.

1 · Per-LLM driver matrix (logistic, top-3, FDR-corrected)

Takeaway: Read this as "how much does each page feature change the odds of ranking top-3, per engine." Everything sits near 1.0 (±10%) — page features barely move rank. The few real signals: completeness & readability help (esp. ChatGPT); keyword-stuffing hurts (Gemini).

How to read the table below. Each row is a page feature; each column is an engine. Each cell is an odds-ratio: 1.00 = that feature makes no difference to landing top-3; 1.10 = 10% better odds, 0.90 = 10% worse. Stars mark statistical confidence (more stars = less likely to be chance). The takeaway you're looking for: almost every cell is between 0.9 and 1.1, so page features barely move the needle. Read the direction of the strong ones, not the exact decimal.

Odds-ratio per +1 SD / presence, controlling for company effect, site, niche. *q<.05 **q<.01 ***q<.001. Effects are small (±10%): read directions, not dials. Two cells are artifacts, not signals: stat_density's anthropic 1.70× is a heavy-tail outlier (robust lift ≈ 1.0×), and the presence/completeness split is collinearity (r=0.835), so read their combined effect (schema-rich lift ≈ 1.0 = neutral for rank), not the individual signs. Schema is an eligibility signal (§8), not a rank lever.

feature | openai | anthropic | gemini | perplexity
presence_score | 0.94× | 1.04× | 1.01× | 0.96×
completeness_score | 1.08×* | 0.99× | 0.97× | 1.06×*
llm_readability_score | 1.07×** | 1.02× | 0.99× | 1.07×***
semantics_score | 0.97× | 0.96×* | 1.02× | 0.96×*
word_count_log | 1.02× | 0.96×* | 1.03× | 0.96×
stat_density | 1.02× | 1.70× | 1.03× | 0.99×
external_links_log | 0.97× | 0.99× | 1.02× | 1.02×
schema_types_n | 0.95×* | 0.99× | 0.97× | 0.98×
psi_score | 1.02× | 1.00× | 1.00× | 0.97×
query_coverage_body | 1.03× | 1.11×*** | 1.04× | 1.03×
query_coverage_title | 0.96× | 1.04×* | 0.98× | 1.05×*
has_definitions | 1.05× | 1.02× | 1.02× | 0.94×
has_faq_markup | 0.99× | 0.99× | 1.01× | 1.00×
has_keyword_consistency | 0.94× | 1.00× | 0.77×** | 1.09×
has_quotation | 1.00× | 1.02× | 1.01× | 1.10×**
has_topic_sentences | 0.99× | 0.94× | 1.04× | 1.00×
has_examples | 0.98× | 0.94× | 1.02× | 0.92×*
has_examples0.98×0.94×1.02×0.92×*

Model fit: openai AUC 0.601 · anthropic AUC 0.491 · gemini AUC 0.539 · perplexity AUC 0.534 — all near 0.5–0.6 (features weakly predictive).
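The stars in the table above are FDR-adjusted q-values over the feature × engine grid. The report doesn't name the procedure; a sketch assuming the standard Benjamini-Hochberg step-up adjustment (function names are ours and the p-values are toy numbers, not study results):

```python
def bh_qvalues(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    qvalues = [0.0] * m
    running_min = 1.0
    # Step down from the largest p-value so q never decreases as p grows.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def stars(q: float) -> str:
    """The report's notation: * q<.05, ** q<.01, *** q<.001."""
    if q < 0.001:
        return "***"
    if q < 0.01:
        return "**"
    if q < 0.05:
        return "*"
    return ""


# Toy p-values, not study results.
raw = [0.01, 0.04, 0.03, 0.005]
print([(round(q, 3), stars(q)) for q in bh_qvalues(raw)])
```

For this grid, 17 features × 4 engines, `bh_qvalues` would be called on all 68 p-values at once rather than engine by engine.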


2 · Prominence, authority & relevance (marginal AUC)

Takeaway: Nothing page-level beats simply knowing the company. Adding all page features + query-relevance + backlink authority doesn't predict rank better than the company effect alone — and backlinks alone are a coin flip.

Does anything page-level beat just knowing the company? Cross-validated AUC by signal set:

engine | popularity only | + audit features | + relevance | all combined
openai | 0.606 | 0.604 | 0.603 | 0.601
anthropic | 0.539 | 0.512 | 0.494 | 0.491
gemini | 0.567 | 0.544 | 0.557 | 0.539
perplexity | 0.557 | 0.545 | 0.544 | 0.534

On every engine, the company effect alone ≥ everything combined; relevance ORs 1.02–1.10 (negligible).
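AUC as used throughout can be read as a ranking probability. A minimal sketch that computes it directly from the Mann-Whitney pair count; it is quadratic in the number of rows, so it is illustrative only (at 138k rows a rank-based version or `sklearn.metrics.roc_auc_score` would be the practical choice). The cross-validation the report applies on top is not shown:

```python
def auc(scores: list[float], labels: list[int]) -> float:
    """ROC AUC: probability that a random top-3 row outscores a random non-top-3 row.

    Ties count as half a win (the Mann-Whitney U convention). O(n_pos * n_neg).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative row")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(auc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))  # → 1.0 (perfect ranking)
print(auc([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0]))  # → 0.5 (coin flip)
```

An AUC near 0.5, as for the feature and authority models here, means the score orders top-3 and non-top-3 rows about as well as a coin toss.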

Is it real authority, or circular? — external check

Our "popularity" measure was an in-dataset appearance count (partly circular). The honest test joins cited domains to the Majestic Million (backlink / referring-subnet rank), a true external authority signal. 77% of cited domains are in the top 1M.

engine | endogenous "popularity" | EXTERNAL authority (Majestic) | both
openai | 0.607 | 0.506 | 0.608
anthropic | 0.539 | 0.511 | 0.537
gemini | 0.569 | 0.492 | 0.561
perplexity | 0.557 | 0.478 | 0.549
External domain authority does NOT predict LLM citation rank. AUC ≈ 0.48–0.51 (a coin flip), Spearman ≈ 0.000. And the endogenous "popularity" is essentially uncorrelated with external authority (ρ = −0.071). So the dominant per-company effect is not backlinks, not on-page, not relevance; it is most consistent with brand familiarity in the model's training. You can't link-build your way to LLM citations.

3 · Cross-LLM divergence

Takeaway: The four engines mostly disagree on the full top-3 (~9% overlap) — but they do share a thin head of recognized brands (consensus core below). Optimise per engine; expect agreement only on the biggest names.

Do drivers differ by engine? (interaction tests)

feature | LR χ² | p | differs by LLM
presence_score | 15.7 | 0.001 | yes
completeness_score | 20.4 | <0.001 | yes
semantics_score | 7.0 | 0.071 | no
psi_score | 14.7 | 0.002 | yes
has_faq_markup | 7.9 | 0.049 | yes
has_definitions | 4.8 | 0.186 | no
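The p-values in the interaction table are consistent with a likelihood-ratio test on 3 degrees of freedom (four engines, so three extra interaction terms per feature); that df is our inference, not something the report states. A standard-library sketch that re-derives the p column from the χ² column using the closed-form df=3 chi-square tail:

```python
import math


def chi2_sf_df3(x: float) -> float:
    """Upper-tail probability P(X > x) for a chi-square with 3 degrees of freedom.

    Closed form for df=3: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2).
    """
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


# LR chi-square statistics copied from the interaction table above.
lr_stats = {
    "presence_score": 15.7,
    "completeness_score": 20.4,
    "semantics_score": 7.0,
    "psi_score": 14.7,
    "has_faq_markup": 7.9,
    "has_definitions": 4.8,
}
for feature, stat in lr_stats.items():
    p = chi2_sf_df3(stat)
    print(f"{feature:20s} chi2={stat:5.1f}  p={p:.4f}  differs={'yes' if p < 0.05 else 'no'}")
```

Each recomputed p lands within about 0.001 of the table (7.0 gives ≈0.072, 4.8 gives ≈0.187), which supports the df=3 reading; `scipy.stats.chi2.sf(x, 3)` would return the same values.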

Source ecology — what each engine cites (role share)

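source role | openai | anthropic | gemini | perplexity
self | 46% | 39% | – | 55%
review | 12% | 23% | – | 19%
comparison | 9% | 27% | – | 15%
how-to | 8% | 8% | – | 3%
docs | 6% | 1% | 11% | 2%
(– = no value reported; Gemini's source-role classification has gaps, §12.)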

Self = own page. Perplexity/OpenAI lean on own pages (~half); Anthropic on third-party review/comparison. Only ~9% top-3 overlap between engines.

…but there IS a consensus core: engines agree on the head

"No single AI ranking" is about the full top-3 list. Stratified by rank position, the engines do converge on a thin head of recognized brands: the same brand is #1 on at least two engines 37% of the time.

Concrete cases where one company was top-3 on all four engines (ranks on OpenAI / Anthropic / Gemini / Perplexity, in that order):

query | company | o/a/g/p
Top insurance for designers in Leeds | policybee | 3/3/2/2
Top 5 business insurance providers for contractors in London | hiscox | 1/3/1/1
Business insurance reviews for landscapers | nationwide | 2/1/2/1
Best alternatives to Hiscox for builders | the hartford | 3/1/1/1
Best Couriers in Cardiff | citysprint | 1/1/2/1
Best insurance for tradespeople in London | simply business | 2/2/1/1
Top insurance for designers in London and why | policybee | 2/1/3/3
What is the best insurance for a courier in Manchester? | admiral | 1/1/1/1

So the honest statement: engines share a brand-dominated head (Hiscox, The Hartford, Simply Business…) but diverge on everything below it. [source: rankmatrix.jsonl, 487 multi-engine queries]
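The ~9% overlap and the consensus core both reduce to set operations on each engine's top 3 for a query. A sketch, assuming a per-query mapping of engine to top-3 company set (the actual layout of rankmatrix.jsonl isn't described, and the brands below are placeholders):

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection over size of the union: 0 = disjoint, 1 = identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_pairwise_jaccard(top3_by_engine: dict[str, set[str]]) -> float:
    """Average Jaccard over every pair of engines for one query."""
    pairs = list(combinations(top3_by_engine.values(), 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def consensus_core(top3_by_engine: dict[str, set[str]]) -> set[str]:
    """Companies that every engine places in its top 3 for the same query."""
    return set.intersection(*top3_by_engine.values())


# Placeholder brands for one hypothetical query.
query_top3 = {
    "openai": {"brand_a", "brand_b", "brand_c"},
    "anthropic": {"brand_a", "brand_d", "brand_e"},
    "gemini": {"brand_a", "brand_b", "brand_f"},
    "perplexity": {"brand_a", "brand_g", "brand_h"},
}
print(f"mean pairwise Jaccard: {mean_pairwise_jaccard(query_top3):.2f}")
print(f"consensus core: {sorted(consensus_core(query_top3))}")
```

Averaging `mean_pairwise_jaccard` over all multi-engine queries would give a figure comparable to the reported ~0.09; whether the report averages per query or per engine pair first is not stated.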

4 · The Q&A contradiction, adjudicated

Takeaway: FAQ/Q&A formatting nudges citation up a little on every engine (+1–4%) — neither the +25% nor the −5.7% the published studies claim.

Semrush +25% vs Citation-Absorption −5.74%. Our per-LLM test:

engine | top-3 w/ FAQ | w/o | relative diff
openai | 58% | 56% | +3.36%
anthropic | 42% | 42% | +0.94%
gemini | 48% | 46% | +3.58%
perplexity | 47% | 45% | +3.8%

Modestly positive on all four (+1–4%) — does not flip sign; far smaller than either camp claims.

5 · Company breadth

Takeaway: More pages looks like it helps — but only because prominent brands happen to have more pages. Control for prominence and the effect reverses. Publishing more pages is not a lever.

Raw ρ = 0.08; controlling for the company effect, partial ρ = -0.36 — apparent breadth benefit is a prominence proxy.

distinct cited pagesmean visibilitymedian best rankcompanies
234.7718
330.2617
4-548.64122
6-1052.042771
11+53.222018
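Section 5's sign flip (raw ρ = 0.08, partial ρ = −0.36) is what a partial correlation does when both variables track a third one. A sketch, assuming partial Spearman is computed as the first-order partial correlation of rank correlations (one common definition; the report doesn't say which it used, and the argument meanings below are hypothetical):

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho = Pearson correlation of the (average) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def partial_from_r(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y, holding z fixed."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


def partial_spearman(x, y, z) -> float:
    """x = cited pages per company, y = visibility, z = company-effect estimate (hypothetical)."""
    return partial_from_r(spearman(x, y), spearman(x, z), spearman(y, z))


# Made-up correlations: a weak positive raw link, both variables tracking the control.
print(round(partial_from_r(0.08, 0.5, 0.6), 3))
```

With these made-up inputs the partial correlation is about −0.32: a weak positive raw correlation turns clearly negative once page count and visibility are both allowed to track prominence. The 0.5 and 0.6 are illustrative, not study values.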

6 · Per-niche variation

Takeaway: The picture barely changes by niche — schema helps trades a touch more than accounting/crypto, but brand prominence dominates in every vertical.

Top-3 rate + on-page lift by niche. Variation is modest; schema helps trades (Landscapers 1.09×, Carpenters 1.08×, Contractors 1.07×) slightly more than accounting/crypto. The brand-dominated story holds everywhere.

niche | top-3 rate | schema-rich lift | answerable lift | n
Accountant for Therapists | 63% | 1.04× | 1.05× | 1,671
Accountant for Contractors | 60% | 1.06× | 1.03× | 1,750
Accountant for Crypto Returns | 59% | 0.96× | 1.05× | 1,659
Consultants | 57% | 1.07× | 1.02× | 4,390
Electricians | 57% | 1.04× | 1.02× | 4,895
Accountant for Lawyers | 57% | 1.00× | 1.04× | 2,154
Builders | 57% | 1.00× | 1.02× | 4,716
Accountant for Crypto | 56% | 0.91× | 0.99× | 2,086
Accountant for Designers | 56% | 0.95× | 1.04× | 2,109
Accountant for Ecommerce | 55% | 0.98× | 0.99× | 6,443
Decorators | 55% | 1.01× | 1.03× | 4,829
Landscapers | 55% | 1.09× | 1.04× | 5,156
Crypto Tax Returns | 55% | 1.03× | 1.02× | 2,001
Accountant for Recruiting agencies | 54% | 0.96× | 0.98× | 1,988
Carpenters | 54% | 1.08× | 0.99× | 4,770
Designers | 54% | 0.98× | 1.03× | 4,659
Accountant For Contractors | 53% | 1.03× | 1.00× | 2,075
Accountant for Doctors | 53% | 0.96× | 0.98× | 4,688
Contractors | 53% | 1.07× | 1.09× | 5,111
Accountant for Landlords | 53% | 1.01× | 0.99× | 5,852
Couriers | 53% | 1.05× | 1.02× | 5,170
Contractor Accountant | 52% | 0.94× | 1.00× | 1,906
Accountant for Dentist | 51% | 0.98× | 1.07× | 2,171
Tradespeople | 51% | 1.04× | 1.03× | 5,336
Self employed accountant | 50% | 1.01× | 0.90× | 1,989
Limited Company Accounting | 47% | 0.99× | 0.95× | 2,591

7 · Per-location variation & localization

Top-3 rate by the city named in the prompt (variation tracks competitive density — bigger metros = more competitors = lower top-3 rate). Cities are resolved from the prompt text, not the stored location tag (see the data-quality note below for why).

prompt names… | top-3 rate | schema-rich lift | cited-domain authority | citations (n)
Glasgow | 59% | 0.98× | 0.75 | 3,926
Birmingham | 51% | 1.02× | 0.83 | 12,728
Manchester | 48% | 0.99× | 0.88 | 19,062
Cardiff | 48% | 1.04× | 0.80 | 11,635
London | 42% | 1.03× | 0.84 | 30,076
Leeds | 41% | 0.95× | 0.80 | 24,218
National (no city) | 55% | 1.04× | 1.03 | 24,627

Coverage: not every prompt names a place. 81% of insurance + accounting citations come from prompts that name a UK city (101,645); the other 19% (24,627) are national, shown as the National (no city) baseline row. So the city rows are not the whole dataset; they are the located 81%. Schema-rich lift is ≈ 1.0× in every location, so on-page schema doesn't change top-3 odds anywhere. The PCB/electronics dataset is excluded entirely (no geographic prompts).

Localization — do local players beat the big national brands?

How to read this section. A localized query names a UK city (e.g. "best insurance for a tradesperson in Manchester"); a national query names no place ("best plumbing insurance UK"). A national incumbent here = a company cited across ≥3 different cities and among the 15 most-cited such companies (genuine national scale + footprint — Hiscox, AXA, Aviva, Simply Business…) — this deliberately excludes the client and single-city local brokers. Incumbent share of top-3 = the fraction of a city's top-3 slots those incumbents hold; a low share means local/regional players are winning.
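The incumbent definition above is a two-step filter followed by a share calculation. A sketch, assuming citations arrive as hypothetical (company, city) pairs from localized prompts and that "most-cited" means raw citation count, with ties broken alphabetically; none of these names come from the report's pipeline:

```python
from collections import defaultdict


def national_incumbents(citations: list[tuple[str, str]], client: str,
                        min_cities: int = 3, top_n: int = 15) -> set[str]:
    """Companies cited in >= min_cities distinct cities, keeping the top_n most-cited.

    citations: hypothetical (company, city) pairs from localized prompts.
    The client is excluded before ranking.
    """
    cities = defaultdict(set)
    counts = defaultdict(int)
    for company, city in citations:
        if company == client:
            continue
        cities[company].add(city)
        counts[company] += 1
    multi_city = [c for c in counts if len(cities[c]) >= min_cities]
    multi_city.sort(key=lambda c: (-counts[c], c))  # most-cited first, then by name
    return set(multi_city[:top_n])


def incumbent_share(top3_slots: list[str], incumbents: set[str]) -> float:
    """Fraction of a city's top-3 slots (one company per slot) held by incumbents."""
    return sum(1 for company in top3_slots if company in incumbents) / len(top3_slots)


# Toy data: "big" spans three cities, "mid" only two, "local" one; the client is dropped.
toy = [("big", "leeds"), ("big", "london"), ("big", "cardiff"),
       ("local", "glasgow"), ("local", "glasgow"),
       ("client", "leeds"), ("client", "london"), ("client", "cardiff"),
       ("mid", "leeds"), ("mid", "london")]
incumbents = national_incumbents(toy, client="client")
print(incumbents, incumbent_share(["big", "local", "local", "mid"], incumbents))
```

Excluding the client before ranking matters: otherwise a client cited in many cities would count as its own "national incumbent".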

Takeaway (the hypothesis was only half right). Naming a city does not uniformly hand the top 3 to small players: across all localized prompts the national incumbents actually hold a slightly higher share than in national prompts (17% vs 12%). The real effect is city-specific. In 4 of 6 cities (Glasgow, Birmingham, Manchester and Cardiff) the incumbents nearly vanish, holding under 15% of the top 3, while in London and Leeds they hold ~22–27%. Where local players win, they win decisively: each low-incumbent city has well over a hundred distinct local/regional firms (174 to 529) taking those slots.

UK city named in prompt | incumbent share of top-3 | distinct local players in top-3 | top-3 slots (n)
Glasgow | 0% | 174 | 340
Birmingham | 10% | 412 | 822
Manchester | 13% | 529 | 1,095
Cardiff | 13% | 308 | 625
London | 22% | 521 | 1,540
Leeds | 27% | 318 | 1,024

The gradient is the finding: Glasgow has 0% incumbent share (local brokers own it outright), rising monotonically to Leeds at 27%. The pooled localized average washes this out because London and Leeds carry almost half of the localized top-3 slots (2,564 of 5,446). The 15 incumbents under test: hiscox, simply business, axa, direct line for business, the hartford, axa uk, markel direct, aviva…
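The pooled localized share can be re-derived from the per-city table as a slot-weighted average. A sketch using the table's rounded shares, so the result is approximate:

```python
# (incumbent share of top-3, top-3 slots) per city, from the table above (rounded shares).
CITY_TABLE = {
    "Glasgow": (0.00, 340),
    "Birmingham": (0.10, 822),
    "Manchester": (0.13, 1095),
    "Cardiff": (0.13, 625),
    "London": (0.22, 1540),
    "Leeds": (0.27, 1024),
}

total_slots = sum(n for _, n in CITY_TABLE.values())
# Weight each city's share by its number of top-3 slots.
pooled_share = sum(share * n for share, n in CITY_TABLE.values()) / total_slots
london_leeds_slots = CITY_TABLE["London"][1] + CITY_TABLE["Leeds"][1]

print(f"total localized top-3 slots: {total_slots}")
print(f"slot-weighted incumbent share: {pooled_share:.1%}")
print(f"London + Leeds share of slots: {london_leeds_slots / total_slots:.1%}")
```

The pooled figure comes out at about 16.9%, matching the 17% all-localized share, and London plus Leeds hold about 47% of the slots, so their higher shares dominate the pooled average.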

Data-quality note (why we use the prompt text, not the stored location tag) — this is the geoname bug you flagged. The pipeline's stored location_slug is unreliable as a per-prompt geo label: only 48.7% of rows agree with the city named in the prompt, 17.7% point to the wrong city (a mis-aligned production backfill — e.g. a Cardiff tag on a Leeds prompt), 18.3% carry a tag the prompt never names, and 4.9% name a city but were left untagged. The same city was also stored under two forms (geoname-2643743 and london-uk), splitting it in two. We therefore derive the city from the prompt text and merge the duplicates — every figure above is slug-independent. (An earlier draft of this section, built on the raw tags, reported a false "incumbents ~0% in cities" effect; that was the bug.)

[source: rankings.json — de-branded, test/probe prompts removed — × query-text city × top-15 multi-city incumbent set. Insurance + accounting sites only (5,446 localized + 1,699 national top-3 slots); the PCB/electronics dataset has no geographic prompts.]
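The prompt-text city resolution and slug merge described in the data-quality note can be sketched as follows. The regex over the six study cities, the assumption that canonical slugs are lower-case city names, the alias table and the bucket names are all illustrative, not the pipeline's actual code:

```python
import re
from typing import Optional

# Hypothetical: the six cities in this study; canonical slugs assumed to be the
# lower-case city name.
CITIES = ["glasgow", "birmingham", "manchester", "cardiff", "london", "leeds"]
CITY_RE = re.compile(r"\b(" + "|".join(CITIES) + r")\b", re.IGNORECASE)

# Hypothetical alias table: both stored forms of London collapse to one slug.
SLUG_ALIASES = {"geoname-2643743": "london", "london-uk": "london"}


def city_from_prompt(prompt: str) -> Optional[str]:
    """Return the single city named in the prompt, or None (national or ambiguous)."""
    found = {match.lower() for match in CITY_RE.findall(prompt)}
    return found.pop() if len(found) == 1 else None


def canonical_slug(slug: Optional[str]) -> Optional[str]:
    """Merge duplicate stored forms of the same city; empty tags become None."""
    if not slug:
        return None
    return SLUG_ALIASES.get(slug, slug)


def tag_agreement(prompt: str, stored_slug: Optional[str]) -> str:
    """Bucket a row the way the data-quality note does."""
    city = city_from_prompt(prompt)
    tag = canonical_slug(stored_slug)
    if city is not None:
        if tag == city:
            return "agrees"
        if tag is None:
            return "untagged"
        return "wrong_city"
    return "tag_not_in_prompt" if tag is not None else "national_untagged"


print(tag_agreement("Top insurance for designers in Leeds", "cardiff"))
```

Deriving the city from the prompt makes every downstream figure independent of the stored tag, which is the point of the fix; the tag is kept only to measure how often it disagrees.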

Who wins locations, per engine

National-incumbent share of the top-3, by engine × city (cities ordered most → least local-friendly). A low share means local players own it; a high share means the established national brands hold on.

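engine | Glasgow | Birmingham | Manchester | Cardiff | London | Leeds | all localized
perplexity | 0% | 6% | 10% | 12% | 19% | 18% | 13%
openai | 0% | 13% | 11% | 18% | 18% | 32% | 17%
gemini | 0% | 10% | 14% | 7% | 25% | 27% | 18%
anthropic | 0% | 11% | 18% | 15% | 26% | 33% | 20%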

Perplexity is the most open to local/niche players (incumbents hold only 13% of its localized top-3); Anthropic leans hardest on the established national brands (20%). The gradient has the same shape on all four engines: Glasgow is 0% everywhere, rising to roughly 18–33% in London and Leeds.

Your thesis: is niche / regional easier to rank than global?

Largely yes — on the parts a single snapshot can test, with one honest caveat.
  • Smaller regional markets are wide open. In secondary cities the established national insurers nearly disappear from the top-3 — 0% Glasgow, 10% Birmingham, 13% Manchester, 13% Cardiff — vs ~22–27% in the biggest, most-contested markets (London, Leeds). The smaller and more specific the market, the lower the incumbent wall — and it holds on all four engines.
  • Traditional SEO authority does NOT gate LLM citation. A domain's backlink authority predicts top-3 at a coin flip (AUC: OpenAI 0.51 · Anthropic 0.51 · Gemini 0.49 · Perplexity 0.48, all ≈ 0.50, §2). You don't have to out-rank decades-old brands in classic SEO to get cited; the "years of link-building" moat simply doesn't transfer to LLM answers.
  • The global "head" is where the old brands live. The only place the engines agree (the cross-engine consensus core, §3) is a thin head of household names — Hiscox, The Hartford, Simply Business. Broad/global prompts surface that head; niche + regional prompts surface a long, fragmented tail (2,262+ distinct local players across the six cities) that a new entrant can break into.

Honest caveat: this is one point in time, so it shows the niche/regional opportunity (a low incumbent wall, no SEO-authority gate) but cannot measure ranking speed. "We rank in days; SEO takes months" is fully consistent with this data but isn't proven by it; that needs a time series. And the raw localized-vs-national averages are confounded by vertical (insurance vs accounting), which is why the headline is the per-city gradient, which is not confounded in the same way.

8 · The eligibility floor — cited vs wild

Takeaway: To be citable at all, be answer-shaped (definitions, examples — 74% of cited pages vs ~18% of the web) and carry real content schema (FAQ/Article/LocalBusiness). Boilerplate schema is everywhere and doesn't distinguish you. This earns eligibility, not rank.

% with each signal across four cohorts: cited pages · uncited siblings (same domains) · Majestic top-1M draw · a uniform Common Crawl web sample (n=106, the true long tail Majestic excludes). Key read: the two random baselines broadly agree, and schema is widespread (about half the live web has it because CMSs auto-emit it), so schema presence is a weak discriminator. The real gap is answerability/depth (answer-shaped content, definitions, examples, length): cited pages are 74% answerable vs 31% for Majestic and 18% for CC-web, roughly 2.4–4×. So eligibility = substantive answer-shaped content + real content schema, not boilerplate markup. The schema decomposition below shows the split: boilerplate (WebSite/Org/Breadcrumb) is everywhere (~40–60% across all cohorts, weak signal), while content schema (LocalBusiness/FAQ/Article/Review) is the ~2.5× differentiator (cited 46% vs web 18–19%). Neither moves rank once you're cited.

% present | cited | sibling | Majestic 1M | CC web
has_any_schema | 68% | 64% | 43% | 52%
boilerplate_schema | 57% | 61% | 38% | 41%
content_schema | 46% | 39% | 18% | 19%
has_definitions | 71% | 64% | 27% | 18%
has_examples | 57% | 47% | 17% | 11%
has_conclusion | 16% | 12% | 3% | 4%
has_topic_sentences | 72% | 68% | 60% | 59%
has_faq_markup | 20% | 15% | 6% | 0%
answerability | 74% | 65% | 31% | 18%
has_aggregate_rating | 7% | 8% | 2% | 1%
has_author_schema | 13% | 23% | 2% | 1%
has_quotation | 24% | 34% | 20% | 18%
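The cohort gaps quoted in this section are simple ratios of the table's rates. A sketch that recomputes them from the rounded percentages above:

```python
# % present per cohort, copied from the table above:
# (cited, sibling, Majestic 1M, CC web).
RATES = {
    "answerability": (74, 65, 31, 18),
    "has_definitions": (71, 64, 27, 18),
    "content_schema": (46, 39, 18, 19),
    "boilerplate_schema": (57, 61, 38, 41),
}


def ratios(signal: str) -> tuple[float, float, float]:
    """Cited-page rate divided by the sibling, Majestic and CC-web rates."""
    cited, sibling, majestic, ccweb = RATES[signal]
    return cited / sibling, cited / majestic, cited / ccweb


for name in RATES:
    vs_sibling, vs_majestic, vs_cc = ratios(name)
    print(f"{name:18s} vs sibling {vs_sibling:.2f}x  vs Majestic {vs_majestic:.2f}x  vs CC web {vs_cc:.2f}x")
```

Answerability and definitions come out at roughly 2.4–4.1× the random baselines, content schema at about 2.4–2.6×, boilerplate at about 1.4–1.5×, while every signal is only ~0.9–1.2× the uncited siblings. That small sibling gap is why §12 treats the sibling arm as the cleanest within-niche control.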

Medians

median | cited | sibling | Majestic 1M | CC web
word_count | 1266.00 | 1437.00 | 726.50 | 731.50
schema_count | 7.00 | 5.00 | 0.00 | 1.00
schema_types_n | 2.00 | 2.00 | 0.00 | 0.00
stat_density | 25.46 | 31.24 | 46.38 | 66.90
presence_score | 25.00 | 25.00 | 0.00 | 25.00
completeness_score | 21.00 | 18.00 | 0.00 | 0.00
psi_score (cited only) | 61.00 | – | – | –

CC web n=106 (uniform Common Crawl draw, the genuine long tail) — rates ±~10%. It tracks Majestic on schema (≈ half the web) but confirms the big answerability/depth gap, validating the eligibility-floor finding against a true-web baseline.
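The "±~10%" for the CC-web sample matches the usual normal-approximation margin for a proportion at n=106. A sketch; the 95% level and the normal approximation are our assumptions:

```python
import math


def proportion_margin(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Normal-approximation 95% half-width for a sample proportion.

    p = 0.5 gives the widest (worst-case) margin for a given n.
    """
    return z * math.sqrt(p * (1.0 - p) / n)


print(f"CC web, n=106, worst case:         ±{proportion_margin(106):.1%}")        # ±9.5%
print(f"CC web, n=106, at 18% answerable:  ±{proportion_margin(106, 0.18):.1%}")  # ±7.3%
```

Even at the worst-case margin, the cited-vs-CC-web answerability gap (74% vs 18%) is far larger than the sampling noise.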

9 · Recency

content age | top-3 support rate | n
<1mo | 45% | 34,125
1-3mo | 46% | 11,394
3-6mo | 46% | 10,322
6-12mo | 46% | 8,216
>12mo | 50% | 14,131

Flat — no "3-month freshness cliff" in our data.

10 · This study vs prior work

finding | prior work | our result | verdict
Backlinks / domain authority | Semrush: "weak". Classic SEO: strong. | AUC ≈ 0.50 (no effect) | agree w/ Semrush, refute classic SEO
Cross-engine overlap | ~11% (ChatGPT∩Perplexity) | ~9% mean pairwise | agree
Q&A / FAQ formatting | Semrush +25% / Citation-Absorption −5.7% | +1 to +4% (all engines) | splits the difference; small
Statistics in content | Princeton +30–40% (causal) | no association (cited < random) | diverge (theirs causal, ours observational)
Semantic fit > length | Citation-Absorption: yes | both weak; relevance adds ≈ 0 AUC | partial; both small here
Freshness / recency cliff | ~3-month cliff | flat across age | refute
Keyword stuffing | Princeton: negative | negative on Gemini (0.73×) | agree

Princeton GEO is causal (edit one page, re-query); ours is observational across a brand-dominated competitive field — both can hold. The headline divergence: authority/backlinks and stats are levers in the SEO/GEO literature but non-levers for LLM citation here.

11 · Statistical robustness: frequentist, cluster-robust and Bayesian (MCMC)

Takeaway: We re-checked the findings three ways (standard, cluster-robust, and Bayesian) — they all agree. The conclusions don't depend on the statistical method; the company effect dwarfs every page feature in all of them.

Why this section exists, in plain English. A skeptic could ask: "are these results just an artifact of how you ran the maths?" So we re-ran the same question three independent ways and put the answers side by side. freq OR = the standard estimate. cluster-robust CI = the same estimate but with honest error bars that account for one company appearing in many rows (the strictest fix for non-independence). Bayes OR / HDI = a from-scratch re-estimate using a different statistical engine entirely. If all three columns line up, the finding is real and not a method artifact — and here they do. The closing "variance partition" simply asks: of everything that decides rank, how much is "which company you are" vs "what's on the page" — and finds the company part is more than 10× larger.

schema presence

engine | freq OR | naive 95% CI | cluster-robust CI | Bayes OR | 94% HDI | P(OR>1)
openai | 0.94 | [0.897, 0.985] | [0.882, 1.002] | 0.94 | [0.899, 0.983] | 0.009
anthropic | 1.021 | [0.987, 1.057] | [0.973, 1.072] | 1.021 | [0.988, 1.054] | 0.881
gemini | 1.015 | [0.967, 1.066] | [0.956, 1.078] | 1.016 | [0.972, 1.062] | 0.736
perplexity | 0.992 | [0.955, 1.031] | [0.944, 1.042] | 0.993 | [0.958, 1.027] | 0.349

readability

engine | freq OR | naive 95% CI | cluster-robust CI | Bayes OR | 94% HDI | P(OR>1)
openai | 1.051 | [1.022, 1.082] | [1.01, 1.094] | 1.051 | [1.022, 1.079] | 1
anthropic | 0.951 | [0.93, 0.971] | [0.924, 0.978] | 0.951 | [0.931, 0.97] | 0
gemini | 0.957 | [0.93, 0.986] | [0.919, 0.997] | 0.957 | [0.931, 0.983] | 0.001
perplexity | 1.019 | [0.994, 1.044] | [0.983, 1.057] | 1.019 | [0.996, 1.043] | 0.93

Variance partition (hierarchical Bayesian, company random intercept). Company-level SD = 3.137 on the logit scale; every standardized page-feature coefficient is <0.1 (largest 0.07). The company effect is more than an order of magnitude larger than any page feature: moving "which company you are" is worth ~30–100× a one-SD change in any on-page signal. But (per §2) that effect is not external authority; it is most consistent with unexplained brand salience.

Point estimates agree (Bayes ≈ frequentist at this n); cluster-robust CIs (clustered by company) are wider — the honest non-independence fix. Bayesian fixed-effects on full per-engine data; hierarchical on a 12,000-row subsample (rhat>1.01 — magnitude approximate, conclusion robust).
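The variance-partition comparison is plain arithmetic on the reported logit-scale values. A sketch; the odds conversion is standard, while the variance share uses the textbook π²/3 logistic residual convention, which the report itself does not state:

```python
import math

# Reported values (§11), all on the logit scale.
COMPANY_SD = 3.137            # company random-intercept SD
LARGEST_FEATURE_COEF = 0.07   # largest standardized page-feature coefficient
FEATURE_COEF_BOUND = 0.1      # stated upper bound on every feature coefficient

# How many times larger the company effect is than the page-feature effects.
ratio_vs_largest = COMPANY_SD / LARGEST_FEATURE_COEF  # about 44.8
ratio_vs_bound = COMPANY_SD / FEATURE_COEF_BOUND      # about 31.4

# The same effects as odds multipliers: exp(logit-scale effect).
company_odds_per_sd = math.exp(COMPANY_SD)            # about 23x
feature_odds_per_sd = math.exp(LARGEST_FEATURE_COEF)  # about 1.07x

# Latent-scale variance share at company level, ASSUMING the conventional
# logistic residual variance pi^2 / 3 (a textbook convention, not a report figure).
company_variance_share = COMPANY_SD**2 / (COMPANY_SD**2 + math.pi**2 / 3)  # about 0.75

print(f"company SD / largest feature coef: {ratio_vs_largest:.1f}x")
print(f"company SD / feature bound:        {ratio_vs_bound:.1f}x")
print(f"odds per SD: company {company_odds_per_sd:.1f}x vs feature {feature_odds_per_sd:.2f}x")
print(f"latent company variance share:     {company_variance_share:.2f}")
```

The ratio against the 0.1 bound reproduces the lower end of the ~30–100× range; smaller feature coefficients push it toward the upper end.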

12 · Method & limits

Models. Per-LLM logistic (top-3 ~ standardized features + company effect + site/niche FE), self-citations excluded; FDR across the feature×engine grid; bootstrap lift CIs; 5-fold CV AUC; external authority via Majestic Million join; Bayesian + cluster-robust robustness (§11).
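The model list mentions bootstrap CIs for lift. A sketch of a percentile bootstrap, assuming rows are resampled independently within the with-feature and without-feature groups; the function names and the toy data are hypothetical:

```python
import random


def lift(with_feature: list[int], without_feature: list[int]) -> float:
    """Top-3 rate with the feature divided by the top-3 rate without it (0/1 outcomes)."""
    rate_with = sum(with_feature) / len(with_feature)
    rate_without = sum(without_feature) / len(without_feature)
    return rate_with / rate_without


def bootstrap_lift_ci(with_feature, without_feature, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for lift, resampling rows within each group."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_boot):
        a = rng.choices(with_feature, k=len(with_feature))
        b = rng.choices(without_feature, k=len(without_feature))
        if sum(b) == 0:
            continue  # lift is undefined if the resampled baseline has no top-3 rows
        draws.append(lift(a, b))
    draws.sort()
    lower = draws[int(alpha / 2 * len(draws))]
    upper = draws[int((1 - alpha / 2) * len(draws)) - 1]
    return lower, upper


# Illustrative data only: 100 made-up rows per group at the rounded §4 OpenAI rates.
with_faq = [1] * 58 + [0] * 42
without_faq = [1] * 56 + [0] * 44
point = lift(with_faq, without_faq)
low, high = bootstrap_lift_ci(with_faq, without_faq)
print(f"lift {point:.3f}, 95% CI [{low:.3f}, {high:.3f}]")
```

Resampling rows independently ignores the company clustering that §11 shows widens the intervals; a cluster bootstrap that resamples whole companies would be the more conservative design. The report doesn't say which variant it used.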

Limits. Selection / survivorship: the cohort is companies LLMs already cite — we never observe niche players that were never mentioned, so the study explains rank among the considered set, not how to enter it from zero. "Eligibility" (cited-vs-wild) is therefore a crude proxy — random web isn't "uncited competitors"; the cleanest within-niche control is the sibling arm (same domains, uncited), where gaps are small. Observational (associational, not causal — distinct from Princeton's intervention); the dominant per-company effect is unexplained by measured signals (likely brand salience, unmeasured); the endogenous "popularity" proxy is partly circular (hence the external Majestic check); single snapshot; wild-random skewed to popular sites (conservative); PSI-lab is the speed signal; source-role Haiku-classified with gaps (Gemini); hierarchical MCMC on a subsample with imperfect convergence.