What drives LLM citation rank — and where the engines diverge

Observational, confound-controlled study: 138,186 source citations, 4 LLMs, 3 company datasets, full page re-audit + query-relevance + external authority. Associational, not causal.

The one-line finding. LLM citation rank is governed far more by which company a page belongs to (~75% of the variance) than by anything on the page — and a model using every on-page lever you can change predicts rank at roughly a coin flip (AUC 0.49–0.60). That per-company factor is not domain authority (backlinks also a coin flip), on-page craft, or query-relevance — most consistent with brand familiarity in the model's training. Page structure/answerability earns a place in the eligible set, not rank. The catch: that factor is a residual we can name but not directly build — you can't optimise or link-build your way up. (§2 unpacks exactly what this does and doesn't mean.)

What moves LLM visibility

~75% · Which company, not which page: ~75% of the variation in who ranks is a stable per-company effect (not backlinks, not anything buildable). The on-page levers you can change predict rank at roughly a coin flip (AUC 0.49–0.60). See §2 for what this does and doesn't mean.

74% / 18% · Answer-shaped content: cited pages directly answer the question (definitions, examples) far more than a typical web page.

~9% · Per-engine tailoring: the four engines pick the same top-3 only ~9% of the time; optimise per engine, not "for AI".

What doesn't — the GEO myths

AUC 0.50 · Backlinks / domain authority: predicts citation no better than a coin flip.

≈1.0× · Page speed: no effect on rank; LLMs cite slow pages freely (74% of cited pages are >4s). We tested the "big brands buy CDNs" confound directly, and it isn't there.

≈ none · Stats, freshness, boilerplate schema, Q&A-stuffing: no measurable citation lift.

How to read this report

The data. We ran 459 real business questions (e.g. "best accountants for contractors in Leeds", "public liability insurance for electricians") through 4 LLMs — ChatGPT (OpenAI), Claude (Anthropic), Google (Gemini), Perplexity — across 3 client niches (accounting, business insurance, AI tooling). For every answer we recorded which companies each LLM named and in what order, and which web pages it cited as sources — 138,186 citations in all. We then re-audited every cited page (content, structure, schema, speed) and audited random slices of the wider web for comparison. Brand-naming queries were removed, so a company can't win simply by being named in the question.

Plain terms: "rank" = where a company appears in an answer · "top-3" = named in the first three · "cited page" = a URL an LLM used as a source · "visibility" = a 0–100 score for how high a company tends to rank.

Key terms — in plain English

Odds-ratio (OR): How much a page feature multiplies the odds of landing in the top 3. 1.0 = no effect; 1.2 = 20% better odds; 0.8 = 20% worse. Nearly everything here is 0.9–1.1 (small).
Lift: The plain version of OR — how many times more likely a page is to support a top-3 company with a feature vs without it.
AUC: How well a model predicts top-3 ranking: 0.5 = a coin flip (useless), 1.0 = perfect. Our page-feature models sit at 0.49–0.60, so features barely predict rank.
Significance (q<.05, ✱): The result is unlikely to be chance, after correcting for testing many features at once. More stars = more confidence.
Content vs boilerplate schema: Structured-data tags in a page's code. Boilerplate = auto-generated tags every CMS emits (WebSite, Organization). Content = meaningful tags describing the page (FAQ, Article, LocalBusiness, Review).
Answerability: Whether a page directly answers the question — definitions, examples, a clear conclusion — rather than just marketing copy.
Brand / company effect: The part of ranking explained by which company it is, independent of its pages — a company's stable tendency to be cited after page features are removed.
Popularity (proxy): How many distinct questions a company shows up for at all — a rough, in-dataset stand-in for prominence.
External authority: Classic SEO authority — how many other sites link to a domain (Majestic Million rank). An independent signal, unlike "popularity".
Eligibility vs rank: Eligibility = being the kind of page that can be cited at all; rank = how high you place once eligible. Most signals affect eligibility, not rank.
Cohorts: Cited = pages LLMs cited · sibling = other (uncited) pages on those same sites · Majestic / CC-web = random established / typical web pages, for comparison.
Cross-engine overlap (Jaccard): How much two engines' top-3 lists share: 0 = nothing in common, 1 = identical. Ours ≈ 0.09.
Consensus core: The handful of brands that all four engines put in the top 3 for the same question — the thin "head" they agree on despite disagreeing on everything below it.
Transfer: If a company is top-3 on one engine, how often it's also top-3 on another. Low transfer ⇒ winning on ChatGPT doesn't get you onto Perplexity.
Localized vs national query: Localized = the prompt names a UK city ("…in Manchester"). National = it names no place ("…in the UK").
National incumbent: For the localization section: a company cited across ≥3 different cities and among the 15 most-cited such firms (national scale + footprint) — excludes the client itself and single-city local brokers.
Order of magnitude: "≈10× or more." When we say the company effect is an order of magnitude larger than page features, one is in the tens-of-percent range and the other in the multiples range.
Cluster-robust CI / Bayesian HDI: Two stricter ways to draw the error bars — one corrects for the same company appearing many times, the other re-estimates from scratch. Both agree with the standard result here.
Correlation (Spearman ρ): Strength & direction of a relationship, −1 to +1; 0 = no relationship.
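
To make the odds-ratio definition above concrete, here is a minimal Python sketch that converts an odds-ratio into a change in top-3 probability. The function name and the 50% baseline are illustrative assumptions, not figures or code from the study.

```python
def apply_odds_ratio(base_rate: float, odds_ratio: float) -> float:
    """Return the probability obtained by scaling the baseline odds by an odds-ratio."""
    if not 0.0 < base_rate < 1.0:
        raise ValueError("base_rate must be strictly between 0 and 1")
    odds = base_rate / (1.0 - base_rate)   # probability -> odds
    new_odds = odds * odds_ratio           # an odds-ratio multiplies the odds
    return new_odds / (1.0 + new_odds)     # odds -> probability


# Illustrative baseline only (hypothetical, not a study figure).
BASE_RATE = 0.50
for or_value in (0.8, 1.0, 1.1, 1.2):
    shifted = apply_odds_ratio(BASE_RATE, or_value)
    print(f"OR {or_value:.2f}: {BASE_RATE:.0%} -> {shifted:.1%}")
```

The point of the sketch: an odds-ratio of 1.2 means 20% better odds, which is a smaller shift in probability than "20 percentage points".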

Executive summary

138,186 citations modeled · 4,961 companies · 0.51 authority→rank AUC (≈ coin flip) · ~9% cross-engine top-3 overlap

What matters most: 11 stats for ranking on LLMs

Which company, not which page, decides rank. ~75% of the variation in who lands top-3 is a stable per-company effect (ICC ≈ 0.75; company SD 3.1 log-odds vs page coefs <0.08). The actionable flip side: a model using every on-page lever predicts rank at AUC 0.49–0.60 — a coin flip. It's a residual ("which company," not "being a brand"), not backlinks, craft, or relevance — most consistent with training-data familiarity. (See §2 for the full what-this-means-and-doesn't.)
Backlinks / domain authority do NOT move LLM rank. External authority (Majestic) predicts top-3 at AUC ≈ 0.50 (coin flip). You cannot link-build your way to citations.
Engines diverge on the tail, agree on the head. Only ~9% full top-3 overlap, but the same brand is #1 on ≥2 engines 37% of the time — a thin consensus core of big brands, divergent everywhere below.
Localization helps locals only in some cities — not as a rule. Naming a city doesn't shrink national-incumbent share on average (17% localized vs 12% national). But it's sharply city-specific: incumbents hold 0% in Glasgow and under 15% in 4 of 6 cities, vs ~27% in Leeds.
Eligibility = answerability + content schema, not boilerplate. Cited pages are ~3× the typical web on answerability (74% vs CC-web 18%) and content schema (46% vs 19%) — LocalBusiness/FAQ/Article/Review. But boilerplate schema (WebSite/Org) is everywhere (~40–60%) and doesn't discriminate, and none of it moves rank once cited.
Page speed is a near-non-factor (OR ≈ 1.0; small per-engine differences only).
Stats beside claims don't help citation here — cited pages had lower stat-density than random web. Contrarian vs Princeton/Citation-Absorption.
Q&A/FAQ helps a little, on every engine (+1–4%) — neither the +25% nor the −5.7% the literature claims.
No freshness cliff — citation rate is flat from <1mo to >12mo old content.
Readability is the one consistent on-page positive — robustly + on OpenAI/Perplexity, − on Anthropic.
Don't keyword-stuff — title↔body keyword matching is negative on Gemini (0.73×), echoing Princeton.

The playbook: grounded in the findings below

Publishing cadence: coverage > frequency

Evidence: Citation rate is flat across content age (no freshness cliff); page features add ~0 AUC over the company effect.

Don't run a treadmill for "freshness." Publish to cover query intents — one strong answer-shaped page per intent — then stop. Cadence tracks new intents, not a weekly quota.

Internal build vs. external: salience, not link-building

Evidence: Prominence dominates rank, but external domain authority (backlinks) predicts citation at a coin-flip (AUC ≈ 0.50). Self-citation share: Perplexity 55%, OpenAI 46%, Anthropic 39% (leans on third-party review 23%).

Build one excellent internal page per intent (eligibility floor). For "external," chase brand mentions & category presence that build salience — being named/reviewed/compared broadly — not link-building for domain authority, which has ~zero effect on LLM citation. Anthropic especially rewards being reviewed/compared on third-party sites.

How to structure language: answer-first, definitional, not keyword-stuffed

Evidence: Cited pages are answer-shaped far more than random: answerability 74% vs 31%, definitions 71% vs 27%. Readability is the only consistent positive; keyword over-optimization is negative on Gemini.

Lead with a direct definitional answer, then concrete examples, clear topic sentences, a short conclusion. Mirror query language naturally — do not keyword-stuff. Completeness over length.

Stats beside each claim: credibility, not a rank lever

Evidence: Stat density did not predict citation (cited 25.46 < random 46.38 per 1k words; lift ~1.0×). Contradicts Princeton's causal +30–40%.

Add stats for human trust — but in a brand-dominated field they are not a citation lever. Spend the effort on coverage + brand salience instead.

Per-engine fingerprints: what matters on each

Tilts are small (±~10%) — brand prominence dominates all four — but these are the levers that move the margins once you're in the consideration set.

ChatGPT (OpenAI) — the one engine where craft pays

Most feature-responsive (AUC 0.60): on-page optimization actually moves rank here, unlike the others.
Rewards completeness (1.08×) + readability (robustly positive, CI [1.01,1.09]) + depth/definitions.
Cites your own pages ~46% of the time, plus how-to/docs — owning strong educational content pays off.

Claude (Anthropic) — third-party-driven

On-page craft is ~useless (AUC 0.49 — below random; features add noise).
Leans on third-party review (23%) + comparison (27%) more than any engine: get reviewed/compared, don't just polish your pages.
Uniquely, readability is slightly negative (don't over-polish); hardest top-3 to crack (42% base rate).

Google (Gemini) — the anti-spam engine

Punishes keyword over-optimization (0.73× — the single strongest negative in the study). Do not keyword-stuff for Google.
Likes FAQ (+3.9%, biggest Q&A boost) and depth; schema doesn't help (only engine with lift <1×).
Behaves most like Claude (ρ=0.51); likely scores its (JS-rendered) search index — may weight rendered content the others never fetch.

Perplexity — the quote/citation engine

Rewards quotable content (quotations 1.12× — highest of any engine).
Cites your own pages the most (self 55%): first-party content pays off here.
Most divergent from ChatGPT (only 9% top-3 overlap) — wins there transfer least to here.

1 · Per-LLM driver matrix (logistic, top-3, FDR-corrected)

Takeaway: Read this as "how much does each page feature change the odds of ranking top-3, per engine." Everything sits near 1.0 (±10%) — page features barely move rank. The few real signals: completeness & readability help (esp. ChatGPT); keyword-stuffing hurts (Gemini).

How to read the table below. Each row is a page feature; each column is an engine. Each cell is an odds-ratio: 1.00 = that feature makes no difference to landing top-3; 1.10 = 10% better odds (green), 0.90 = 10% worse (red). Stars mark statistical confidence (more stars = less likely to be chance). The takeaway you're looking for: almost every cell is between 0.9 and 1.1 — page features barely move the needle. Read the direction of the strong ones, not the exact decimal.

Odds-ratio per +1 SD / presence, controlling for company effect, site, niche. *q<.05 **q<.01 ***q<.001. Effects are small (±10%) — read directions, not dials. Two cells are artifacts, not signals: stat_density's anthropic 2.09× is a heavy-tail outlier (robust lift ≈ 1.0×), and the presence/completeness split is collinearity (r=0.835) — read their combined effect (schema-rich lift ≈ 1.0 = neutral for rank), not the individual signs. Schema is an eligibility signal (§8), not a rank lever.

feature	openai	anthropic	gemini	perplexity
presence_score	0.94×	1.04×	1.01×	0.96×
completeness_score	1.08×*	0.99×	0.97×	1.06×*
llm_readability_score	1.07×**	1.02×	0.99×	1.07×***
semantics_score	0.97×	0.96×*	1.02×	0.96×*
word_count_log	1.02×	0.96×*	1.03×	0.96×
stat_density	1.02×	1.70×	1.03×	0.99×
external_links_log	0.97×	0.99×	1.02×	1.02×
schema_types_n	0.95×*	0.99×	0.97×	0.98×
psi_score	1.02×	1.00×	1.00×	0.97×
query_coverage_body	1.03×	1.11×***	1.04×	1.03×
query_coverage_title	0.96×	1.04×*	0.98×	1.05×*
has_definitions	1.05×	1.02×	1.02×	0.94×
has_faq_markup	0.99×	0.99×	1.01×	1.00×
has_keyword_consistency	0.94×	1.00×	0.73×**	1.09×
has_quotation	1.00×	1.02×	1.01×	1.10×**
has_topic_sentences	0.99×	0.94×	1.04×	1.00×
has_examples	0.98×	0.94×	1.02×	0.92×*

Model fit: openai AUC 0.601 · anthropic AUC 0.491 · gemini AUC 0.539 · perplexity AUC 0.534 — all near 0.5–0.6 (features weakly predictive).
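
The stars in the table rest on q-values, that is, p-values corrected for testing many features at once. The report does not say which FDR procedure was used; the sketch below assumes the common Benjamini-Hochberg step-up method, and both function names are hypothetical.

```python
def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Return Benjamini-Hochberg q-values in the same order as the input p-values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is the minimum of
    # p * m / rank over its own rank and every larger rank (keeps q monotone).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


def significance_stars(q: float) -> str:
    """Map a q-value to the star notation used in the table legend."""
    if q < 0.001:
        return "***"
    if q < 0.01:
        return "**"
    if q < 0.05:
        return "*"
    return ""


# Hypothetical p-values for four features (not taken from the study).
example_p = [0.01, 0.04, 0.03, 0.005]
for p, q in zip(example_p, benjamini_hochberg(example_p)):
    print(f"p={p:.3f} q={q:.3f} {significance_stars(q)}")
```

With many near-null features, this kind of correction is what keeps a handful of chance "hits" from being read as real signals.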

Lift — the "1.3× more likely" view

2 · Prominence, authority & relevance (marginal AUC)

Takeaway: Nothing page-level beats simply knowing the company. Adding all page features + query-relevance + backlink authority doesn't predict rank better than the company effect alone — and backlinks alone are a coin flip.

Does anything page-level beat just knowing the company? Cross-validated AUC by signal set:

engine	popularity only	+ audit features	+ relevance	all combined
openai	0.606	0.604	0.603	0.601
anthropic	0.539	0.512	0.494	0.491
gemini	0.567	0.544	0.557	0.539
perplexity	0.557	0.545	0.544	0.534

On every engine, the company effect alone ≥ everything combined; relevance ORs 1.02–1.10 (negligible).
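
For readers who want to see what the AUC figures above measure, here is a minimal pure-Python sketch of AUC as a pairwise ranking probability. It is an illustration of the metric only; the study's cross-validation setup is not reproduced, and the function name is an assumption.

```python
def auc(scores: list[float], labels: list[int]) -> float:
    """Probability that a random top-3 row (label 1) outscores a random non-top-3 row (label 0).

    Ties count as half a win, so an uninformative score gives 0.5.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy inputs (hypothetical): a perfectly separating score and a constant score.
print(auc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))
print(auc([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0]))
```

Read this way, an AUC of 0.49–0.60 means the model's score orders a top-3 page above a non-top-3 page only about as often as chance would.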

Is it real authority, or circular? — external check

Our "popularity" measure was an in-dataset appearance count (partly circular). The honest test joins cited domains to Majestic Million (backlink/referring-subnet rank), a true external authority signal. 77% of cited domains are in the top-1M.

engine	endogenous "popularity"	EXTERNAL authority (Majestic)	both
openai	0.607	0.506	0.608
anthropic	0.539	0.511	0.537
gemini	0.569	0.492	0.561
perplexity	0.557	0.478	0.549

External domain authority does NOT predict LLM citation rank. AUC ≈ 0.48–0.51 (a coin flip), Spearman ≈ 0.000. And the endogenous "popularity" is essentially uncorrelated with external authority (ρ=-0.071). So the dominant per-company effect is not backlinks, not on-page, not relevance; it is most consistent with brand familiarity in the model's training. You can't link-build your way to LLM citations.

"But big brands buy CDNs — wouldn't speed skew this?"

A fair worry — so we tested it directly, and the confound isn't in the data. Page speed (PSI-lab / Lighthouse) is a real measurement on 97% of cited pages (not imputed), so the ≈1.0× isn't a missing-data artifact. It comes up empty three independent ways:

No raw correlation to absorb. Even without controlling for brand, speed barely moves with rank — Spearman ρ = 0.009 vs top-3. It's not that "brand soaks up the speed signal"; there's no signal to begin with.
Brands aren't actually faster here. The most-cited "big-brand" pages and the long tail have the same median speed score (PSI 61 vs 61). Speed doesn't track popularity (ρ = 0.004) or backlink authority (ρ = -0.087 — if anything high-authority domains are slightly slower: their content pages are heavier with marketing/analytics tech). A CDN fixes TTFB, which is already near-zero for almost everyone — table stakes that can't differentiate — while PSI/LCP is dominated by page weight, where big brands have no edge.
LLMs cite slow pages freely. Only 10% of cited pages are genuinely fast (LCP < 2.5s); 74% are slow (LCP > 4s; median 6.6s). If speed were a gate, cited pages would be fast — they're decidedly not.

So the ≈1.0× is genuine, not an artifact: crawlers fetch HTML; they don't render the page or race a stopwatch. [source: url_perf_lab PSI-lab × 97% of 138,186 cited URLs]

What the "company effect" actually is — and isn't (the study's most-misread number)

How it was measured. We fit a model that gives every company its own baseline citation-propensity (a per-company "random intercept"), then lets the page's audited features (schema, readability, query-relevance, speed) and its popularity adjust that baseline up or down. Then we ask: of all the variation in who lands top-3, how much is the company baseline vs the page itself?

The answer. The company baseline accounts for ~75% of the explainable variance (ICC = 0.75). Its spread — SD 3.14 on the log-odds scale — is roughly 43× the largest page-feature effect (0.073). That's where "order of magnitude" comes from.

It is NOT "company vs not-a-company." Every page belongs to some company, so there is no "not a company." The contrast is between companies: each gets its own baseline, and those baselines are spread enormously. A company one SD above average has ~23× the citation odds of an average one for the same page. So it's which company, not "being" a company, that dominates.
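
The three headline numbers in this section (ICC ≈ 0.75, ~23× odds per SD, ~43× the largest page effect) can be checked with a few lines of arithmetic. The sketch below assumes the standard latent-variable ICC for a logistic random-intercept model, where the residual variance is fixed at π²/3; the report does not state which ICC definition was used, so treat that as an assumption. Function names are hypothetical.

```python
import math


def logistic_icc(random_intercept_sd: float) -> float:
    """Latent-variable ICC for a logistic random-intercept model (assumed definition)."""
    between = random_intercept_sd ** 2   # between-company variance
    within = math.pi ** 2 / 3            # logistic residual variance, about 3.29
    return between / (between + within)


def odds_multiplier_per_sd(random_intercept_sd: float) -> float:
    """Odds multiplier for a company one SD above average on the log-odds scale."""
    return math.exp(random_intercept_sd)


COMPANY_SD = 3.14           # from the text above
LARGEST_PAGE_COEF = 0.073   # from the text above

print(f"ICC             ~ {logistic_icc(COMPANY_SD):.2f}")
print(f"odds x per SD   ~ {odds_multiplier_per_sd(COMPANY_SD):.1f}")
print(f"SD / page coef  ~ {COMPANY_SD / LARGEST_PAGE_COEF:.0f}")
```

Under that assumed definition, the reported SD of 3.14 reproduces all three figures, which suggests they describe one consistent model rather than three separate estimates.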

The honest, non-tautological half. Part of "company predicts company-rank" is mechanical — the outcome is company-level, so of course the company matters. The result you can actually act on is the flip side: a model using every on-page lever you can change predicts top-3 at AUC 0.49–0.60 — between a coin flip (0.50) and barely better. Schema, readability, relevance, speed, and backlinks (AUC ≈ 0.50, §2) move rank by ~10% at most. You cannot optimise your way up.

So what is "brand identity" in practice — and the caveat. It's a residual, not a dial you can turn: the stable per-company tendency left after we strip out page quality, relevance, popularity and backlinks. Concretely — take two pages with identical audit scores, one from a household-name insurer and one from an unknown broker; the household name wins, and no on-page change closes the gap. We call that residual "brand familiarity" because it tracks recognisable market leaders and isn't explained by anything buildable — but strictly it is unexplained between-company variance, which could also carry other unmeasured company-level signals (training-corpus size, how often the brand is mentioned, Wikipedia presence). We name it; we don't directly observe it.

3 · Cross-LLM divergence

Takeaway: The four engines mostly disagree on the full top-3 (~9% overlap) — but they do share a thin head of recognized brands (consensus core below). Optimise per engine; expect agreement only on the biggest names.
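
As a rough illustration of how a ~9% overlap figure can arise, the sketch below computes the Jaccard overlap between top-3 lists and averages it over all engine pairs for one query. The company names are hypothetical placeholders, and the study's exact aggregation across queries is not specified here.

```python
from itertools import combinations


def jaccard(list_a: list[str], list_b: list[str]) -> float:
    """Shared items divided by all distinct items across the two lists."""
    set_a, set_b = set(list_a), set(list_b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def mean_pairwise_jaccard(top3_by_engine: dict[str, list[str]]) -> float:
    """Average Jaccard overlap over every pair of engines for a single query."""
    pairs = list(combinations(top3_by_engine.values(), 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical top-3 lists for one query (placeholder brand names).
example_query = {
    "openai": ["brand_a", "brand_b", "brand_c"],
    "anthropic": ["brand_a", "brand_d", "brand_e"],
    "gemini": ["brand_f", "brand_g", "brand_h"],
    "perplexity": ["brand_i", "brand_j", "brand_k"],
}
print(f"mean pairwise Jaccard: {mean_pairwise_jaccard(example_query):.3f}")
```

For two three-item lists the Jaccard score can only be 0, 0.2, 0.5 or 1, so a mean near 0.09 implies that most engine pairs share no company at all on a typical query.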

Do drivers differ by engine? (interaction tests)

feature	LR χ²	p	differs by LLM
presence_score	15.7	0.001	yes
completeness_score	20.4	<0.001	yes
semantics_score	7.0	0.071	no
psi_score	14.7	0.002	yes
has_faq_markup	7.9	0.049	yes
has_definitions	4.8	0.186	no
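
The p-values in this table can be sanity-checked from the LR χ² statistics. The sketch below assumes each interaction test has 3 degrees of freedom (four engines minus one), which the report does not state explicitly, and uses the closed-form survival function for that case so no external library is needed.

```python
import math


def chi2_sf_df3(statistic: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 3 degrees of freedom (assumed df)."""
    if statistic <= 0:
        return 1.0
    half = statistic / 2.0
    # Closed form for df = 3: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 * statistic / math.pi) * math.exp(-half)


# LR chi-square values copied from the table above.
lr_statistics = {
    "presence_score": 15.7,
    "completeness_score": 20.4,
    "semantics_score": 7.0,
    "psi_score": 14.7,
    "has_faq_markup": 7.9,
    "has_definitions": 4.8,
}
for feature, stat in lr_statistics.items():
    p = chi2_sf_df3(stat)
    print(f"{feature:20s} chi2={stat:5.1f} p={p:.4f} differs={'yes' if p < 0.05 else 'no'}")
```

Under the df = 3 assumption the recomputed p-values land within rounding of the reported ones, and the yes/no column follows from the 0.05 threshold.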

Source ecology — what each engine cites (role share)

source role	openai	anthropic	gemini	perplexity
self	46%	39%	—	55%
review	12%	23%	—	19%
comparison	9%	27%	—	15%
how-to	8%	8%	—	3%
docs	6%	1%	11%	2%

Self = own page. Perplexity/OpenAI lean on own pages (~half); Anthropic on third-party review/comparison. Only ~9% top-3 overlap between engines.

…but there IS a consensus core — engines agree on the head

"No single AI ranking" is about the full top-3 list. Stratified, the engines do converge on a thin head of recognized brands:

The same company is rank-1 on ≥2 engines in 37% of queries (≥3 in 9%, all-four in just 1%).
If a company is top-3 on one engine, it's top-3 on a given other engine only 16% of the time — low transfer beyond the head.
Consensus core (top-3 on all four across multiple queries): hiscox (9), the hartford (6), simply business (4), an anonymized UK accounting firm (4), mwa accounting (4), flux (3).

Concrete cases where one company was top-3 on all four engines (ranks o/a/g/p):

query	company	o/a/g/p
Top insurance for designers in Leeds	policybee	3/3/2/2
Top 5 business insurance providers for contractors in London	hiscox	1/3/1/1
Business insurance reviews for landscapers	nationwide	2/1/2/1
Best alternatives to Hiscox for builders	the hartford	3/1/1/1
Best Couriers in Cardiff	citysprint	1/1/2/1
Best insurance for tradespeople in London	simply business	2/2/1/1
Top insurance for designers in London and why	policybee	2/1/3/3
What is the best insurance for a courier in Manchester?	admiral	1/1/1/1

So the honest statement: engines share a brand-dominated head (Hiscox, The Hartford, Simply Business…) but diverge on everything below it. [source: rankmatrix.jsonl, 487 multi-engine queries]

4 · The Q&A contradiction, adjudicated

Takeaway: FAQ/Q&A formatting nudges citation up a little on every engine (+1–4%) — neither the +25% nor the −5.7% the published studies claim.

Semrush +25% vs Citation-Absorption −5.74%. Our per-LLM test:

engine	top-3 w/ FAQ	w/o	relative diff
openai	58%	56%	+3.36%
anthropic	42%	42%	+0.94%
gemini	48%	46%	+3.58%
perplexity	47%	45%	+3.8%

Modestly positive on all four (+1–4%) — does not flip sign; far smaller than either camp claims.
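
The "relative diff" column is a relative, not absolute, difference in top-3 rate. A minimal sketch of that calculation follows, using the rounded percentages printed in the table; the function name is hypothetical. Because the inputs here are rounded, the results will not match the reported column exactly, which was presumably computed from unrounded rates.

```python
def relative_diff(rate_with: float, rate_without: float) -> float:
    """Relative change in top-3 rate for pages with FAQ markup vs pages without."""
    return (rate_with - rate_without) / rate_without


# Rounded top-3 rates (with FAQ, without FAQ) copied from the table above.
rounded_rates = {
    "openai": (0.58, 0.56),
    "anthropic": (0.42, 0.42),
    "gemini": (0.48, 0.46),
    "perplexity": (0.47, 0.45),
}
for engine, (with_faq, without_faq) in rounded_rates.items():
    print(f"{engine:10s} {relative_diff(with_faq, without_faq):+.2%}")
```

Even on rounded inputs the differences stay in the low single digits on every engine, far from the +25% or −5.7% reported elsewhere.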

5 · Company breadth

Takeaway: More pages looks like it helps — but only because prominent brands happen to have more pages. Control for prominence and the effect reverses. Publishing more pages is not a lever.

Raw ρ = 0.08; controlling for the company effect, partial ρ = -0.36 — apparent breadth benefit is a prominence proxy.

distinct cited pages	mean visibility	median best rank	companies
2	34.7	7	18
3	30.2	6	17
4-5	48.6	4	122
6-10	52.0	4	2771
11+	53.2	2	2018

6 · Per-niche variation

Takeaway: The picture barely changes by niche — schema helps trades a touch more than accounting/crypto, but brand prominence dominates in every vertical.

Top-3 rate + on-page lift by niche. Variation is modest; schema helps trades (Landscapers/Carpenters/Builders ~1.08–1.09×) slightly more than accounting/crypto. The brand-dominated story holds everywhere.

niche	top-3 rate	schema-rich lift	answerable lift	n
Accountant for Therapists	63%	1.04×	1.05×	1671
Accountant for Contractors	60%	1.06×	1.03×	1750
Accountant for Crypto Returns	59%	0.96×	1.05×	1659
Consultants	57%	1.07×	1.02×	4390
Electricians	57%	1.04×	1.02×	4895
Accountant for Lawyers	57%	1.00×	1.04×	2154
Builders	57%	1.00×	1.02×	4716
Accountant for Crypto	56%	0.91×	0.99×	2086
Accountant for Designers	56%	0.95×	1.04×	2109
Accountant for Ecommerce	55%	0.98×	0.99×	6443
Decorators	55%	1.01×	1.03×	4829
Landscapers	55%	1.09×	1.04×	5156
Crypto Tax Returns	55%	1.03×	1.02×	2001
Accountant for Recruiting agencies	54%	0.96×	0.98×	1988
Carpenters	54%	1.08×	0.99×	4770
Designers	54%	0.98×	1.03×	4659
Accountant For Contractors	53%	1.03×	1.00×	2075
Accountant for Doctors	53%	0.96×	0.98×	4688
Contractors	53%	1.07×	1.09×	5111
Accountant for Landlords	53%	1.01×	0.99×	5852
Couriers	53%	1.05×	1.02×	5170
Contractor Accountant	52%	0.94×	1.00×	1906
Accountant for Dentist	51%	0.98×	1.07×	2171
Tradespeople	51%	1.04×	1.03×	5336
Self employed accountant	50%	1.01×	0.90×	1989
Limited Company Accounting	47%	0.99×	0.95×	2591

7 · Per-location variation & localization

Top-3 rate by the city named in the prompt (variation tracks competitive density — bigger metros = more competitors = lower top-3 rate). Cities are resolved from the prompt text, not the stored location tag (see the data-quality note below for why).

prompt names…	top-3 rate	schema-rich lift	cited-domain authority	citations (n)
Glasgow	59%	0.98×	0.75	3,926
Birmingham	51%	1.02×	0.83	12,728
Manchester	48%	0.99×	0.88	19,062
Cardiff	48%	1.04×	0.80	11,635
London	42%	1.03×	0.84	30,076
Leeds	41%	0.95×	0.80	24,218
National (no city)	55%	1.04×	1.03	24,627

Coverage: not every prompt names a place. 81% of insurance + accounting citations come from prompts that name a UK city (101,645); the other 19% (24,627) are national, shown as the National (no city) baseline row. So the city rows are not the whole dataset; they're the located 81%. Schema-rich lift ≈ 1.0× in every location ⇒ on-page schema doesn't change top-3 odds anywhere. The PCB/electronics dataset is excluded entirely (no geographic prompts).

Localization — do local players beat the big national brands?

How to read this section. A localized query names a UK city (e.g. "best insurance for a tradesperson in Manchester"); a national query names no place ("best plumbing insurance UK"). A national incumbent here = a company cited across ≥3 different cities and among the 15 most-cited such companies (genuine national scale + footprint — Hiscox, AXA, Aviva, Simply Business…) — this deliberately excludes the client and single-city local brokers. Incumbent share of top-3 = the fraction of a city's top-3 slots those incumbents hold; a low share means local/regional players are winning.

Takeaway (the hypothesis was only half right). Naming a city does not uniformly hand the top 3 to small players — across all localized prompts the national incumbents actually hold a slightly higher share than in national prompts (17% vs 12%). The real effect is city-specific: in 4 of 6 cities (Glasgow, Birmingham, Manchester and Cardiff) the incumbents nearly vanish — holding under 15% of the top 3 — while in London and Leeds they hold ~22–27%. Where local players win, they win decisively: each low-incumbent city has hundreds of distinct local/regional firms taking those slots.

UK city named in prompt	incumbent share of top-3	distinct local players in top-3	top-3 slots (n)
Glasgow	0%	174	340
Birmingham	10%	412	822
Manchester	13%	529	1095
Cardiff	13%	308	625
London	22%	521	1540
Leeds	27%	318	1024

The gradient is the finding: Glasgow → 0% incumbent share (local brokers own it outright), rising monotonically to Leeds 27%. The national-average comparison washes this out because London + Leeds alone carry nearly half of the localized volume (2,564 of 5,446 top-3 slots). The 15 incumbents under test: hiscox, simply business, axa, direct line for business, the hartford, axa uk, markel direct, aviva…
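
The pooled 17% figure and the per-city gradient can be reconciled directly from the table: the pooled share is a slot-weighted average of the city shares. The sketch below recomputes it from the rounded table values; the variable and function names are hypothetical.

```python
# (incumbent share of top-3, top-3 slots) per city, copied from the table above.
CITY_ROWS = {
    "Glasgow": (0.00, 340),
    "Birmingham": (0.10, 822),
    "Manchester": (0.13, 1095),
    "Cardiff": (0.13, 625),
    "London": (0.22, 1540),
    "Leeds": (0.27, 1024),
}


def pooled_incumbent_share(rows: dict[str, tuple[float, int]]) -> float:
    """Slot-weighted average of per-city incumbent shares."""
    total_slots = sum(slots for _, slots in rows.values())
    return sum(share * slots for share, slots in rows.values()) / total_slots


def slot_fraction(rows: dict[str, tuple[float, int]], cities: list[str]) -> float:
    """Fraction of all localized top-3 slots that fall in the given cities."""
    total_slots = sum(slots for _, slots in rows.values())
    return sum(rows[city][1] for city in cities) / total_slots


print(f"pooled incumbent share: {pooled_incumbent_share(CITY_ROWS):.1%}")
print(f"London + Leeds slots:   {slot_fraction(CITY_ROWS, ['London', 'Leeds']):.1%}")
```

Because the two highest-incumbent cities hold a large slice of the slots, the pooled average sits well above the share seen in the other four cities.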

Data-quality note (why we use the prompt text, not the stored location tag) — this is the geoname bug you flagged. The pipeline's stored location_slug is unreliable as a per-prompt geo label: only 48.7% of rows agree with the city named in the prompt, 17.7% point to the wrong city (a mis-aligned production backfill — e.g. a Cardiff tag on a Leeds prompt), 18.3% carry a tag the prompt never names, and 4.9% name a city but were left untagged. The same city was also stored under two forms (geoname-2643743 and london-uk), splitting it in two. We therefore derive the city from the prompt text and merge the duplicates — every figure above is slug-independent. (An earlier draft of this section, built on the raw tags, reported a false "incumbents ~0% in cities" effect; that was the bug.)
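
The fix described in the note above (derive the city from the prompt text, then merge duplicate slug forms) can be sketched as follows. This is an illustrative reconstruction, not the pipeline's actual code: the function names are hypothetical, the city list is the six cities reported in this section, and mapping both slug forms to London is an assumption based on the london-uk slug.

```python
import re
from typing import Optional

# The six UK cities reported in this section.
CITIES = ("Glasgow", "Birmingham", "Manchester", "Cardiff", "London", "Leeds")

# Duplicate stored forms of one city (assumed here to both mean London).
SLUG_ALIASES = {"geoname-2643743": "London", "london-uk": "London"}

_CITY_PATTERN = re.compile(r"\b(" + "|".join(CITIES) + r")\b", re.IGNORECASE)


def city_from_prompt(prompt: str) -> Optional[str]:
    """Return the first known city named in the prompt text, or None for a national prompt."""
    match = _CITY_PATTERN.search(prompt)
    return match.group(1).title() if match else None


def city_from_slug(location_slug: Optional[str]) -> Optional[str]:
    """Resolve a stored location tag to a canonical city name, merging known duplicates."""
    if location_slug is None:
        return None
    return SLUG_ALIASES.get(location_slug)


def tag_agrees(prompt: str, location_slug: Optional[str]) -> bool:
    """True when the stored tag and the prompt text point at the same city (or both at none)."""
    return city_from_prompt(prompt) == city_from_slug(location_slug)


print(city_from_prompt("Best Couriers in Cardiff"))
print(tag_agrees("Top insurance for designers in London and why", "geoname-2643743"))
```

A check like `tag_agrees` is the kind of comparison that would surface the agreement rates quoted in the note; a production version would need the full slug vocabulary rather than the two aliases shown.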

[source: rankings.json — de-branded, test/probe prompts removed — × query-text city × top-15 multi-city incumbent set. Insurance + accounting sites only (5,446 localized + 1,699 national top-3 slots); the PCB/electronics dataset has no geographic prompts.]

Who wins locations, per engine

National-incumbent share of the top-3, by engine × city (cities ordered most→least local-friendly). Green = local players own it; red = the established national brands hold on.

engine	Glasgow	Birmingham	Manchester	Cardiff	London	Leeds	all localized
perplexity	0%	6%	10%	12%	19%	18%	13%
openai	0%	13%	11%	18%	18%	32%	17%
gemini	0%	10%	14%	7%	25%	27%	18%
anthropic	0%	11%	18%	15%	26%	33%	20%

perplexity is the most open to local/niche players (incumbents only 13% of localized top-3); anthropic leans hardest on the established national brands (20%). But the gradient itself has the same shape on all four engines: Glasgow 0% everywhere, climbing to ~18–33% in London & Leeds.

Your thesis: is niche / regional easier to rank than global?

Largely yes — on the parts a single snapshot can test, with one honest caveat.

Smaller regional markets are wide open. In secondary cities the established national insurers nearly disappear from the top-3 — 0% Glasgow, 10% Birmingham, 13% Manchester, 13% Cardiff — vs ~22–27% in the biggest, most-contested markets (London, Leeds). The smaller and more specific the market, the lower the incumbent wall — and it holds on all four engines.
Traditional SEO authority does NOT gate LLM citation. A domain's backlink authority predicts top-3 at a coin flip (AUC open 0.51 · anth 0.51 · gemi 0.49 · perp 0.48 — all ≈0.50, §2). You don't have to out-rank decades-old brands in classic SEO to get cited; the "years of link-building" moat simply doesn't transfer to LLM answers.
The global "head" is where the old brands live. The only place the engines agree (the cross-engine consensus core, §3) is a thin head of household names — Hiscox, The Hartford, Simply Business. Broad/global prompts surface that head; niche + regional prompts surface a long, fragmented tail (2,262+ distinct local players across the six cities) that a new entrant can break into.

Honest caveat: this is one point in time, so it shows the niche/regional opportunity (a low incumbent wall, no SEO-authority gate) but cannot measure ranking speed. "We rank in days; SEO takes months" is fully consistent with this data but isn't proven by it; that needs a time series. And the raw localized-vs-national averages are confounded by vertical (insurance vs accounting), which is why the headline is the per-city gradient, which is not confounded in that way.

8 · The eligibility floor — cited vs wild

Takeaway: To be citable at all, be answer-shaped (definitions, examples — 74% of cited pages vs ~18% of the web) and carry real content schema (FAQ/Article/LocalBusiness). Boilerplate schema is everywhere and doesn't distinguish you. This earns eligibility, not rank.

% with each signal across four cohorts: cited pages · uncited siblings (same domains) · Majestic top-1M draw · a uniform Common Crawl web sample (n=106, the true long tail Majestic excludes). Key read: the two random baselines agree, and schema is widespread (~half the live web has it, because CMSs auto-emit it), so schema presence is a weak discriminator. The real ~3× gap is answerability/depth (answer-shaped content, definitions, examples, length): cited 74% answerable vs CC-web 18%. So eligibility = substantive answer-shaped content + real content schema, not boilerplate markup. Note the schema decomposition below: boilerplate (WebSite/Org/Breadcrumb) is everywhere (~40–60% across all cohorts, weak signal); content schema (LocalBusiness/FAQ/Article/Review) is the ~3× differentiator (cited 46% vs web 18–19%). Neither moves rank once you're cited.

% present	cited	sibling	Majestic 1M	CC web
has_any_schema	68%	64%	43%	52%
boilerplate_schema	57%	61%	38%	41%
content_schema	46%	39%	18%	19%
has_definitions	71%	64%	27%	18%
has_examples	57%	47%	17%	11%
has_conclusion	16%	12%	3%	4%
has_topic_sentences	72%	68%	60%	59%
has_faq_markup	20%	15%	6%	0%
answerability	74%	65%	31%	18%
has_aggregate_rating	7%	8%	2%	1%
has_author_schema	13%	23%	2%	1%
has_quotation	24%	34%	20%	18%

Medians

median	cited	sibling	Majestic 1M	CC web
word_count	1266.00	1437.00	726.50	731.50
schema_count	7.00	5.00	0.00	1.00
schema_types_n	2.00	2.00	0.00	0.00
stat_density	25.46	31.24	46.38	66.90
presence_score	25.00	25.00	0.00	25.00
completeness_score	21.00	18.00	0.00	0.00
psi_score	61.00	—	—	—

CC web n=106 (uniform Common Crawl draw, the genuine long tail) — rates ±~10%. It tracks Majestic on schema (≈ half the web) but confirms the big answerability/depth gap, validating the eligibility-floor finding against a true-web baseline.
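
The "±~10%" caveat on the Common Crawl sample follows from its size. A minimal sketch, assuming a simple normal-approximation 95% interval for a proportion (the report does not say how the figure was derived; the function name is hypothetical):

```python
import math


def margin_of_error(proportion: float, sample_size: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation confidence interval for a proportion."""
    return z * math.sqrt(proportion * (1.0 - proportion) / sample_size)


CC_WEB_N = 106  # sample size stated above

# Worst case (p = 0.5) and the observed CC-web answerability rate (18%).
for rate in (0.50, 0.18):
    print(f"rate {rate:.0%}: +/- {margin_of_error(rate, CC_WEB_N):.1%}")
```

Even at the wide end of that interval, the CC-web answerability rate stays far below the 74% seen on cited pages, so the gap is not a small-sample artifact.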

9 · Recency

content age	top-3 support rate	n
<1mo	45%	34125
1-3mo	46%	11394
3-6mo	46%	10322
6-12mo	46%	8216
>12mo	50%	14131

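Flat: top-3 support is 45–46% for every bucket under 12 months and 50% for content older than 12 months. There is no "3-month freshness cliff" in our data; if anything, the oldest content does slightly better.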
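
A quick check of the recency table: the sketch below pools the five buckets into one n-weighted rate and measures how far apart the buckets are. The rates are the rounded percentages from the table, so the results are approximate; the variable names are illustrative.

```python
# (content age, top-3 support rate, n) copied from the recency table.
ROWS = [
    ("<1mo",   0.45, 34125),
    ("1-3mo",  0.46, 11394),
    ("3-6mo",  0.46, 10322),
    ("6-12mo", 0.46, 8216),
    (">12mo",  0.50, 14131),
]

total_n = sum(n for _, _, n in ROWS)
# Pooled rate: each bucket weighted by its number of citations.
pooled_rate = sum(rate * n for _, rate, n in ROWS) / total_n
# Spread between the best and worst bucket, in rate units.
spread = max(rate for _, rate, _ in ROWS) - min(rate for _, rate, _ in ROWS)

if __name__ == "__main__":
    print(f"n = {total_n}, pooled top-3 support = {pooled_rate:.3f}")
    print(f"spread between buckets = {spread * 100:.0f} points")
```

A freshness cliff would show up as a sharp drop after the 1-3mo bucket; instead the buckets sit within about five points of each other, with the oldest one highest.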

10 · This study vs prior work

finding	prior work	our result	verdict
Backlinks / domain authority	Semrush: "weak". Classic SEO: strong.	AUC ≈ 0.50 (no effect)	agree w/ Semrush, refute classic SEO
Cross-engine overlap	~11% (ChatGPT∩Perplexity)	~9% mean pairwise	agree
Q&A / FAQ formatting	Semrush +25% / Citation-Absorption −5.7%	+1 to +4% (all engines)	splits the difference; small
Statistics in content	Princeton +30–40% (causal)	no association (cited < random)	diverge (theirs causal, ours observational)
Semantic fit > length	Citation-Absorption: yes	both weak; relevance AUC ≈ 0	partial — both small here
Freshness / recency cliff	~3-month cliff	flat across age	refute
Keyword stuffing	Princeton: negative	negative on Gemini (0.73×)	agree

Princeton GEO is causal (edit one page, re-query); ours is observational across a brand-dominated competitive field — both can hold. The headline divergence: authority/backlinks and stats are levers in the SEO/GEO literature but non-levers for LLM citation here.

11 · Statistical robustness — Bayesian vs frequentist cluster-robust + MCMC

Takeaway: We re-checked the findings three ways (standard, cluster-robust, and Bayesian) — they all agree. The conclusions don't depend on the statistical method; the company effect dwarfs every page feature in all of them.

Why this section exists, in plain English. A skeptic could ask: "are these results just an artifact of how you ran the maths?" So we estimated the same effects three ways and put the answers side by side. freq OR = the standard (frequentist) odds-ratio estimate. cluster-robust CI = the same estimate with error bars that account for one company appearing in many rows (the strictest fix for non-independence). Bayes OR / HDI = a separate re-estimate using Bayesian MCMC, reported as a posterior odds ratio with a 94% highest-density interval. P(OR>1) = the posterior probability that the effect is positive. If all three sets of columns line up, the finding does not depend on the estimation method, and here they do. The closing "variance partition" asks: of everything that decides rank, how much is "which company you are" vs "what's on the page"? It finds the company part is more than 10× larger.

schema presence

engine	freq OR	naive 95% CI	cluster-robust CI	Bayes OR	94% HDI	P(OR>1)
openai	0.94	[0.897, 0.985]	[0.882, 1.002]	0.94	[0.899, 0.983]	0.009
anthropic	1.021	[0.987, 1.057]	[0.973, 1.072]	1.021	[0.988, 1.054]	0.881
gemini	1.015	[0.967, 1.066]	[0.956, 1.078]	1.016	[0.972, 1.062]	0.736
perplexity	0.992	[0.955, 1.031]	[0.944, 1.042]	0.993	[0.958, 1.027]	0.349
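
To read the schema-presence table mechanically, the sketch below checks which intervals exclude an odds ratio of 1 and how much wider the cluster-robust interval is than the naive one on the log-odds scale. The numbers are copied from the table; `excludes_one` and `log_width` are illustrative helpers, not the study's code.

```python
import math

# engine: (naive 95% CI, cluster-robust CI, 94% HDI), copied from the table.
SCHEMA_PRESENCE = {
    "openai":     ((0.897, 0.985), (0.882, 1.002), (0.899, 0.983)),
    "anthropic":  ((0.987, 1.057), (0.973, 1.072), (0.988, 1.054)),
    "gemini":     ((0.967, 1.066), (0.956, 1.078), (0.972, 1.062)),
    "perplexity": ((0.955, 1.031), (0.944, 1.042), (0.958, 1.027)),
}


def excludes_one(interval):
    """True if the interval lies entirely above or below OR = 1 (no effect)."""
    lo, hi = interval
    return not (lo <= 1.0 <= hi)


def log_width(interval):
    """Interval width on the log-odds scale, where OR intervals are symmetric."""
    lo, hi = interval
    return math.log(hi) - math.log(lo)


if __name__ == "__main__":
    for engine, (naive, cluster, hdi) in SCHEMA_PRESENCE.items():
        widening = log_width(cluster) / log_width(naive)
        print(
            f"{engine:10s} naive excludes 1: {excludes_one(naive)!s:5s} "
            f"cluster excludes 1: {excludes_one(cluster)!s:5s} "
            f"HDI excludes 1: {excludes_one(hdi)!s:5s} "
            f"cluster/naive width: {widening:.2f}"
        )
```

The only cell where the method changes the reading is OpenAI: the naive CI and the HDI exclude 1, while the cluster-robust CI just reaches it. For every engine the cluster-robust interval is wider than the naive one, which is what clustering by company is expected to do.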

readability

engine	freq OR	naive 95% CI	cluster-robust CI	Bayes OR	94% HDI	P(OR>1)
openai	1.051	[1.022, 1.082]	[1.01, 1.094]	1.051	[1.022, 1.079]	1
anthropic	0.951	[0.93, 0.971]	[0.924, 0.978]	0.951	[0.931, 0.97]	0
gemini	0.957	[0.93, 0.986]	[0.919, 0.997]	0.957	[0.931, 0.983]	0.001
perplexity	1.019	[0.994, 1.044]	[0.983, 1.057]	1.019	[0.996, 1.043]	0.93

Variance partition (hierarchical Bayesian, company random intercept). Company-level SD = 3.137 on the logit scale; every standardized page-feature coefficient is <0.1 (largest 0.07). The company effect is more than an order of magnitude larger than any page feature: a one-SD shift in "which company you are" is worth at least ~45× (3.137 / 0.07) a one-SD change in any on-page signal. But (per §2) that effect is not external authority; it is most consistent with brand salience, which we did not measure.
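
The two figures in the variance partition can be turned into a ratio and a variance share with a few lines of arithmetic. The share below uses the latent-variable convention for a logistic random-intercept model, where the residual variance is fixed at π²/3; that convention is an assumption here, since the report does not say how its ~75% figure was computed.

```python
import math

COMPANY_SD = 3.137           # company random-intercept SD, logit scale (from the report)
LARGEST_FEATURE_COEF = 0.07  # largest standardized page-feature coefficient (from the report)

# How many times larger a one-SD company shift is than a one-SD shift
# in the strongest page feature, on the logit scale.
ratio = COMPANY_SD / LARGEST_FEATURE_COEF

# The same two shifts expressed as odds ratios.
company_odds_ratio = math.exp(COMPANY_SD)
feature_odds_ratio = math.exp(LARGEST_FEATURE_COEF)

# Latent-variable intraclass correlation: share of variance between companies.
# Assumes the standard logistic residual variance of pi^2 / 3.
company_var = COMPANY_SD ** 2
residual_var = math.pi ** 2 / 3
icc = company_var / (company_var + residual_var)

if __name__ == "__main__":
    print(f"company vs strongest feature: {ratio:.1f}x on the logit scale")
    print(f"odds ratios: company {company_odds_ratio:.1f}, feature {feature_odds_ratio:.3f}")
    print(f"between-company variance share: {icc:.3f}")
```

Under that convention the company share comes out close to 0.75, in line with the report's headline figure, and the odds-ratio view shows the same asymmetry: about 23× for a one-SD company shift against about 1.07× for the strongest page feature.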

Point estimates agree (Bayes ≈ frequentist at this n); cluster-robust CIs (clustered by company) are wider, which is the honest fix for non-independence. The Bayesian fixed-effects models ran on the full per-engine data; the hierarchical model ran on a 12,000-row subsample and did not fully converge (R-hat > 1.01), so its magnitudes are approximate, though the conclusion is robust.

12 · Method & limits

Models. Per-LLM logistic (top-3 ~ standardized features + company effect + site/niche FE), self-citations excluded; FDR across the feature×engine grid; bootstrap lift CIs; 5-fold CV AUC; external authority via Majestic Million join; Bayesian + cluster-robust robustness (§11).
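
The FDR step matters because the feature×engine grid tests many coefficients at once. The report does not name the procedure it used; the sketch below assumes Benjamini-Hochberg, the most common choice, and the p-values in the example are hypothetical, not taken from the study.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True if rejected at false discovery rate alpha.

    Benjamini-Hochberg step-up: sort the p-values, find the largest rank k
    with p_(k) <= (k / m) * alpha, and reject the k smallest.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])

    # Largest 1-based rank whose p-value is under its step-up threshold.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank

    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical p-values for eight feature-by-engine cells.
    p_vals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
    naive_hits = sum(p < 0.05 for p in p_vals)
    fdr_hits = sum(benjamini_hochberg(p_vals))
    print(f"significant without correction: {naive_hits}")
    print(f"significant after FDR control:  {fdr_hits}")
```

In this made-up grid, five cells pass an uncorrected 0.05 threshold but only two survive the correction, which is the kind of filtering the FDR step is there to do.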

Limits. Selection / survivorship: the cohort is companies LLMs already cite. We never observe niche players that were never mentioned, so the study explains rank among the considered set, not how to enter it from zero. "Eligibility" (cited-vs-wild) is therefore a crude proxy: random web pages aren't "uncited competitors". The cleanest within-niche control is the sibling arm (same domains, uncited), where gaps are small. Other limits: the study is observational (associational, not causal, unlike Princeton's intervention); the dominant per-company effect is unexplained by measured signals (likely brand salience, which we did not measure); the endogenous "popularity" proxy is partly circular (hence the external Majestic check); the data is a single snapshot; the wild-random sample is skewed to popular sites (which makes the cited-vs-web gaps conservative); the speed signal is the PSI lab score only; source roles were classified by Haiku, with gaps for Gemini; and the hierarchical MCMC ran on a subsample with imperfect convergence.