Topical Authority Clusters for AI Search: How-To Guide

TL;DR

Citation share in AI search is heavily concentrated: the top 10 domains in a topic capture roughly 46% of ChatGPT citations, and the top 30 capture 67%. Depth wins.
LLMs infer authority from entity coverage — how completely you map a subject — not just from a single well-written page.
Build hubs (pillar pages) that define the topic and spokes (supporting pages) that resolve every adjacent question, entity, and comparison.
Internal links from spokes to the hub tell both Google and AI engines which page is canonical. The siteFocus and siteRadius attributes exposed in Google's leaked documentation point toward this same structure, though the leak did not reveal how they are weighted.
Ship the cluster in one sprint instead of drip-feeding it. Partial clusters read as shallow to LLMs and rarely get cited.

A topical authority cluster is a coordinated set of pages that covers one subject with enough breadth and depth that a language model has no reason to cite anyone else. It is the hub-and-spoke content model rebuilt for generative retrieval: one pillar page defines the topic, ten to thirty supporting pages resolve every sub-question, and internal links wire them into a graph the model can read as a single authoritative source.

Why LLMs reward depth, not just quality

Traditional SEO could reward a single strong page. Generative engines rarely do. When ChatGPT or Perplexity assembles an answer, it retrieves multiple passages and scores them against the query. A domain that ranks for the head term but has nothing on the long tail looks thin. A domain that owns the head term and the sub-topics, entities, comparisons, and edge cases looks like the reference.

The concentration effect is measurable. Recent citation-share analysis suggests the top 10 domains in any given topic pull nearly half of ChatGPT citations, and the top 30 pull two-thirds (Graphite). The pattern this points to: once a domain covers a topic deeply enough, it gets cited across dozens of adjacent queries. Below that threshold, citations trickle.

Google's 2024 Content Warehouse API leak exposed two attributes that describe this shape: siteFocusScore (siteFocus for short) denotes how concentrated a site is on a single topic, and siteRadius measures how far individual page embeddings deviate from the site's overall embedding. High focus, low radius is the target state, and topical clusters are designed to produce exactly that shape.
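Google has not published how these attributes are computed. As a rough intuition only, the sketch below treats each page as an embedding vector, takes the site centroid as the mean of those vectors, and derives two toy proxies: radius as the mean cosine distance from the centroid, and focus as the share of pages within a chosen distance threshold. The function names, the 0.2 threshold, and the three-dimensional vectors are hypothetical; real page vectors would come from whatever embedding model you use.

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 minus the cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms

def site_profile(pages: dict[str, list[float]], threshold: float = 0.2):
    """Toy focus/radius proxies. Hypothetical, NOT Google's siteFocusScore or siteRadius."""
    n = len(pages)
    dims = len(next(iter(pages.values())))
    # Site "embedding": the element-wise mean of all page vectors.
    centroid = [sum(vec[i] for vec in pages.values()) / n for i in range(dims)]
    distances = {url: cosine_distance(vec, centroid) for url, vec in pages.items()}
    radius = sum(distances.values()) / n  # average drift from the centroid
    focus = sum(d <= threshold for d in distances.values()) / n  # share of on-topic pages
    return focus, radius, distances

# Hypothetical 3-dimensional page vectors for an llms.txt site with one off-topic post.
pages = {
    "/llms-txt-guide": [0.9, 0.1, 0.0],
    "/llms-txt-vs-robots-txt": [0.8, 0.2, 0.0],
    "/llms-txt-for-saas": [0.85, 0.1, 0.05],
    "/company-picnic-recap": [0.0, 0.1, 0.9],
}
focus, radius, distances = site_profile(pages)
print(f"focus={focus:.2f} radius={radius:.2f}")
print("furthest page:", max(distances, key=distances.get))
```

Dropping the off-topic page from `pages` lifts the toy focus to 1.0 and shrinks the radius, which is the high-focus, low-radius shape the leak's descriptions suggest.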

LLMs appear to assess depth through patterns such as topic completeness, internal consistency, and contextual clarity. A page that answers one question but ignores the obvious follow-ups reads as shallow. A cluster that layers definition, mechanism, comparison, edge cases, and worked examples reads as trustworthy, and trustworthy is what gets cited.

The hub-and-spoke structure that actually works

A cluster has three layers.

The hub (pillar page). One canonical URL that defines the topic, covers every major sub-concept at summary depth, and links out to spokes for detail. Target the head term. Keep it long enough to establish coverage (typically 2,500–4,000 words) but resist the temptation to answer everything in place — you want spokes to earn independent citations.

The spokes (supporting pages). Ten to thirty pages, each owning one sub-question, entity, comparison, or use case. Each spoke targets a specific long-tail query, links back to the hub with descriptive anchor text, and cross-links to two or three sibling spokes where the topic naturally overlaps.

The entity layer. Definitions, glossary entries, and "what is X" pages for every named entity in your topic. This is the layer most teams skip and the layer LLMs use to disambiguate. If your topic mentions ten proper nouns — tools, standards, people, methodologies — you need a page for each.

Internal linking is the connective tissue. A pillar page that receives strong internal links from related supporting content reads as the canonical answer for its topic on that domain. When an engine weighs which page to cite, internal authority likely sits alongside external authority in the scoring.
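One crude way to see which page your own link graph treats as canonical is to count internal inbound links per URL. The sketch below does that for a hypothetical four-page cluster; it is a plain in-degree count, not a model of how any search or AI engine actually weighs links.

```python
from collections import Counter

def inbound_counts(links: dict[str, list[str]]) -> Counter:
    """Count internal inbound links per URL, ignoring self-links and repeated links."""
    counts: Counter = Counter()
    for source, targets in links.items():
        for target in set(targets):  # one vote per source->target pair
            if target != source:
                counts[target] += 1
    return counts

# Hypothetical cluster: the hub links out to every spoke; each spoke links back
# to the hub and across to one sibling.
links = {
    "/llms-txt-guide": ["/llms-txt-for-saas", "/llms-txt-vs-robots-txt", "/validate-llms-txt"],
    "/llms-txt-for-saas": ["/llms-txt-guide", "/validate-llms-txt"],
    "/llms-txt-vs-robots-txt": ["/llms-txt-guide", "/llms-txt-for-saas"],
    "/validate-llms-txt": ["/llms-txt-guide", "/llms-txt-vs-robots-txt"],
}
counts = inbound_counts(links)
print(counts.most_common())
```

If a spoke ends up with more inbound links than the hub, the graph is signalling the wrong page as canonical.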

Worked example: an "llms.txt" cluster

Say you want to own the llms.txt topic. A minimal cluster:

Hub: What is llms.txt? A complete guide — definition, history, spec, adoption, examples, FAQ.
Spokes (how-to): writing your first llms.txt, llms.txt for SaaS pricing, llms.txt for docs sites, llms.txt for e-commerce, validating your llms.txt.
Spokes (comparisons): llms.txt vs robots.txt, llms.txt vs sitemap.xml, llms.txt vs schema markup.
Spokes (edge cases): what to do if crawlers ignore llms.txt, versioning your llms.txt, llms.txt and paywalled content.
Entity pages: Anthropic, OpenAI GPTBot, PerplexityBot, ClaudeBot, Common Crawl.
Evidence pages: adoption survey, case study, before/after citation data.

Every spoke links to the hub with the exact anchor "llms.txt guide". The hub links to each spoke from a table of contents. Entity pages link laterally when mentioned. Total: roughly 20 URLs, publishable in a two-to-three-week sprint.
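Writing the plan down as data makes the URL count and the hub's table of contents easy to check before the sprint starts. The layer names and the `hub_toc` helper below are hypothetical; the titles come from the list above, and the sketch assumes the table of contents covers the three spoke layers while entity and evidence pages are linked inline.

```python
CLUSTER = {
    "hub": ["What is llms.txt? A complete guide"],
    "how-to": [
        "Writing your first llms.txt",
        "llms.txt for SaaS pricing",
        "llms.txt for docs sites",
        "llms.txt for e-commerce",
        "Validating your llms.txt",
    ],
    "comparison": ["llms.txt vs robots.txt", "llms.txt vs sitemap.xml", "llms.txt vs schema markup"],
    "edge-case": [
        "What to do if crawlers ignore llms.txt",
        "Versioning your llms.txt",
        "llms.txt and paywalled content",
    ],
    "entity": ["Anthropic", "OpenAI GPTBot", "PerplexityBot", "ClaudeBot", "Common Crawl"],
    "evidence": ["Adoption survey", "Case study", "Before/after citation data"],
}
SPOKE_LAYERS = ("how-to", "comparison", "edge-case")  # assumption: what the hub TOC lists

def url_count(cluster: dict[str, list[str]]) -> int:
    """Total pages to publish across every layer."""
    return sum(len(titles) for titles in cluster.values())

def hub_toc(cluster: dict[str, list[str]]) -> list[str]:
    """Spoke titles the hub's table of contents should link to, in layer order."""
    return [title for layer in SPOKE_LAYERS for title in cluster[layer]]

print(url_count(CLUSTER), "URLs;", len(hub_toc(CLUSTER)), "TOC entries")
```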

The reason this works: when Perplexity is asked "how do I write llms.txt for a SaaS company," it can retrieve the SaaS spoke. When ChatGPT is asked "what is llms.txt," it can retrieve the hub. When Gemini is asked "llms.txt vs robots.txt," it can retrieve the comparison spoke. One brand, three potential citations, one topic.

Executing the sprint: what to prioritize

Map before you write. Pull the top 30 questions from AlsoAsked, Perplexity's "related" panel, and Reddit threads in your niche. Cluster them. Every cluster becomes a spoke.
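A production pipeline would more likely cluster questions with embeddings, but a lexical stand-in shows the mechanics. The sketch below is a greedy single pass: each question joins the first cluster whose seed question shares enough content words (Jaccard overlap at or above a threshold), otherwise it starts a new cluster. The stopword list, the 0.5 threshold, and the sample questions are assumptions for illustration.

```python
import re

STOPWORDS = {"a", "an", "and", "do", "does", "for", "how", "i", "is", "of", "the", "to", "vs", "what"}

def tokens(question: str) -> set[str]:
    """Lowercased content words; keeps dotted names like 'llms.txt' intact."""
    words = re.findall(r"[a-z0-9]+(?:\.[a-z0-9]+)*", question.lower())
    return {w for w in words if w not in STOPWORDS}

def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_questions(questions: list[str], threshold: float = 0.5) -> list[list[str]]:
    """Greedy single-pass clustering against each cluster's first (seed) question."""
    clusters: list[tuple[set[str], list[str]]] = []
    for q in questions:
        t = tokens(q)
        for seed, members in clusters:
            if jaccard(t, seed) >= threshold:
                members.append(q)
                break
        else:  # no cluster matched, so this question seeds a new spoke
            clusters.append((t, [q]))
    return [members for _, members in clusters]

questions = [
    "How do I write llms.txt for SaaS?",
    "Write llms.txt for a SaaS company?",
    "llms.txt vs robots.txt?",
    "What is the difference between llms.txt and robots.txt?",
    "Do crawlers ignore llms.txt?",
]
for spoke in cluster_questions(questions):
    print(spoke)
```

Each printed group is a candidate spoke; an editor still decides whether borderline groups merge or split.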

Front-load evidence. Adding statistics to content correlates with a 22% lift in AI visibility, and quotations correlate with a 37% lift (Backlinko). Each spoke should carry at least one cited stat and one attributed quote.
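A pre-publish check can flag spokes that lack either element. The sketch below is a heuristic, assuming a stat looks like a percentage and a quote is at least 15 characters inside straight or curly double quotes; it does not verify that the stat is cited or the quote attributed, which still needs an editor.

```python
import re

STAT = re.compile(r"\d+(?:\.\d+)?%")  # assumption: stats are written as percentages
QUOTE = re.compile('["\u201c][^"\u201c\u201d]{15,}["\u201d]')  # straight or curly quotes

def missing_evidence(text: str) -> list[str]:
    """Return which evidence types ('stat', 'quote') a spoke's text appears to lack."""
    gaps = []
    if not STAT.search(text):
        gaps.append("stat")
    if not QUOTE.search(text):
        gaps.append("quote")
    return gaps

draft = "Adding statistics correlated with a 22% lift in one analysis."
print(missing_evidence(draft))
```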

Link with intent. Descriptive anchors, not "click here." Every spoke → hub. Two or three sibling links per spoke. Entity pages linked inline the first time a term appears on any page.
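These rules are mechanical enough to audit. The sketch below assumes you have already extracted each spoke's outbound links as (target URL, anchor text) pairs, for example from a site crawl; the URLs and the list of generic anchors are hypothetical.

```python
GENERIC_ANCHORS = {"click here", "here", "read more", "learn more", "this page"}

def audit_spoke(spoke: str, links: list[tuple[str, str]], hub: str, spokes: set[str]) -> list[str]:
    """Return rule violations for one spoke's outbound links."""
    problems = []
    targets = {target for target, _ in links}
    if hub not in targets:
        problems.append("no link to hub")
    siblings = (targets & spokes) - {spoke}
    if not 2 <= len(siblings) <= 3:
        problems.append(f"{len(siblings)} sibling links (want 2-3)")
    for target, anchor in links:
        if anchor.strip().lower() in GENERIC_ANCHORS:
            problems.append(f"generic anchor {anchor!r} -> {target}")
    return problems

HUB = "/llms-txt-guide"
SPOKES = {"/llms-txt-for-saas", "/llms-txt-vs-robots-txt", "/validate-llms-txt", "/versioning-llms-txt"}
links = [
    ("/llms-txt-guide", "llms.txt guide"),
    ("/validate-llms-txt", "click here"),
]
print(audit_spoke("/llms-txt-for-saas", links, HUB, SPOKES))
```

Running it across every spoke before launch turns "link with intent" into a checklist rather than a judgment call.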

Ship the whole cluster. Partial clusters tend to underperform because LLMs appear to sample coverage across a domain, not individual pages. A hub with three spokes reads as thin. A hub with fifteen reads as a reference.

Then work brand. Brand search volume is the strongest single predictor of LLM citations, with a reported correlation around 0.334 — higher than backlinks (Ahrefs). Once the cluster is live, drive branded queries to it through newsletter, podcast mentions, and community presence.

FAQ

How many spokes does a cluster need?

Ten is a functional floor for a mid-sized topic; twenty to thirty is where most competitive clusters land. Don't pad. Every spoke should map to a real query with search or prompt volume, because spokes with no demand behind them dilute siteFocus.

Can I retrofit an existing site into clusters?

Yes. Audit existing content, identify natural hubs, consolidate duplicative pages via 301s, and add missing spokes. Rewrite internal links so every page in the cluster points to the hub with consistent anchor text. Expect three to six months for AI citation lift.
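The consolidation step reduces to a mapping from each duplicate URL to the page that absorbs it, plus a pass that points internal links straight at the surviving page so they don't hop through a redirect. The sketch below only computes those mappings; the 301s themselves still have to be configured in your server or CMS, and the URLs are hypothetical.

```python
def redirect_map(groups: dict[str, list[str]]) -> dict[str, str]:
    """Map every duplicate URL to the canonical URL it should 301 to."""
    return {dup: canonical for canonical, dups in groups.items() for dup in dups if dup != canonical}

def rewrite_links(links: dict[str, list[str]], redirects: dict[str, str]) -> dict[str, list[str]]:
    """Drop retired pages and repoint surviving pages' links at final destinations."""
    return {
        source: [redirects.get(target, target) for target in targets]
        for source, targets in links.items()
        if source not in redirects
    }

groups = {"/llms-txt-guide": ["/what-is-llms-txt", "/llms-txt-explained"]}
links = {
    "/llms-txt-for-saas": ["/what-is-llms-txt", "/validate-llms-txt"],
    "/what-is-llms-txt": ["/llms-txt-for-saas"],
}
redirects = redirect_map(groups)
print(redirects)
print(rewrite_links(links, redirects))
```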

Do clusters help with Google AI Overviews too?

Yes, though the link is inferred rather than confirmed. AI Overviews draws on Google's core ranking systems, and the leak suggests those systems track attributes like siteFocus and siteRadius. A well-built cluster tends to lift both traditional rankings and AI Overview inclusion in parallel.
