Geodocs.dev

llms.txt Advanced Section Patterns & Hierarchical Structure

Advanced llms.txt files build on the base spec from llmstxt.org by grouping links into H2 topic sections (optionally with H3 sub-groups), reserving the spec's Optional section for secondary resources, attaching a short summary to every link, and pairing the index with an llms-full.txt that contains the complete documentation body for one-pass ingestion. Published implementations from Anthropic, Cloudflare, Stripe, and Vercel show how teams partition large knowledge bases without exceeding LLM context windows.

TL;DR

A basic llms.txt is one H1, one blockquote summary, and a flat link list. An advanced llms.txt adds H2 section groupings (by topic, audience, or content type), per-link summaries, an ## Optional section for low-priority resources, and a paired /llms-full.txt that inlines the full Markdown body. Use the base llms.txt as a navigation index (under 10 KB) and llms-full.txt as the bulk content surface, which Mintlify's traffic data suggests AI crawlers fetch more often than the index itself.

Overview

The llms.txt standard, proposed by Jeremy Howard at Answer.AI in September 2024, defines a Markdown-formatted file at the site root that gives language models a curated entry point into a knowledge base. The base spec is intentionally minimal: an H1 project name, a blockquote summary, and link sections grouped by H2. For knowledge bases beyond ~50 pages, that base structure starts to fail in three ways:

  • With only a handful of broad H2 sections, links collapse into what is effectively a flat list, losing topic hierarchy.
  • The 10 KB practical ceiling becomes hard to enforce when every page must be linked.
  • LLMs cannot distinguish primary documentation from edge-case references.

Advanced patterns address these limitations by introducing hierarchical sections, an Optional block, paired llms-full.txt files, and sitemap integration. This guide documents the patterns used by Anthropic, Cloudflare, Stripe, Vercel, Mintlify, and other large knowledge bases as observed in their published llms.txt files.

The base spec, briefly

For reference, here is the minimum-conforming llms.txt:

    # Project Name

    > One-line summary of the project, used by LLMs to introduce it.

    Optional paragraph with context, scope, and important caveats.

    ## Core documentation

    ## Optional

Advanced patterns retain this structure and extend it.
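
To make the base structure concrete, here is a minimal Python sketch that splits an llms.txt body into its title, summary, and H2 sections. The function name `parse_llms_txt`, the sample text, and the example.com URLs are illustrative assumptions, not part of the spec or of any official parser.

```python
def parse_llms_txt(text):
    """Split an llms.txt body into title, summary, and H2 sections."""
    title, summary, current = None, None, None
    sections = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and title is None:
            title = line[2:].strip()              # first H1 is the project name
        elif line.startswith("> ") and summary is None and current is None:
            summary = line[2:].strip()            # blockquote before any H2
        elif line.startswith("## "):
            current = line[3:].strip()            # open a new H2 section
            sections[current] = []
        elif line.startswith("- ") and current is not None:
            sections[current].append(line[2:].strip())
    return {"title": title, "summary": summary, "sections": sections}


# Hypothetical sample file; the URLs are placeholders.
SAMPLE = """# Project Name

> One-line summary of the project.

Optional paragraph with context.

## Core documentation

- [Quickstart](https://example.com/quickstart.md): Install and first request.

## Optional

- [Changelog](https://example.com/changelog.md): Release history.
"""

parsed = parse_llms_txt(SAMPLE)
print(parsed["title"], list(parsed["sections"]))
```

The sketch deliberately ignores free-form paragraphs and anything below H2, so it only covers the minimum-conforming shape described above.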

Pattern 1: Hierarchical H2/H3 sections

The base spec uses H2 for sections. Large knowledge bases use H2 for top-level topics and H3 for sub-areas, mirroring the documentation IA. The rationale is that models trained on large amounts of Markdown appear to treat heading depth as a salience signal.

    ## Authentication

    ### OAuth 2.0

    ### API keys

Keep sub-grouping to a single H3 level and do not nest H4 or deeper. The llmstxt.org spec does not formally define H3 grouping under H2, but practitioner usage shows that it parses correctly across the Mintlify, GitBook, and Fern auto-generators, and Anthropic's published llms.txt uses H3 grouping internally.
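
The depth rule is easy to check mechanically. The following sketch lists every heading with its level and reports anything deeper than H3. The helper names and the sample outline are assumptions made for illustration.

```python
import re

# ATX heading: one to six '#' characters, whitespace, then the title.
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def heading_outline(text):
    """Return (level, title) pairs for every ATX heading in the file."""
    outline = []
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            outline.append((len(match.group(1)), match.group(2).strip()))
    return outline


def too_deep(text, max_level=3):
    """Titles of headings nested deeper than max_level."""
    return [title for level, title in heading_outline(text) if level > max_level]


# Hypothetical outline with one H4 that breaks the rule.
OUTLINE = """# Acme API
## Authentication
### OAuth 2.0
### API keys
#### Rotating keys
"""

print(too_deep(OUTLINE))
```

Running a check like this in CI would catch an H4 before it ships. Note that it does not skip fenced code blocks, which is acceptable for a file that should contain only headings, summaries, and links.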

Pattern 2: Optional section for secondary resources

The most-cited advanced pattern is the ## Optional section. The llmstxt.org spec gives this heading a special meaning: URLs listed under it are secondary information that can be skipped when a shorter context is needed. Use it for:

  • Legacy documentation that remains accessible but is not the recommended path
  • Marketing pages that an LLM might encounter via sitemap but should not feature in answers
  • Deep-reference material useful only when the primary docs do not resolve the query

    ## Optional

Consumers that honor the spec can skip Optional links when context is tight and fall back to them only when the primary H2 sections do not contain the answer.
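
One way a consumer might honor the Optional section is sketched below: link lines under `## Optional` are dropped unless the caller asks for them. The function name, flag, and sample content are hypothetical, and real agents may apply different heuristics.

```python
def select_links(text, include_optional=False):
    """Collect link lines, skipping the '## Optional' section unless asked."""
    selected = []
    in_optional = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("## "):
            # Only an H2 switches sections; H3 sub-groups inherit the state.
            in_optional = stripped[3:].strip().lower() == "optional"
        elif stripped.startswith("- ") and (include_optional or not in_optional):
            selected.append(stripped[2:])
    return selected


# Hypothetical file with one primary and one secondary link.
DOC = """# Acme Docs

> Acme product documentation.

## API

- [Webhooks](https://example.com/webhooks.md): Event types and retries.

## Optional

- [v1 migration](https://example.com/v1.md): Legacy upgrade notes.
"""

primary = select_links(DOC)
everything = select_links(DOC, include_optional=True)
print(len(primary), len(everything))
```

A retrieval pipeline could call the function first without the flag and retry with `include_optional=True` only when the primary pages fail to answer the query.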

Pattern 3: Per-link summaries

Each link should carry a short summary after the colon. This is the highest-leverage edit for AI retrieval quality because the summary is what an LLM reads when deciding whether to follow the link.

Good:

  • Webhooks reference: Event types, signature verification (HMAC-SHA256), and retry behavior.

Weak:

The good summary lets an LLM answer "how do I verify webhook signatures?" without fetching the page; the weak summary forces a fetch.
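
A lint step can flag links that lack a useful summary. This sketch assumes the common `- [Title](url): summary` line shape. The three-word threshold, the function name, and the sample links are arbitrary choices for illustration.

```python
import re

# "- [Title](url)" with an optional ": summary" tail.
LINK = re.compile(
    r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<summary>.*))?$"
)


def missing_summaries(text, min_words=3):
    """Titles of links whose summary is absent or shorter than min_words."""
    weak = []
    for line in text.splitlines():
        match = LINK.match(line.strip())
        if not match:
            continue
        summary = (match.group("summary") or "").strip()
        if len(summary.split()) < min_words:
            weak.append(match.group("title"))
    return weak


# Hypothetical links: the first carries a summary, the second is title-only.
LINKS = """- [Webhooks reference](https://example.com/webhooks.md): Event types, signature verification (HMAC-SHA256), and retry behavior.
- [Errors](https://example.com/errors.md)
"""

print(missing_summaries(LINKS))
```

The word-count threshold is a blunt heuristic. It catches bare title-only links but cannot judge whether a longer summary is actually informative.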

Pattern 4: llms-full.txt pairing

The llms-full.txt companion file inlines the full Markdown body of every page listed in llms.txt. Rather than a navigation index, it is a single-file content dump optimized for one-pass ingestion. Mintlify's analytics show llms-full.txt receiving heavier traffic than llms.txt, plausibly because crawlers prefer to ingest the full content surface up front rather than chase links.

Decision rule:

  • Total Markdown corpus < 200 KB: Ship llms-full.txt as a single file.
  • 200 KB to 1 MB: Split llms-full.txt by H2 section into /llms-full-auth.txt, /llms-full-api.txt, etc., and reference each in llms.txt.
  • > 1 MB: Skip llms-full.txt entirely; rely on llms.txt + per-page Markdown endpoints (e.g., Accept: text/markdown content negotiation).

Mintlify, Fern, GitBook, and Redocly auto-generate both files when the toggle is enabled.
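
The decision rule above can be expressed as a small function. This is a sketch under two assumptions: sizes are measured as UTF-8 bytes, and 1 KB means 1024 bytes (the article does not say which convention it uses). The strategy labels are made up for the example.

```python
KB = 1024
MB = 1024 * KB


def corpus_size(pages):
    """Total UTF-8 size in bytes of a {path: markdown_body} mapping."""
    return sum(len(body.encode("utf-8")) for body in pages.values())


def full_txt_strategy(corpus_bytes):
    """Map corpus size to one of the three options in the decision rule."""
    if corpus_bytes < 200 * KB:
        return "single"   # ship one llms-full.txt
    if corpus_bytes <= 1 * MB:
        return "split"    # one llms-full-<section>.txt per H2 section
    return "skip"         # rely on llms.txt plus per-page Markdown endpoints


# Hypothetical two-page corpus.
pages = {"/quickstart.md": "# Quickstart\n" * 100, "/auth.md": "# Auth\n" * 50}
print(corpus_size(pages), full_txt_strategy(corpus_size(pages)))
```

Treating exactly 200 KB as "split" and exactly 1 MB as "split" is a boundary choice made here. The article's thresholds are approximate, so either side would be defensible.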

Pattern 5: Optional YAML metadata block

Some implementations prepend a YAML block before the H1 to declare authorship, license, and update cadence. This is not part of the llmstxt.org spec, but parsers tolerate it as long as the file is otherwise valid Markdown.


    ---
    generator: Mintlify v4.2
    updated: 2026-05-03
    license: CC-BY-4.0
    languages: [en, ja, zh]
    ---

    # Project Name

    > Summary.

Use this only when the metadata is meaningful for the consuming agent (e.g., license enforcement). Most LLMs will ignore the block.
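
A consumer that wants to tolerate such a block can peel it off before parsing the Markdown. The sketch below assumes the block is delimited by `---` lines, as in common front-matter conventions, and reads only flat `key: value` pairs. It is not a YAML parser, so a list value such as `[en, ja, zh]` stays a plain string.

```python
def split_front_matter(text):
    """Return (metadata, body) for a file that may start with a '---' block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text                       # no metadata block present
    meta = {}
    for index, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":             # closing delimiter found
            body = "\n".join(lines[index + 1:]).lstrip("\n")
            return meta, body
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}, text                           # unterminated block: keep as body


# Hypothetical file with a metadata block ahead of the H1.
WITH_META = """---
generator: Mintlify v4.2
updated: 2026-05-03
license: CC-BY-4.0
---

# Project Name

> Summary.
"""

meta, body = split_front_matter(WITH_META)
print(meta["license"], body.splitlines()[0])
```

Falling back to "no metadata" on an unterminated block keeps the parser from swallowing the whole file when a delimiter is missing.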

Pattern 6: Multilingual hints

For multilingual knowledge bases, publish locale-specific llms.txt files at language-prefixed paths and cross-link them in the root file:

    # Acme Docs

    > Acme product documentation, available in English, Japanese, and Chinese.

    ## Languages

    ## Core documentation

Do not duplicate translations inside a single llms.txt — it inflates token cost and confuses retrieval.
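
The cross-links in the root file can be generated from a locale list. In this sketch the `/<locale>/llms.txt` path scheme, the base URL, and the summary wording are all assumptions. Adjust them to whatever paths your site actually serves.

```python
def languages_section(base_url, locales):
    """Render a '## Languages' section linking to per-locale llms.txt files."""
    lines = ["## Languages", ""]
    for code, name in locales.items():
        lines.append(
            f"- [{name}]({base_url}/{code}/llms.txt): {name} documentation index."
        )
    return "\n".join(lines)


# Hypothetical host and locale set.
section = languages_section(
    "https://docs.example.com",
    {"en": "English", "ja": "Japanese", "zh": "Chinese"},
)
print(section)
```

Because each locale gets its own index, the root file grows by only one line per language instead of one line per translated page.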

Pattern 7: Sitemap.xml integration

llms.txt and sitemap.xml serve different purposes: sitemap.xml is exhaustive and built for search engine indexing, while llms.txt is curated for LLM retrieval. They should not duplicate each other. Best practice:

  • Reference sitemap.xml from llms.txt under an Optional or supplementary section so AI agents can fall back to a complete URL list when needed.
  • Do not list every sitemap URL in llms.txt; cap at the 50-80 pages that represent canonical, evergreen content.
  • Keep sitemap.xml's lastmod values accurate so AI crawlers can re-fetch only changed pages.

    ## Optional

  • Sitemap: Complete URL inventory for crawlers.
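
To illustrate how a crawler could use an accurate sitemap to re-fetch only changed pages, this sketch reads `<lastmod>` values with Python's standard `xml.etree.ElementTree` and returns the URLs modified after a cutoff date. The sample sitemap and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def changed_since(sitemap_xml, since):
    """URLs whose <lastmod> date is strictly later than `since`."""
    root = ET.fromstring(sitemap_xml)
    changed = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        lastmod = url.findtext("sm:lastmod", namespaces=NS)
        # Entries without <lastmod> are skipped; only the date part is compared.
        if loc and lastmod and date.fromisoformat(lastmod[:10]) > since:
            changed.append(loc.strip())
    return changed


# Hypothetical sitemap with one stale page, one fresh page, and one undated page.
SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/docs/quickstart</loc><lastmod>2026-01-15</lastmod></url>
  <url><loc>https://example.com/docs/webhooks</loc><lastmod>2026-04-20T09:30:00+00:00</lastmod></url>
  <url><loc>https://example.com/docs/legacy</loc></url>
</urlset>"""

print(changed_since(SITEMAP, date(2026, 4, 1)))
```

Skipping undated entries is a conservative choice for this sketch. A real crawler might instead treat a missing `<lastmod>` as "always re-fetch".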

Pattern 8: Repository structure for engineering teams

For teams maintaining llms.txt as code, use this repo layout:

    docs/
      llms.txt.template     # Hand-written H1, summary, section structure
      llms.config.yaml      # Section -> URL pattern mappings
    scripts/
      build-llms-txt.ts     # Generates llms.txt and llms-full.txt at build time
      validate-llms.ts      # Runs llmstxt-validator on output
    tests/
      llms-snapshot.txt     # Golden file for diff review in PRs

This pattern keeps the curated structure under version control while regenerating link lists from the docs source of truth on every deploy.
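
The core of such a build step is a renderer that turns the curated structure into the final file. The sketch below is a Python stand-in for what a script like build-llms-txt.ts might do. The data shape, function name, and URLs are assumptions, since the article does not show the script's contents.

```python
def render_llms_txt(title, summary, sections):
    """Render llms.txt from {heading: [(link_title, url, link_summary), ...]}."""
    out = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        out.append(f"## {heading}")
        out.append("")
        for link_title, url, link_summary in links:
            out.append(f"- [{link_title}]({url}): {link_summary}")
        out.append("")
    # Normalize to exactly one trailing newline so snapshot diffs stay stable.
    return "\n".join(out).rstrip() + "\n"


# Hypothetical curated structure, as it might be loaded from a config file.
rendered = render_llms_txt(
    "Acme Docs",
    "Acme product documentation.",
    {
        "Core documentation": [
            ("Quickstart", "https://docs.example.com/quickstart.md",
             "Install and first request."),
        ],
        "Optional": [
            ("Changelog", "https://docs.example.com/changelog.md",
             "Release history."),
        ],
    },
)
print(rendered)
```

Comparing `rendered` against the committed golden file in a test makes every change to the index show up as a reviewable diff in the pull request.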

Validation tooling

  • llmstxt-validator (community CLI): Checks for required H1, summary blockquote, valid Markdown, and link reachability.
  • Mintlify built-in validation: Surfaces malformed sections in the build dashboard.
  • Fern lint: Validates llms.txt as part of the docs build pipeline.
  • Manual smoke test: Fetch the file with curl -H 'Accept: text/markdown' and verify it loads as plain Markdown without HTML wrapping.

Validation is non-negotiable for any docs site over 50 pages. A broken llms.txt fails silently: LLMs will not surface a parse error; they will simply stop using the file.
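
When no dedicated tool is available, a few structural checks can run in CI. This sketch is not the llmstxt-validator CLI and does not reproduce its behavior. It is a hypothetical stand-in that checks for an H1, a blockquote summary, the 10 KB budget, and the ~80-link cap mentioned in this guide.

```python
def validate_llms_txt(text, max_bytes=10 * 1024, max_links=80):
    """Return a list of human-readable problems; an empty list means it passed."""
    problems = []
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith("# "):
        problems.append("first non-blank line is not an H1")
    if not any(line.startswith("> ") for line in lines):
        problems.append("no blockquote summary")
    size = len(text.encode("utf-8"))
    if size > max_bytes:
        problems.append(f"file is {size} bytes, over the {max_bytes}-byte budget")
    link_count = sum(1 for line in lines if line.startswith("- ["))
    if link_count > max_links:
        problems.append(f"{link_count} links, over the cap of {max_links}")
    return problems


# Hypothetical inputs: one well-formed file and one missing its H1 and summary.
GOOD = "# Acme Docs\n\n> Acme product documentation.\n\n## API\n\n- [Webhooks](https://example.com/webhooks.md): Event types.\n"
BAD = "## API\n\n- [Webhooks](https://example.com/webhooks.md)\n"

print(validate_llms_txt(GOOD), validate_llms_txt(BAD))
```

The check does not test link reachability and would reject a file that starts with a metadata block, so treat it as a first gate rather than a full validator.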

Common mistakes

  • Listing every page. llms.txt is curation, not enumeration. Cap at ~80 entries; let sitemap.xml handle the long tail.
  • Skipping per-link summaries. Bare title-only links waste the highest-leverage retrieval signal.
  • Duplicating content in llms.txt and llms-full.txt. llms.txt should link, llms-full.txt should inline. Mixing them confuses retrieval.
  • Using H4 or deeper headings. The spec implies H2 + Optional + per-link summaries. H3 is tolerated; H4+ is not parsed reliably.
  • Hard-coding absolute URLs in different schemes. Stick to one canonical scheme (HTTPS) and one canonical host. Mixed schemes break some retrievers.
  • Forgetting to ship the file at the site root. llms.txt must be at /llms.txt. Sub-path versions are tolerated but reduce discoverability.
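
The mixed-scheme mistake in particular is simple to detect. The following sketch collects the distinct (scheme, host) pairs used by Markdown links in a file, so that more than one pair signals a problem. The regex, function name, and sample URLs are illustrative assumptions.

```python
import re
from urllib.parse import urlsplit

# Captures the target of a Markdown link: "](url)".
URL_IN_LINK = re.compile(r"\]\((\S+?)\)")


def url_origins(text):
    """Set of (scheme, host) pairs across all absolute link targets."""
    origins = set()
    for url in URL_IN_LINK.findall(text):
        parts = urlsplit(url)
        if parts.scheme:                      # ignore relative links
            origins.add((parts.scheme, parts.netloc.lower()))
    return origins


# Hypothetical file mixing http/https and two hosts.
MIXED = """- [A](https://docs.example.com/a.md): First page.
- [B](http://docs.example.com/b.md): Second page.
- [C](https://www.example.com/c.md): Third page.
"""

print(sorted(url_origins(MIXED)))
```

A build check could fail whenever `len(url_origins(text)) > 1`, forcing every link onto one canonical HTTPS origin.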

Adoption status reality check

llms.txt is widely implemented — BuiltWith tracked over 844,000 sites in late 2025 — but no major AI vendor has officially confirmed they read the file at scale. Mintlify's traffic logs show OpenAI and Anthropic crawlers fetching llms-full.txt regularly, suggesting practical use even without formal endorsement. John Mueller has stated Google Search does not use llms.txt as a ranking signal. Treat llms.txt as low-cost insurance: implementation takes 2-8 hours, downside risk is near zero, and credible upside exists for AI agent and developer-tool use cases.

FAQ

Q: How large should llms.txt be?

Keep llms.txt under 10 KB. Beyond that, retrieval becomes lossy and the file starts to feel like content rather than navigation. Move bulk content to llms-full.txt or per-section files.

Q: Should I list every page in llms.txt?

No. Curate to ~50-80 evergreen, canonical pages. Let sitemap.xml handle full enumeration. Listing every page dilutes salience signals.

Q: Do AI crawlers actually read llms.txt?

Mintlify's analytics show OpenAI and Anthropic crawlers fetching llms-full.txt at a regular cadence. No vendor has formally confirmed that it uses the file, so treat it as practical insurance rather than a guaranteed channel.

Q: Can I use H4 or deeper headings in llms.txt?

The llmstxt.org spec defines H1 + H2. H3 is tolerated by mainstream parsers and used in production by Anthropic. H4 and deeper are not parsed reliably; avoid them.

Q: How do llms.txt and llms-full.txt differ?

llms.txt is a navigation index (links + summaries). llms-full.txt inlines the complete Markdown body of every linked page. AI crawlers tend to prefer llms-full.txt because it eliminates retrieval round-trips.

Q: Should llms.txt include sitemap.xml URLs?

Not directly. Reference sitemap.xml as a single link in an Optional section so AI agents can fall back to the full URL inventory, but do not enumerate sitemap URLs in llms.txt.

llmstxt.org official spec by Jeremy Howard. Verified 2026-05-03. Supports base spec structure (H1, blockquote, H2 sections, Optional). https://llmstxt.org/

Mintlify, "How often do LLMs visit llms.txt?" Verified 2026-05-03. Supports llms-full.txt receiving heavier traffic than llms.txt. https://www.mintlify.com/blog/how-often-do-llms-visit-llms-txt

Mintlify, "Real llms.txt examples from leading tech companies". Verified 2026-05-03. Supports Anthropic/Cloudflare/Stripe/Vercel patterns. https://www.mintlify.com/blog/real-llms-txt-examples

Mintlify, "What is llms.txt?", updated March 2026. Verified 2026-05-03. Supports per-link summary advice. https://www.mintlify.com/blog/what-is-llms-txt

Mintlify, "Best llms.txt implementation platforms 2026". Verified 2026-05-03. Supports auto-generation matrix. https://www.mintlify.com/library/best-llms-txt-platforms

DeployHQ, "How to Make Your Documentation AI-Friendly". Verified 2026-05-03. Supports content-negotiation alternative. https://www.deployhq.com/blog/making-your-documentation-ai-friendly-serving-markdown-to-ai-coding-assistants

Publii, "The Complete Guide to llms.txt". Verified 2026-05-03. Supports BuiltWith adoption number. https://getpublii.com/blog/llms-txt-complete-guide.html

Derivatex, "LLMs.txt: The Complete Guide for SEO and AI Search (2026)". Verified 2026-05-03. Supports John Mueller / Google statement. https://derivatex.agency/blog/llms-txt-guide/

LSEO, "Why Clean Header Hierarchies Are Critical for LLMs". Verified 2026-05-03. Supports heading depth as salience signal. https://lseo.com/join-lseo/why-clean-header-hierarchies-h1-h3-are-critical-for-llms/
