llms.txt Reference: Specification, Format, and Examples
llms.txt is a proposed standard, introduced by Jeremy Howard of Answer.AI on September 3, 2024, that places a curated Markdown index at /llms.txt to help LLMs locate and read a site's most important content. Adopters include Stripe, Anthropic, Mintlify, and Cursor.
TL;DR
Add a Markdown file at https://yourdomain.com/llms.txt with an H1 site name, a one-paragraph blockquote summary, and H2 sections that link to your most important pages. Optionally publish a longer llms-full.txt with the same pages expanded inline. The file is a community proposal, not an enforced web standard, and its impact on AI traffic is still under measurement — but it is cheap to ship and improves how LLMs see your information architecture.
Origin and status
llms.txt was proposed by Jeremy Howard at Answer.AI on September 3, 2024. The canonical specification lives at llmstxt.org. It is a community proposal — not an IETF, W3C, or WHATWG standard — but it has been picked up by several documentation platforms and SaaS products since late 2024.
Mintlify added native llms.txt support across its docs platform on November 14, 2024, making the file the default for thousands of customer-hosted documentation sites overnight. Stripe, Anthropic, Cursor, and the Python instructor library followed in the months after. Google itself briefly added llms.txt to its developer documentation in late 2025 and then removed it without comment, signalling that even the major search engines are still figuring out how the file fits into their crawl pipeline.
The version of the spec at llmstxt.org is informal — there are no numbered revisions yet — but the grammar described below has been stable since the initial September 2024 proposal. Treat llms.txt as an emerging convention rather than a guaranteed ranking signal: the cost of publishing one is small, but the upside is still being measured.
For where llms.txt fits among other AI-discovery files, see the GEO technical hub and the robots.txt for AI crawlers reference.
What llms.txt is
llms.txt is a plain-text Markdown file placed at your site's root URL (/llms.txt) that describes your site's content in a format optimized for Large Language Models. While sitemap.xml tells search-engine crawlers which URLs exist, llms.txt tells AI systems what the content is about and which pages matter most.
The file is Markdown so it is both human-readable and machine-parseable. AI clients can fetch it directly with a single HTTP request, follow the links to clean Markdown versions of pages, and skip the navigation bars, ads, JavaScript bundles, and analytics scripts that clutter HTML. For an LLM spending a fixed token budget on your site, this is the difference between fitting your full reference inside its context window and silently truncating after the first 30%.
A typical llms.txt has four parts: an H1 site title, a blockquote site description, one or more H2 sections grouping related pages, and bullet-list link entries with one-sentence descriptions. There is no required schema, no JSON wrapper, and no authentication step — just Markdown served as text/markdown (or text/plain) at a known URL.
Why it matters
LLM context windows are finite, and AI agents spend more compute reading your site than rendering it. Asking a model to ingest an entire HTML page — with its chrome, scripts, styles, third-party widgets, and analytics — wastes tokens and degrades answer quality. llms.txt exists because the cheapest way to make a site readable to an LLM is to stop making the LLM do parsing work it does not need to do.
A well-crafted llms.txt gives the model a short, curated entry point with four things:
- Site description — what the site is, who it serves, and what topics it covers.
- Content index — links to high-value pages with one-line descriptions so the model can pick the right page to read next.
- Navigation structure — how content is grouped into sections, mirroring how a human would browse the site.
- Optional usage policy — how AI systems should attribute or cite the content, and any permissions or restrictions.
The practical payoff is threefold. First, LLMs spend their context budget on substance, not boilerplate, which improves citation accuracy. Second, models that support tool use can fetch llms.txt as a routing layer — read the index, decide which pages to fetch, then pull the Markdown for those pages — which is significantly cheaper than crawling HTML. Third, you gain a single source of truth for "what should an AI know about this site?" that decouples from your visual navigation, your search ranking, and your CMS.
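The routing idea in the second point can be sketched in a few lines. This is a minimal illustration, not how any particular AI client works: a real agent would let the model choose which links to follow, and the keyword-overlap scoring, the `pick_pages` name, and the example URLs are all assumptions made for the sketch.

```python
import re

# Matches "- [Title](url): optional description" link entries.
LINK_RE = re.compile(
    r"^- \[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<desc>.*))?$"
)


def pick_pages(index_text: str, question: str, limit: int = 3) -> list[str]:
    """Rank link entries by how many question words appear in title + description."""
    words = {w for w in re.findall(r"[a-z0-9]+", question.lower()) if len(w) > 2}
    scored = []
    for line in index_text.splitlines():
        m = LINK_RE.match(line.strip())
        if not m:
            continue
        entry_text = f"{m['title']} {m['desc'] or ''}".lower()
        entry_words = set(re.findall(r"[a-z0-9]+", entry_text))
        score = len(words & entry_words)
        if score:
            scored.append((score, m["url"]))
    # sort() is stable, so entries with equal scores keep their file order.
    scored.sort(key=lambda pair: -pair[0])
    return [url for _, url in scored[:limit]]
```

The client would then fetch only the returned URLs instead of crawling the whole site.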
llms.txt is not a ranking signal in the SEO sense. It does not directly cause ChatGPT, Perplexity, or Gemini to cite you more often. What it does is make your site easier and cheaper to read once a model decides to read it. Whether that translates into more citations over time is still unproven; see the effectiveness debate below.
How it works: spec and field reference
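The llms.txt grammar is intentionally minimal. A minimal file that uses every core element looks like this (example.com URLs are placeholders):
# Site Name
> One- or two-sentence description of the site.
## Section name
- [Page Title](https://example.com/page): Short description of the page.
- [Page Title](https://example.com/page): Short description of the page.
## Another section
- [Page Title](https://example.com/page): Short description.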
Core fields
| Element | Markdown | Description |
|---|---|---|
| Title | # Site Name | Site or product name as an H1. The only element the llmstxt.org spec strictly requires; use exactly one per file. |
| Description | > ... | Blockquote summarizing the site (1-2 sentences). Optional in the spec but strongly recommended. Plain prose, no nested formatting. |
| Sections | ## ... | H2 headings grouping related content. The spec allows zero or more; in practice an index needs at least one to be useful. |
| Links | - [Title](URL): Description | Page entries with an absolute URL and a one-line description. |
Optional fields
| Element | Markdown | Description |
|---|---|---|
| Sub-sections | ### ... | H3 headings under an H2 for finer grouping. Some parsers ignore them. |
| Usage policy | Free-form section | How AI systems should cite or attribute content, often under a ## Usage policy heading. |
| Contact | Free-form section | How to reach the maintainers (email, GitHub). |
| Update frequency | Free-form section | How often the file is regenerated. |
| Free-form prose | Any paragraph | The spec allows additional Markdown prose between sections; most parsers preserve it. |
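To make the field reference concrete, here is a minimal parsing sketch. It is not the llmstxt.org reference parser: the `LlmsTxt` dataclass and `parse_llms_txt` function are hypothetical names, H3 sub-sections and free-form prose are simply skipped, and it assumes link entries use the `- [Title](URL): Description` shape.

```python
import re
from dataclasses import dataclass, field

# "- [Title](url)" with an optional ": description" tail.
LINK_RE = re.compile(
    r"^[-*]\s+\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<desc>.*))?$"
)


@dataclass
class LlmsTxt:
    title: str = ""
    summary: str = ""
    sections: dict[str, list[dict[str, str]]] = field(default_factory=dict)


def parse_llms_txt(text: str) -> LlmsTxt:
    """Extract the H1 title, blockquote summary, and per-H2 link entries."""
    doc = LlmsTxt()
    current = None  # name of the H2 section being read, if any
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and not doc.title:
            doc.title = line[2:].strip()
        elif line.startswith("> ") and current is None:
            # Join multi-line blockquotes into a single summary string.
            doc.summary = f"{doc.summary} {line[2:].strip()}".strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            doc.sections[current] = []
        elif current is not None:
            m = LINK_RE.match(line)
            if m:
                doc.sections[current].append(
                    {"title": m["title"], "url": m["url"], "description": m["desc"] or ""}
                )
    return doc
```

Keeping sections as a plain dict preserves the order in which they appear in the file, which matters because the index is curated.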
Validation
There is no official validator yet. Practical checks:
- Fetch https://yourdomain.com/llms.txt with curl -I and confirm a 200 OK response with Content-Type: text/markdown or text/plain.
- Run the file through a generic Markdown linter (markdownlint, remark-cli) to catch broken links and malformed list items.
- Spot-check that every - Title: Description line resolves to a real page returning 200.
- Confirm the file fits in your target model's context budget — Anthropic, OpenAI, and Google all currently support 100k+ tokens, but smaller open models cap at 8k-32k.
The llmstxt.org reference parser is written in Python and can be used as a CI gate.
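If you would rather not depend on an external parser, the checks above can be approximated with a small pure function. The sketch below is an assumption-laden stand-in, not an official validator: `validate_llms_txt` is a hypothetical name, the 2,000-word ceiling is the rule of thumb used elsewhere in this article, and the status code, header, and body are expected to come from whatever HTTP client you already use.

```python
import re

# Captures the URL of a "- [Title](url)" link entry.
LINK_RE = re.compile(r"^- \[[^\]]+\]\((?P<url>[^)\s]+)\)")


def validate_llms_txt(status: int, content_type: str, body: str,
                      max_words: int = 2000) -> list[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if status != 200:
        problems.append(f"expected HTTP 200, got {status}")
    # Ignore parameters such as "; charset=utf-8".
    media_type = content_type.split(";")[0].strip().lower()
    if media_type not in {"text/markdown", "text/plain"}:
        problems.append(f"unexpected Content-Type: {media_type}")
    lines = [ln.rstrip() for ln in body.splitlines()]
    non_empty = [ln for ln in lines if ln.strip()]
    if not non_empty or not non_empty[0].startswith("# "):
        problems.append("first non-empty line is not an H1")
    if sum(ln.startswith("# ") for ln in lines) > 1:
        problems.append("more than one H1")
    for number, ln in enumerate(lines, start=1):
        m = LINK_RE.match(ln)
        if m and not m["url"].startswith(("https://", "http://")):
            problems.append(f"line {number}: relative URL {m['url']}")
    if len(body.split()) > max_words:
        problems.append(f"index exceeds {max_words} words")
    return problems
```

In CI, failing the build when the returned list is non-empty is usually enough; checking that each linked page returns 200 needs network access and is left out of this sketch.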
llms.txt vs related files
llms.txt does not replace existing crawl-control or indexing files; it complements them.
| Standard | Purpose | Audience | Format | Required? |
|---|---|---|---|---|
| robots.txt | Allow / disallow crawler paths | All bots | Plain text directives | De facto required |
| sitemap.xml | Enumerate indexable URLs | Search engines | XML | Strongly recommended |
| llms.txt | Curated content index for LLMs | LLM clients | Markdown | Emerging convention |
| llms-full.txt | Full Markdown content of key pages | LLM clients with bigger context | Markdown | Optional companion |
| ai.txt | AI-agent permissions and attribution | AI agents | Plain text directives | Niche / early-stage |
A few implementation notes:
- robots.txt controls whether a crawler may fetch a URL; llms.txt describes what the URL contains. Different questions, both worth answering.
- sitemap.xml is exhaustive and machine-only; llms.txt is curated and human-readable. Most sites need both.
- llms-full.txt is a Mintlify-popularized companion file: it inlines the Markdown of your most important pages so an LLM can ingest the full content in a single fetch. Publish both when documentation depth matters (APIs, SDKs, product references); publish only llms.txt when your site is too large or content is best read page-by-page.
- ai.txt is a separate proposal from Spawning that focuses on opt-out and attribution for generative-AI training. Its adoption is narrower than llms.txt's.
A well-equipped 2026 site exposes robots.txt, sitemap.xml, llms.txt, and (where useful) llms-full.txt. The combined cost is under an hour of engineering work and is fully cacheable behind your CDN.
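As an illustration of what "inlining" means for llms-full.txt, the sketch below concatenates page Markdown under one heading per page. There is no fixed layout for this file, so the structure, the `Source:` line, and the `build_llms_full` name are assumptions; real generators may organize the output differently.

```python
def build_llms_full(site_name: str, summary: str, pages: list[dict[str, str]]) -> str:
    """Concatenate pages into one Markdown document.

    Each page dict is assumed to carry "title", "url", and "markdown" keys.
    """
    parts = [f"# {site_name}", f"> {summary}"]
    for page in pages:
        parts.append(
            f"## {page['title']}\n\nSource: {page['url']}\n\n{page['markdown'].strip()}"
        )
    return "\n\n".join(parts) + "\n"
```

One caveat with this naive approach: if the page bodies contain their own H1 or H2 headings, the combined file ends up with clashing heading levels, so a production generator would probably demote them first.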
Practical implementation
1. Create the files
Place llms.txt (and optionally llms-full.txt) in your site's public root, alongside robots.txt and sitemap.xml:
your-site/
└── public/
├── llms.txt
├── llms-full.txt # optional companion
├── robots.txt
└── sitemap.xml
Most static-site generators (Next.js, Astro, Hugo, Eleventy) serve everything in public/ (or static/) directly at the root URL. For server-rendered apps, add an explicit route that returns the file with Content-Type: text/markdown.
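For the server-rendered case, an explicit route can be as small as the following standard-library WSGI sketch. The `make_app` and `serve` names and the `public/llms.txt` path are assumptions for illustration; in a real app you would register the equivalent route in your own framework.

```python
from pathlib import Path
from wsgiref.simple_server import make_server


def make_app(index_path: Path):
    """Return a WSGI app that serves index_path at /llms.txt and 404s elsewhere."""

    def app(environ, start_response):
        if environ.get("PATH_INFO") == "/llms.txt" and index_path.is_file():
            status, content_type = "200 OK", "text/markdown; charset=utf-8"
            body = index_path.read_bytes()
        else:
            status, content_type = "404 Not Found", "text/plain; charset=utf-8"
            body = b"Not found\n"
        headers = [("Content-Type", content_type), ("Content-Length", str(len(body)))]
        start_response(status, headers)
        return [body]

    return app


def serve(index_path: str = "public/llms.txt", port: int = 8000) -> None:
    """Run a local development server; blocks until interrupted."""
    with make_server("127.0.0.1", port, make_app(Path(index_path))) as server:
        server.serve_forever()
```

Calling `serve()` and then running `curl -I http://127.0.0.1:8000/llms.txt` should show the `text/markdown` header, provided the file exists at the assumed path.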
2. Generate the content
Three options, in increasing automation:
- Hand-write it. Best for small sites (< 30 pages) where you want full control over which pages the model sees first.
- Generate from your CMS. Most CMS APIs can produce a JSON list of pages with title and description; map that to the - Title: Description format with a 30-line script. Run on every deploy.
- Use a generator tool. Options include Firecrawl (scrapes a live site and emits a draft llms.txt), GitHub Actions for static-site repos, and Mintlify's built-in generator (turn-key for Mintlify-hosted docs).
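The CMS option can be sketched as follows. The JSON shape (a list of objects with `section`, `title`, `url`, and `description` keys) is an assumption; a real CMS export will use different field names, so treat the mapping as the part to adapt.

```python
import json
from collections import defaultdict


def render_llms_txt(site_name: str, summary: str, pages_json: str) -> str:
    """Render llms.txt from a JSON array of page records (assumed shape, see above)."""
    sections = defaultdict(list)
    for page in json.loads(pages_json):
        sections[page["section"]].append(page)

    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for section, entries in sections.items():  # insertion order = first appearance
        lines += [f"## {section}", ""]
        for page in entries:
            # Collapse whitespace so each entry stays on a single line.
            description = " ".join(page["description"].split())
            lines.append(f"- [{page['title']}]({page['url']}): {description}")
        lines.append("")
    return "\n".join(lines)
```

Because sections appear in the order the CMS returns them, sorting the export by priority before rendering is the simplest way to control which pages a model sees first.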
3. Advertise the file via <link>
The spec does not yet define a discovery mechanism, but a convention documented by Giles Thomas is to add a <link> tag in your HTML <head> so that crawlers and clients can discover the Markdown alternative:
<link rel="alternate" type="text/markdown" title="LLM-friendly version" href="/llms.txt" />
This is a no-op for human browsers and a strong hint to AI clients that a curated index exists.
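From the client side, discovering that hint could look like the sketch below. Whether any given AI client actually does this is not something the spec guarantees; the `MarkdownAlternateFinder` class and `find_markdown_alternates` function are hypothetical names built on the standard-library HTML parser.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class MarkdownAlternateFinder(HTMLParser):
    """Collect href values of <link rel="alternate" type="text/markdown"> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        media_type = (attributes.get("type") or "").lower()
        if "alternate" in rels and media_type == "text/markdown" and attributes.get("href"):
            self.hrefs.append(attributes["href"])


def find_markdown_alternates(html: str, base_url: str) -> list[str]:
    """Return absolute URLs of Markdown alternates advertised in the page."""
    finder = MarkdownAlternateFinder()
    finder.feed(html)
    return [urljoin(base_url, href) for href in finder.hrefs]
```

Resolving the href against the page URL means a root-relative value such as `/llms.txt` still yields an absolute URL the client can fetch.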
4. Verify accessibility
Fetch https://yourdomain.com/llms.txt directly. The response must be 200 OK, served with Content-Type: text/markdown (or text/plain), under your canonical domain, and not gated by login, CAPTCHA, geo-block, or paywall. Run the same check on llms-full.txt if you publish one.
5. Keep it fresh
Regenerate the file whenever you publish, restructure, or deprecate significant content. A stale index — one that points to deleted pages or ignores your three best new ones — actively hurts trust. A common pattern is to wire generation into your build/CI pipeline so the file is regenerated on every deploy.
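One way to catch a stale index in CI is to compare its links against sitemap.xml. The sketch below assumes llms.txt links to the same URLs the sitemap lists; if your index points at separate `.md` variants of each page, you would need to normalize the URLs first. `stale_links` is a hypothetical helper name.

```python
import re
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap.xml files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
# Captures the URL of each "- [Title](url)" entry.
LINK_RE = re.compile(r"^- \[[^\]]+\]\(([^)\s]+)\)", re.MULTILINE)


def stale_links(llms_text: str, sitemap_xml: str) -> list[str]:
    """Return llms.txt URLs that no longer appear in the sitemap."""
    root = ET.fromstring(sitemap_xml)
    live = {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }
    return [url for url in LINK_RE.findall(llms_text) if url not in live]
```

Failing the deploy when this list is non-empty turns "keep it fresh" from a habit into a gate.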
Real-world examples
As of early 2026, public /llms.txt files are live on (URLs verified at audit time):
| Site | URL | Notes |
|---|---|---|
| Anthropic | docs.anthropic.com/llms.txt | API + SDK reference index, paired with llms-full.txt |
| Stripe | docs.stripe.com/llms.txt | Linked from main developer docs, regenerated on every API release |
| Mintlify-hosted docs | varies per customer | Default support since Nov 14, 2024 — every Mintlify customer ships one automatically |
| Cursor | cursor.com/llms.txt (varies) | Editor + agent docs, optimized for code-completion models |
| Cloudflare | developers.cloudflare.com/llms.txt | Multi-product docs index across Workers, Pages, R2, KV |
| Vercel | vercel.com/docs/llms.txt (varies) | Framework + deployment docs |
| Instructor (Python) | python.useinstructor.com/llms.txt | Library reference, structured-output API |
A few patterns worth copying:
- Anthropic and Mintlify-hosted sites ship a paired llms.txt (curated index) and llms-full.txt (inline content) so models with large context windows can pull everything in one request.
- Stripe regenerates llms.txt on every API release, keeping it perfectly synchronized with their reference docs.
- Cloudflare groups by product (Workers, Pages, R2, KV) rather than content type, mirroring how a developer would search.
- Cursor focuses its index on code examples and SDK docs, optimizing for code-completion models that fetch the file as part of completion context.
Directories such as llmstxt.directory and directory.llmstxt.cloud track new adopters and are useful starting points for benchmarking against your peers.
Common mistakes
- Treating it as ranking infrastructure. llms.txt is a discovery and parsing aid, not a verified ranking signal. Publishing one will not move you up in ChatGPT or Perplexity citations on its own — it just makes you cheaper to read once a model decides to read you.
- Making it too long. Keep llms.txt curated and under ~2,000 words. If you need to expose full content, publish a separate llms-full.txt instead of bloating the index. Bloated indexes get truncated, and the truncation point is not under your control.
- Using HTML instead of Markdown. The spec requires Markdown. HTML breaks parsers, and many AI clients fall back to ignoring the file entirely if the first line is not a valid # H1.
- Hiding it behind auth or rate limits. It must be publicly fetchable at the root URL with no login, no CAPTCHA, and no aggressive rate limit. AI clients fetch with a single GET and will not retry through OAuth flows.
- Using relative URLs. All link entries should use absolute URLs (https://example.com/page), not relative ones (/page). Some parsers resolve relatives correctly; most do not.
- Letting it go stale. A wrong index is worse than no index. If you cannot commit to regenerating on every deploy, do not publish the file in the first place.
- Forgetting the description sentence. A bare list of links — - Title with no trailing description — gives the model nothing to pick from. Always include the trailing : One sentence about the page.
Effectiveness debate
The industry is still measuring whether llms.txt drives AI citations in practice. A Search Engine Land study of 10 sites published in mid-2025 found that only two saw measurable AI traffic uplift after adoption, and the uplift was confounded by other simultaneous changes (schema markup, sitemap improvements, content rewrites). Google itself briefly added llms.txt to its developer documentation in late 2025 and then removed it within weeks, which observers read as a signal that even the major engines are not yet treating the file as a stable input.
The pragmatic stance: publish llms.txt because it is cheap, low-risk, and demonstrably improves how a model sees your information architecture during a manual prompt — but do not expect it to be a silver bullet for AI search visibility. Treat it as one piece of a GEO content checklist that also includes structured data, clean Markdown rendering, citation-ready content patterns, and a stable canonical URL strategy. Measure with your own AI visibility KPIs — track citations in ChatGPT, Perplexity, Gemini, and Claude before and after publishing — rather than trusting third-party studies on small samples.
FAQ
Q: Is llms.txt an official web standard?
No. It is a proposal authored by Jeremy Howard at Answer.AI in September 2024, hosted at llmstxt.org. It is not ratified by IETF, W3C, WHATWG, or the major search engines. It does have informal adoption from documentation platforms (Mintlify) and SaaS products (Stripe, Anthropic, Cursor, Cloudflare), and reference parsers exist in Python and JavaScript.
Q: Does llms.txt replace sitemap.xml?
No. sitemap.xml exists for search-engine crawlers, lists every indexable URL exhaustively, and is consumed by automated indexing pipelines. llms.txt is a curated, prose-summarized index for LLMs and is consumed by models doing reasoning over your content. They answer different questions and should coexist.
Q: How long should llms.txt be?
Treat the curated llms.txt as an index — usually under 2,000 words. Most adopters keep it under 500 lines. If you need to expose full content, publish a separate llms-full.txt with the same pages inlined. Bloated single-file llms.txt files get truncated by clients with smaller context windows.
Q: What is llms-full.txt?
A complementary file, popularized by Mintlify in November 2024, containing the inline Markdown of your most important pages so an LLM can ingest the full text in one HTTP request. It is optional. Publish it when your documentation depth matters (APIs, SDKs, product reference) and your audience includes agents with large context windows. Skip it when your site is too large to fit in any reasonable context budget.
Q: How do I know if AI systems are reading my llms.txt?
Monitor server logs for GET /llms.txt requests. Common AI user-agents to watch for include GPTBot, OAI-SearchBot, PerplexityBot, ClaudeBot, Anthropic-AI, CCBot, and cohere-ai. Google-Extended is a robots.txt control token rather than a crawler user-agent, so it will not show up in request logs. Beyond raw logs, also track whether AI answers about your product cite your specific page URLs; that is the ultimate signal that the file is doing work.
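A rough log scan along these lines might look like the following. It assumes access logs in the common "combined" format, where the request line and user-agent are quoted, and it matches user-agents by substring only; the `count_ai_fetches` name and the agent list are illustrative and will need updating as crawlers change.

```python
import re
from collections import Counter

# User-agent substrings to look for. Google-Extended is left out because it is
# a robots.txt token and does not appear as a request user-agent.
AI_AGENTS = [
    "GPTBot", "OAI-SearchBot", "PerplexityBot", "ClaudeBot",
    "Anthropic-AI", "CCBot", "cohere-ai",
]
# Matches requests for /llms.txt or /llms-full.txt, with or without a query string.
REQUEST_RE = re.compile(r'"GET /llms(?:-full)?\.txt[ ?]')


def count_ai_fetches(log_lines) -> Counter:
    """Count llms.txt fetches per AI user-agent substring (case-insensitive)."""
    counts = Counter()
    for line in log_lines:
        if not REQUEST_RE.search(line):
            continue
        lowered = line.lower()
        for agent in AI_AGENTS:
            if agent.lower() in lowered:
                counts[agent] += 1
                break  # attribute each request to one agent only
    return counts
```

Passing an open file handle works as well as a list, for example `count_ai_fetches(open("access.log", encoding="utf-8"))`, where the log path is whatever your server writes to.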
Q: Should I include every page in llms.txt?
No. Treat it as a curated guide, not a complete inventory. Include your highest-value pages — canonical references, top tutorials, marketing landing pages with real substance — and skip thin pages, archived content, and tag/category index pages. A llms.txt with 30 well-described links is more useful than one with 3,000 bare entries.
Q: How often should I update llms.txt?
Regenerate on every meaningful release. For documentation sites, that typically means every deploy. For marketing sites, monthly is usually enough. The worst pattern is publishing once and forgetting: a llms.txt that points to deleted pages or omits your three best new articles is a negative signal.
Q: What Content-Type should the file be served with?
text/markdown is the most accurate choice, with text/plain as a widely accepted fallback; the llmstxt.org proposal itself does not appear to mandate a MIME type. Avoid text/html: some parsers will refuse the file if the MIME type suggests HTML. Most static hosts (Vercel, Netlify, Cloudflare Pages) serve .txt files as text/plain by default, so serving text/markdown takes an explicit _headers or routing rule.
Q: Does publishing llms.txt help with traditional SEO?
No direct impact has been documented. Traditional search engines (Google, Bing) primarily consume sitemap.xml and HTML rendered content for ranking. llms.txt targets LLM-based clients (ChatGPT, Claude, Perplexity, Gemini) and AI-augmented search experiences. The two channels are independent — keep both files up to date.