Geodocs.dev

How to Create llms.txt: Step-by-Step Tutorial for AI Search

This guide walks you through creating, deploying, and validating an llms.txt file for your website. By the end, AI systems will be able to quickly understand what your site contains and where the most important content lives.

To create llms.txt, place a Markdown file at /llms.txt containing your site name (H1), a one-paragraph description (blockquote), and H2 sections of curated page links with one-sentence descriptions. Deploy at the site root, confirm robots.txt allows AI crawlers, and verify the URL returns plain text.

TL;DR

Create /llms.txt as a curated Markdown index of your most important pages. Start with an H1 (site name), a blockquote (one-paragraph description), then 2-5 H2 sections with bulleted links and one-sentence descriptions. Keep the file under ~100 entries, optionally publish a heavier llms-full.txt, and confirm robots.txt does not block AI crawlers from /llms.txt.

Quick start (5 steps)

  1. Create a file named llms.txt in your site's public/root directory.
  2. Add an H1 with your site name and a one-paragraph blockquote description.
  3. Group key URLs into 2-5 H2 sections (Getting started, Core concepts, Reference, etc.) with full URLs and one-sentence descriptions.
  4. Deploy and visit https://yoursite.com/llms.txt to confirm it returns plain text.
  5. Confirm robots.txt allows GPTBot, ClaudeBot, PerplexityBot, and OAI-SearchBot, then add the file to your CI/CD so it ships on every release.

That is the minimum viable llms.txt. The rest of this guide explains each step in depth, plus deployment patterns, validation, and common mistakes to avoid.

Prerequisites

Before you start, you should have:

  • Write access to your site's root or public/ directory, or to a deployment pipeline that produces it.
  • A text editor that handles UTF-8 and Markdown cleanly.
  • A short list of your most important pages — typically 20-100 URLs that represent your site's canonical knowledge surface.
  • Basic Markdown familiarity (headings, links, blockquotes, lists).
  • Access to robots.txt and any CDN/edge configuration, so you can confirm AI crawlers are allowed and the file is served as text/plain.
  • A versioning plan — llms.txt should be regenerated whenever your information architecture changes meaningfully.

If your site is generated by a static site generator (Next.js, Astro, Gatsby, Hugo, Docusaurus), you can keep llms.txt in source control and let your build pipeline output it like any other static asset.

Full spec walkthrough

The llms.txt proposal at llmstxt.org (Jeremy Howard, September 2024) defines a small, opinionated structure. Adhering to it makes your file easy for both humans and LLM toolchains to parse.

File location and content type

  • Path: /llms.txt at the root of the host (e.g., https://example.com/llms.txt).
  • Content type: text/plain; charset=utf-8.
  • Format: Markdown. No HTML, no JSON, no front matter.

Required and optional sections

The spec defines this top-down structure:

  1. H1: site name (required). The first line is # Site Name. This is the only element the spec strictly requires, and LLM clients use it as the canonical label for the file.
  2. Blockquote: description (optional in the spec, strongly recommended). Immediately below the H1, a single blockquote (> ...) summarizes what the site is, who it serves, and what kind of content lives there. One short paragraph is ideal.
  3. Free prose (optional). Zero or more paragraphs or lists of context, with no headings: guidance about how to read the index, license/usage notes, etc. Keep it terse.
  4. H2 sections (recommended). Each H2 introduces a content area (Getting started, API reference, Concepts). Under each H2, a bulleted list of links in the format - [Title](https://example.com/page): one-sentence description.
  5. ## Optional section. The spec gives special meaning to a final section titled Optional: its links may be skipped if context is tight (changelogs, deep cuts, archive). LLM clients are expected to drop this section first when truncating.
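The structure above can be read back with a few lines of code. The following is a minimal sketch, not a reference parser: the function name `parse_llms_txt` and the regular expression are assumptions made for illustration, and it only captures a single-line blockquote and links that sit under an H2.

```python
import re

# Matches "- [Title](url): description"; the description part is optional.
LINK_RE = re.compile(
    r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<desc>.*))?$"
)


def parse_llms_txt(text):
    """Split llms.txt text into title, summary, and H2 sections of links."""
    title, summary, sections, current = None, None, {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("# ") and title is None:
            title = line[2:].strip()
        elif line.startswith("> ") and summary is None and current is None:
            summary = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                sections[current].append(
                    (match["title"], match["url"], match["desc"] or "")
                )
    return {"title": title, "summary": summary, "sections": sections}


SAMPLE = """# Acme Developer Docs

> Acme provides a REST API for building payment integrations.

## Getting started

- [Quick Start](https://docs.acme.com/quickstart): Set up your first payment integration in 5 minutes.
- [Authentication](https://docs.acme.com/auth): API key setup and OAuth 2.0 configuration.
"""

parsed = parse_llms_txt(SAMPLE)
```

Free prose between the blockquote and the first H2 is ignored here on purpose, since only the links feed an index.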

llms-full.txt companion

The proposal allows an optional llms-full.txt at the same root, containing the full Markdown text of the pages referenced in llms.txt. This is heavier and not necessary for most sites — publish it only if your content is small enough to ship in one file (typical cap: a few hundred KB) or if you want to make a self-contained corpus available to LLM clients.

What llms.txt is not

  • It is not robots.txt. It does not control crawl permissions.
  • It is not a sitemap. It is curated, not exhaustive, and includes prose context.
  • It is not an API. It is a static text file.

Valid example: full llms.txt and llms-full.txt

Here is a complete, spec-compliant llms.txt:

# Acme Developer Docs

> Acme provides a REST API for building payment integrations. These docs cover authentication, endpoints, webhooks, SDKs, and operational guidance for production deployments.

This index lists the canonical entry points into the Acme documentation. Pages marked under "Optional" can be skipped if context is tight.

## Getting started

- [Quick Start](https://docs.acme.com/quickstart): Set up your first payment integration in 5 minutes.
- [Authentication](https://docs.acme.com/auth): API key setup and OAuth 2.0 configuration.

## Core API

- [Payments](https://docs.acme.com/...): Create, capture, and refund payments.
- [Webhooks](https://docs.acme.com/...): Receive real-time event notifications.
- [Error Handling](https://docs.acme.com/...): Error codes, retry logic, and debugging.

## SDKs

- [Node.js SDK](https://docs.acme.com/...): JavaScript/TypeScript library for server-side.
- [Python SDK](https://docs.acme.com/...): Python library for backend integrations.

## Optional
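If the index is maintained as data rather than by hand, a small renderer can produce the raw Markdown. This is a sketch under stated assumptions: `render_llms_txt` is a hypothetical helper, and the /payments and /changelog paths are invented for illustration (only /quickstart and /auth appear elsewhere in this guide).

```python
def render_llms_txt(title, summary, sections, intro=None):
    """Render an llms.txt string from a title, summary, and {heading: links} dict."""
    lines = [f"# {title}", "", f"> {summary}", ""]
    if intro:
        lines += [intro, ""]
    # Emit the "Optional" section last so clients can drop it first.
    headings = [h for h in sections if h != "Optional"]
    if "Optional" in sections:
        headings.append("Optional")
    for heading in headings:
        lines += [f"## {heading}", ""]
        for link_title, url, description in sections[heading]:
            if not url.startswith("https://"):
                raise ValueError(f"URL must be absolute: {url}")
            lines.append(f"- [{link_title}]({url}): {description}")
        lines.append("")
    return "\n".join(lines).rstrip("\n") + "\n"


# Hypothetical content; the /payments and /changelog paths are invented.
SECTIONS = {
    "Optional": [
        ("Changelog", "https://docs.acme.com/changelog", "Release notes by version."),
    ],
    "Getting started": [
        ("Quick Start", "https://docs.acme.com/quickstart",
         "Set up your first payment integration in 5 minutes."),
    ],
    "Core API": [
        ("Payments", "https://docs.acme.com/payments",
         "Create, capture, and refund payments."),
    ],
}

LLMS_TXT = render_llms_txt(
    "Acme Developer Docs",
    "Acme provides a REST API for building payment integrations.",
    SECTIONS,
)
```

Rejecting non-https URLs at render time is a design choice for this sketch; it catches relative links before they ship.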

A minimal llms-full.txt shares the same header but inlines the full Markdown body of each page:

# Acme Developer Docs

> Acme provides a REST API for building payment integrations.

# Quick Start

> URL: https://docs.acme.com/quickstart

(full page Markdown here)

# Authentication

> URL: https://docs.acme.com/auth

(full page Markdown here)

The pattern is: top-level H1 + blockquote for the site, then one H1 per page with a > URL: blockquote, followed by the page's Markdown.
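The same pattern can be assembled programmatically. The sketch below assumes each page is already available as a (title, URL, Markdown body) tuple; `render_llms_full` and the 300,000-byte default cap are illustrative choices standing in for "a few hundred KB", not values from the proposal.

```python
def render_llms_full(title, summary, pages, max_bytes=300_000):
    """Concatenate pages into one llms-full.txt string.

    pages is a list of (page_title, url, markdown_body) tuples.
    """
    parts = [f"# {title}", "", f"> {summary}", ""]
    for page_title, url, body in pages:
        # One H1 per page, a URL blockquote, then the page's own Markdown.
        parts += [f"# {page_title}", "", f"> URL: {url}", "", body.strip(), ""]
    text = "\n".join(parts).rstrip("\n") + "\n"
    size = len(text.encode("utf-8"))
    if size > max_bytes:
        raise ValueError(f"llms-full.txt is {size} bytes, over the {max_bytes} byte cap")
    return text


PAGES = [
    ("Quick Start", "https://docs.acme.com/quickstart", "(full page Markdown here)"),
    ("Authentication", "https://docs.acme.com/auth", "(full page Markdown here)"),
]

FULL = render_llms_full(
    "Acme Developer Docs",
    "Acme provides a REST API for building payment integrations.",
    PAGES,
)
```

Failing the build when the file exceeds the cap is one way to notice that a site has outgrown a single-file corpus.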

Common invalid examples

These shapes break the spec and confuse LLM toolchains.

Missing H1. A file that opens with prose or H2 has no canonical identity:

This is the docs for Acme.
## Getting started

Relative URLs. Resolution requires the LLM client to know the host:

- [Quick Start](/quickstart): Set up in 5 minutes.

HTML instead of Markdown. llms.txt is Markdown only:

<h1>Acme Docs</h1>
<ul><li><a href="...">Quick Start</a></li></ul>

Front matter at the top. The spec has no YAML front matter; it interferes with parsing:

---
title: Acme
---
# Acme

Sitemap dump. Hundreds of URLs with no curation defeats the purpose:

# Acme
- [/blog/1](https://acme.com/blog/1): Blog 1
- [/blog/2](https://acme.com/blog/2): Blog 2
... (500 more)

Aim for fewer than ~100 entries; everything else belongs in your sitemap.
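The invalid shapes above are easy to catch mechanically. Here is a rough linter sketch; `lint_llms_txt`, its problem labels, and the HTML tag list are assumptions for illustration, and the checks are heuristics rather than a full Markdown parse.

```python
import re

# Captures the URL inside any Markdown link: [text](url)
LINK_URL_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
# A handful of common HTML tags; not an exhaustive list.
HTML_RE = re.compile(r"<\s*(h[1-6]|ul|ol|li|a|p|div)\b", re.IGNORECASE)


def lint_llms_txt(text, max_entries=100):
    """Return a list of problem labels; an empty list means no issue was found."""
    problems = []
    non_empty = [ln for ln in text.splitlines() if ln.strip()]
    first = non_empty[0] if non_empty else ""
    if first.strip() == "---":
        problems.append("front matter")
    elif not first.startswith("# "):
        problems.append("missing H1")
    if HTML_RE.search(text):
        problems.append("html")
    urls = LINK_URL_RE.findall(text)
    if any(not u.startswith(("https://", "http://")) for u in urls):
        problems.append("relative url")
    if len(urls) > max_entries:
        problems.append("too many entries")
    return problems


VALID = (
    "# Acme\n\n> Docs for Acme.\n\n## Getting started\n\n"
    "- [Quick Start](https://docs.acme.com/quickstart): Set up in 5 minutes.\n"
)
```

Running it in CI against the built file keeps these mistakes from reaching production.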

Deployment patterns

How you ship llms.txt depends on your stack. Below are the four most common patterns.

Next.js (App Router or Pages Router)

Place the file in public/:

your-project/
├── public/
│   ├── llms.txt
│   └── llms-full.txt

Next.js serves public/ assets at the root. https://example.com/llms.txt will return the file with Content-Type: text/plain; charset=utf-8. No additional routing required. If you need a dynamic version (generated from MDX), add a route handler at app/llms.txt/route.ts that returns the assembled Markdown with Content-Type: text/plain.

Mintlify and other docs platforms

Hosted documentation platforms such as Mintlify generate llms.txt and llms-full.txt automatically from your content tree. For Mintlify, the files are exposed by default at /llms.txt and /llms-full.txt once you publish; you can override them by adding llms.txt to your repo root. Docusaurus does not generate the files on its own, but a community plugin (docusaurus-plugin-llms-txt) emits them at build time. ReadMe and GitBook expose similar features under their AI/discovery settings.

Cloudflare Pages and Workers

For static sites on Cloudflare Pages, drop llms.txt at the root of your output directory; Pages will serve it directly. For dynamic generation, use a Cloudflare Worker route that responds to /llms.txt with the assembled Markdown and Content-Type: text/plain. Cloudflare's own developer documentation publishes a /llms.txt (and /llms-full.txt) using exactly this pattern, generated from their docs source.

Static sites and custom servers

For Hugo, Astro, Eleventy, or hand-rolled static sites, copy llms.txt into the build output root. For Apache, ensure mime.types maps .txt to text/plain. For Nginx, add:

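location = /llms.txt {
    # mime.types already maps .txt to text/plain; charset appends "; charset=utf-8".
    # add_header would emit a second Content-Type header instead of replacing it.
    charset utf-8;
    try_files $uri =404;
}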

For WordPress, upload via SFTP/SSH to the document root (same level as wp-config.php) or use a plugin that manages root-level files. Avoid serving llms.txt through a PHP rewrite if you can — a static file is faster and more reliable.

Validation checklist (14 items)

After deploying, work through every item:

  1. Reachability. curl -I https://example.com/llms.txt returns 200 OK.
  2. Content type. Response header shows Content-Type: text/plain; charset=utf-8.
  3. Encoding. File is UTF-8 with no BOM.
  4. H1 present. First non-empty line is # Site Name.
  5. Blockquote present. Immediately after the H1, a > ... blockquote describes the site.
  6. Absolute URLs. Every link uses https://..., no relative paths.
  7. One-sentence descriptions. Each link has a single, specific description.
  8. Section count. 2-5 H2 sections; merge or remove sections if you have more.
  9. Entry count. Total entries under ~100; trim aggressively.
  10. robots.txt. robots.txt allows /llms.txt for GPTBot, ClaudeBot, PerplexityBot, OAI-SearchBot, Google-Extended, Applebot-Extended.
  11. No HTML / front matter. The file is pure Markdown.
  12. Optional section last. If you use ## Optional, it is the final section.
  13. Companion file (if used). llms-full.txt is also reachable, also text/plain, also UTF-8.
  14. Build pipeline. The file is regenerated on every deploy; no stale snapshots.

A file that passes all 14 checks is well-formed and ready to be referenced by AI clients.
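Several of the checks (reachability, content type, encoding, H1) can be scripted. The sketch below is one possible shape, with hypothetical function names; `check_response` is kept free of network calls so it can be unit-tested, and `fetch_and_check` wraps it with the standard library's `urlopen`, which raises `HTTPError` on 4xx/5xx responses instead of returning them.

```python
import codecs
from urllib.request import urlopen


def check_response(status, content_type, body):
    """Return a list of checklist failures for a fetched llms.txt response."""
    failures = []
    if status != 200:
        failures.append(f"status is {status}, expected 200")
    media_type, _, params = content_type.partition(";")
    if media_type.strip().lower() != "text/plain":
        failures.append(f"media type is {media_type.strip()!r}, expected text/plain")
    if "charset=utf-8" not in params.replace(" ", "").lower():
        failures.append("charset is not utf-8")
    if body.startswith(codecs.BOM_UTF8):
        failures.append("file starts with a UTF-8 BOM")
    try:
        # utf-8-sig drops a leading BOM so the H1 check below still works.
        text = body.decode("utf-8-sig")
    except UnicodeDecodeError:
        failures.append("body is not valid UTF-8")
        return failures
    first = next((ln for ln in text.splitlines() if ln.strip()), "")
    if not first.startswith("# "):
        failures.append("first non-empty line is not an H1")
    return failures


def fetch_and_check(url):
    """Fetch url and run check_response on the result (performs a network call)."""
    with urlopen(url, timeout=10) as resp:
        return check_response(
            resp.status, resp.headers.get("Content-Type", ""), resp.read()
        )
```

Calling `fetch_and_check("https://example.com/llms.txt")` after each deploy would cover items 1 through 4 of the checklist in one step.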

Testing in production

Validation gets you a correct file. Production testing tells you whether AI systems are actually using it.

  • Direct fetch in your CDN logs. Filter access logs for /llms.txt and the AI user agents (GPTBot, ChatGPT-User, ClaudeBot, PerplexityBot, OAI-SearchBot). Google-Extended and Applebot-Extended are robots.txt control tokens rather than user-agent strings, so they do not show up in logs. Volume is the simplest signal that fetches are happening.
  • Crawler simulation. Use curl -A 'GPTBot/1.0' https://example.com/llms.txt and the equivalent for ClaudeBot/PerplexityBot to confirm none of your edge rules block these UAs.
  • AI-mode probes. Periodically ask ChatGPT, Perplexity, Claude, and Gemini direct questions about your site ("What does Acme cover?") and inspect citations. If a system consistently cites pages from your llms.txt index, you have indirect evidence of value.
  • Diff on every deploy. Keep llms.txt in source control and review diffs in pull requests; an unintended deletion can silently shrink your AI surface.
  • Set a review cadence. Re-audit llms.txt quarterly — at minimum when you launch new content pillars or restructure information architecture.
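The log-filtering idea from the first bullet can be sketched as follows. This assumes access logs in the common "combined" format; the regular expression, the function name `count_ai_fetches`, and the sample log lines are all illustrative, so adapt them to whatever your CDN actually emits.

```python
import re
from collections import Counter

# User-agent substrings to look for. Google-Extended and Applebot-Extended are
# robots.txt tokens, not user-agent strings, so they are not listed here.
AI_AGENTS = ("GPTBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot", "OAI-SearchBot")

# Assumes the combined log format: ... "METHOD path proto" status bytes "referer" "ua"
LOG_RE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def count_ai_fetches(log_lines, path="/llms.txt"):
    """Count requests for path per AI user agent."""
    hits = Counter()
    for line in log_lines:
        match = LOG_RE.search(line)
        if not match or match["path"].split("?")[0] != path:
            continue
        ua = match["ua"].lower()
        for agent in AI_AGENTS:
            if agent.lower() in ua:
                hits[agent] += 1
                break
    return hits


# Fabricated sample lines for illustration only.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Jan/2026:10:00:00 +0000] "GET /llms.txt HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)"',
    '203.0.113.9 - - [01/Jan/2026:10:05:00 +0000] "GET /llms.txt?x=1 HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '203.0.113.7 - - [01/Jan/2026:10:06:00 +0000] "GET /index.html HTTP/1.1" 200 900 "-" '
    '"Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.8 - - [01/Jan/2026:10:07:00 +0000] "GET /llms.txt HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (X11; Linux x86_64) Firefox/120.0"',
]

HITS = count_ai_fetches(SAMPLE_LOG)
```

User-agent strings can be spoofed, so treat these counts as a rough signal rather than proof of who fetched the file.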

Common mistakes

  • Including every single page. llms.txt should be curated, not exhaustive. Keep it under ~100 entries; everything else belongs in sitemap.xml.
  • Using relative URLs. Always use full, absolute URLs.
  • Vague descriptions. "An article about our product" tells AI nothing. Write specific, intent-bearing descriptions.
  • Forgetting to update. Stale files reduce trust over time. Wire llms.txt into your build pipeline.
  • Blocking AI crawlers in robots.txt. If GPTBot, ClaudeBot, PerplexityBot, OAI-SearchBot, Google-Extended, or Applebot-Extended cannot reach the file, the file does nothing.
  • Front matter or HTML. The spec is pure Markdown. Front matter and HTML break parsers.
  • Skipping llms-full.txt when your site is small. If your full corpus fits, publishing llms-full.txt is a cheap upgrade.
  • Treating llms.txt as a substitute for structured data. It complements JSON-LD and sitemaps; it does not replace them.

FAQ

Q: Is llms.txt officially supported by AI providers?

A: It is an emerging convention proposed at llmstxt.org in September 2024 by Jeremy Howard. As of early 2026, no major AI provider (OpenAI, Anthropic, Google, Perplexity) has formally committed to consuming it. Adoption is voluntary and cross-vendor; treat it as low-cost forward-compatibility, not a guaranteed ranking signal.

Q: How is llms.txt different from robots.txt?

A: robots.txt is an access-control file: it tells crawlers which paths they may fetch. llms.txt is a discovery file: it tells LLM clients which content matters and how it is organized. They are complementary. robots.txt must not block AI user agents in the first place (crawlers treat the absence of a rule as permission), and llms.txt gives them a curated entry point once they can reach the site.

Q: Should I publish both llms.txt and llms-full.txt?

A: Publish llms.txt always. Publish llms-full.txt if your content is compact enough to ship in a single file (typically under a few hundred KB) and you want to make a self-contained corpus available. For large sites, skip llms-full.txt and rely on llms.txt linking to canonical pages.

Q: How often should I update llms.txt?

A: Whenever your information architecture changes meaningfully — new pillar pages, restructured navigation, deprecated sections. Many teams regenerate it on every deploy from a config file or content tree, which avoids drift entirely.

Q: What if my site has thousands of pages?

A: Curate aggressively. Pick the 50-100 pages that best represent your canonical knowledge surface. Use ## Optional for next-tier pages. Anything beyond that belongs in sitemap.xml, not llms.txt.

Q: Does llms.txt help with traditional SEO?

A: Indirectly. llms.txt is aimed at LLM clients and AI search systems, not Googlebot. However, the discipline it imposes — curating canonical URLs and writing crisp descriptions — tends to improve internal linking, sitemap quality, and structured data, all of which have classical SEO benefits.

Q: How do I serve llms.txt from a subdirectory or subdomain?

A: The spec centers on host-level placement at /llms.txt. If you serve docs from a subdomain (e.g., docs.example.com/llms.txt), publish one file per host. The proposal does mention a subpath (e.g., /docs/llms.txt) as an option, but AI clients have no standard way to discover it, so treat the root file as the primary entry point.

Q: Can I block specific AI providers from reading llms.txt?

A: Yes — at the robots.txt level. Disallow specific UAs from /llms.txt (or from your site entirely) if you do not want a particular provider to ingest your content. Note that user-agent compliance is voluntary.
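To see which agents a given robots.txt would block from /llms.txt, Python's standard `urllib.robotparser` can evaluate the rules offline. This is a small sketch; `blocked_agents` and the sample robots.txt are hypothetical, and the result only reflects how a compliant crawler would interpret the file.

```python
from urllib.robotparser import RobotFileParser


def blocked_agents(robots_txt, agents, url="https://example.com/llms.txt"):
    """Return the agents that robots_txt disallows from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


# Hypothetical policy: block one provider, allow everyone else.
ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

BLOCKED = blocked_agents(ROBOTS, ["GPTBot", "ClaudeBot", "PerplexityBot"])
```

The same helper doubles as a validation step: an empty result for the agents you care about confirms that robots.txt is not hiding the file from them.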

Sources

  • llmstxt.org — verified 2026-05-01 — canonical proposal and format spec.
  • Answer.AI announcement — verified 2026-05-01 — original rationale and design.
  • Cloudflare Developer Docs llms.txt — verified 2026-05-01 — production reference implementation.
  • Mintlify llms.txt support docs — verified 2026-05-01 — out-of-the-box generation pattern.
