Geodocs.dev

LLMs.txt Generator Tools: Evaluation checklist + best options (2026)



Use this checklist to evaluate llms.txt generators on sitemap coverage, llms-full.txt support, per-section descriptions, and build-pipeline integration. Mintlify, FireCrawl, and llmstxt.dev are the most-used 2026 options.

TL;DR

llms.txt is a markdown file that tells AI agents what your site contains, where the canonical content lives, and how to navigate it. A good generator (1) crawls your full sitemap, (2) emits both llms.txt and llms-full.txt, (3) lets you write per-section descriptions, and (4) regenerates on every deploy.

What is llms.txt?

llms.txt is a proposed standard, hosted at /llms.txt, that gives LLMs a curated map of a site's most-citable content. The companion llms-full.txt includes the actual markdown content of those pages so AI agents can ingest it without crawling each URL. Adoption accelerated through 2025-2026 as Anthropic, Mintlify, and Vercel began shipping first-class support.
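The proposed format is plain markdown: an H1 title, a blockquote summary, then H2 sections containing link lists. A minimal illustration (example.com and all page names are placeholders):

```markdown
# Example Docs

> Developer documentation for the Example platform: API reference, guides, and deployment notes.

## Guides

- [Quickstart](https://example.com/docs/quickstart): Install the CLI and deploy in five minutes.

## Reference

- [API Reference](https://example.com/docs/api): Endpoints, authentication, and error codes.
```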

Evaluation checklist

Use the items below as a yes/no checklist when comparing generators.

Coverage

  • [ ] Reads your full sitemap.xml automatically
  • [ ] Supports manual page lists for sites without sitemaps
  • [ ] Excludes draft, archived, and noindex pages
  • [ ] Respects custom include/exclude rules
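The coverage items above can be sketched in a few lines: read the sitemap, then keep only the URLs that pass glob-style include/exclude rules. A minimal sketch, assuming glob patterns for the rules (the sitemap contents and patterns here are illustrative):

```python
import fnmatch
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_urls(xml_text: str) -> list[str]:
    """Extract <loc> entries from a sitemap.xml document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc")]

def apply_rules(urls, include=("*",), exclude=()):
    """Keep URLs matching at least one include glob and no exclude glob."""
    kept = []
    for url in urls:
        if not any(fnmatch.fnmatch(url, pat) for pat in include):
            continue
        if any(fnmatch.fnmatch(url, pat) for pat in exclude):
            continue
        kept.append(url)
    return kept

sitemap = """<?xml version="1.0"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/docs/quickstart</loc></url>
  <url><loc>https://example.com/drafts/wip</loc></url>
  <url><loc>https://example.com/docs/api</loc></url>
</urlset>"""

urls = apply_rules(sitemap_urls(sitemap), exclude=("*/drafts/*",))
print(urls)  # the two /docs/ pages survive; the draft is excluded
```

A real generator would also need to fetch rendered pages to detect noindex, which a sitemap alone cannot tell you.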

Output

  • [ ] Emits llms.txt (curated index)
  • [ ] Emits llms-full.txt (full markdown content)
  • [ ] Supports per-section descriptions in markdown
  • [ ] Preserves heading hierarchy and frontmatter

Build pipeline

  • [ ] CLI command for CI/CD
  • [ ] GitHub Action available
  • [ ] Vercel/Netlify integration
  • [ ] Build-fail option when content drops below a threshold
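The build-fail item can be as simple as a word-count floor checked in a CI step. A minimal sketch (the threshold, file contents, and exit message are hypothetical):

```python
import sys

def check_threshold(text: str, min_words: int) -> bool:
    """Return True when the generated file clears the word-count floor."""
    return len(text.split()) >= min_words

# In CI, read the freshly generated llms.txt and exit non-zero on regression.
generated = "# My Docs\n\n- [Quickstart](https://example.com/quickstart): Setup guide."
if not check_threshold(generated, min_words=5):
    sys.exit("llms.txt fell below the word-count threshold; failing the build")
```

Wired into a pipeline step or a GitHub Action, this makes a silently emptied llms.txt fail the deploy instead of shipping.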

Quality controls

  • [ ] Word-count diff between runs
  • [ ] Validation against the llms.txt spec
  • [ ] Diff preview before commit
  • [ ] Ability to override page titles

Maintenance

  • [ ] Project actively maintained (commits in last 90 days)
  • [ ] Issue response within 30 days
  • [ ] Public roadmap or changelog

Hosting

  • [ ] Deployment model fits your stack (self-hosted or hosted SaaS)
  • [ ] Output is plain text (no DRM, no auth)
  • [ ] CDN-friendly (no cookies, no dynamic params)

Best options in 2026

Mintlify

If your docs are on Mintlify, native llms.txt generation runs on every deploy. Coverage and llms-full.txt are first-class. Best fit for product docs and developer-experience teams.

FireCrawl

Open-source crawler that ingests any site (sitemap or seed URL) and outputs llms.txt + llms-full.txt. Best for teams whose primary content is not on a docs platform (marketing sites, blogs, knowledge bases).

llmstxt.dev

A hosted SaaS generator that runs on a schedule and exposes llms.txt/llms-full.txt over a CDN endpoint. Best for non-engineering content teams that need zero-code setup.

Vercel native (/llms.txt route)

If you ship Next.js on Vercel, you can colocate an llms.txt route generator with your build, regenerate per deploy, and avoid third-party dependencies.

Quality bar your output should meet

  • Top 100 most-citable pages, sorted by hub importance
  • Each entry: title, canonical URL, and 1-2 sentence description
  • Sections grouped by content type (# Guides, # References, # Case studies)
  • The URL set in llms-full.txt matches llms.txt 1:1
  • File size under 1MB for llms.txt; llms-full.txt may be larger
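The 1:1 requirement above is easy to verify mechanically by comparing the markdown link targets in the two files. A rough sketch (the regex covers simple `[title](url)` links only; the sample strings are illustrative):

```python
import re

LINK = re.compile(r"\[[^\]]*\]\((https?://[^)\s]+)\)")

def url_set(markdown: str) -> set[str]:
    """Collect all absolute link targets in a markdown document."""
    return set(LINK.findall(markdown))

def matches_one_to_one(llms_txt: str, llms_full_txt: str) -> bool:
    """True when both files reference exactly the same set of URLs."""
    return url_set(llms_txt) == url_set(llms_full_txt)

index = "- [API](https://example.com/api): REST reference."
full = "# API\nSource: [API](https://example.com/api)\n..."
print(matches_one_to_one(index, full))  # → True
```

Running this check as part of the same CI step that generates both files catches drift before it reaches production.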

How to apply

  1. Audit current site coverage and pick a generator that fits your stack.
  2. Run the generator locally and review output against the checklist.
  3. Wire it into CI/CD so llms.txt regenerates on every content change.
  4. Submit your llms.txt URL to Perplexity and Anthropic where supported.
  5. Validate quarterly: re-crawl with the llmstxt.dev validator or an equivalent tool.

FAQ

Q: Is llms.txt a real standard?

It is a proposed convention with strong adoption in 2025-2026 (Mintlify, Anthropic, Vercel) but not yet a formal IETF standard. AI engines that respect it treat it as a hint, not a directive.

Q: Do I need both llms.txt and llms-full.txt?

For docs and knowledge bases, yes — llms-full.txt lets agents ingest content without crawling each URL. For marketing sites, llms.txt alone is often enough.

Q: Where do I put llms.txt?

At the root: https://yourdomain.com/llms.txt. Some publishers also expose /llms-full.txt and /.well-known/llms.txt for forward compatibility.

Q: Will llms.txt replace robots.txt?

No. robots.txt controls crawl behavior; llms.txt describes content for ingestion. They serve different layers of the AI agent stack.

Q: Do search engines read llms.txt?

Classical search engines do not use it for ranking. AI engines and AI agents may use it as a content map for retrieval and citation.

