Geodocs.dev

ai.txt: AI Agent Access Policy Reference

ai.txt is a root-level text file that declares site-wide permissions for AI training, citation, and attribution. It complements robots.txt (crawl control) and llms.txt (AI content map) and remains a voluntary, early-adoption standard.

TL;DR. ai.txt is a plain-text file at the root of a domain that signals to AI systems whether your content can be used for training, cited in answers, or both, and on what attribution terms. The most widely promoted variant is Spawning's ai.txt, which uses an allow/disallow URL-prefix syntax modelled on robots.txt. Adoption is still early and enforcement is voluntary.

Definition

ai.txt is a root-level configuration file (https://yoursite.com/ai.txt) that publishes a site's policy for AI agents and model trainers. Where robots.txt tells crawlers where they may go, ai.txt tells AI systems how they may use what they retrieve — for training, for inference-time citation, or not at all.

The term covers two related approaches:

  1. Spawning's ai.txt — a permission file read at media-download time that uses allow/disallow URL prefixes to opt content into or out of commercial AI training. Spawning surfaces these signals to partners such as Hugging Face and Stability AI.
  2. DSL-style proposals: research efforts (for example, the paper "ai.txt: A Domain-Specific Language for Guiding AI Interactions with the Internet") that extend the file with element-level rules and natural-language directives.

Both treat ai.txt as a voluntary signal, not a legally binding contract.

Why ai.txt matters

LLM-powered search, AI assistants, and content-mining bots increasingly fetch web pages outside the conventional crawler model. robots.txt covers crawl-time access but says little about training-set inclusion, citation behavior, or attribution. ai.txt fills that gap by giving publishers a single, machine-readable place to express:

  • whether content may be used to train commercial models,
  • whether content may be cited in AI answers,
  • how the publisher should be attributed when it is cited,
  • which paths are out of scope for any AI use.

For sites investing in GEO, ai.txt is the policy layer that pairs with llms.txt (the content map) and robots.txt (the crawl gate).

How it works

At a high level, an ai.txt file expresses four concerns:

Intent      | What it answers                               | Example signal
Identity    | Who owns the policy?                          | site name, contact URL
Permission  | Can content be used for training or citation? | allow / disallow per path
Attribution | How should the publisher be credited?         | required / preferred / none
Scope       | Which paths does the policy cover?            | URL prefixes

The most adopted concrete syntax today is Spawning's allow/disallow style:

    # ai.txt, Spawning style
    User-Agent: *
    Disallow: /private/
    Allow: /

Additional metadata (contact email, attribution preference, citation policy) can be expressed as comments or as fields in the DSL variant. Because the standard is still consolidating, treat any field set as policy intent rather than enforced contract and pair it with robots.txt directives for crawler-level controls.
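To make the allow/disallow semantics concrete, here is a minimal Python sketch of how a consumer might evaluate a Spawning-style file. The function names `parse_ai_txt` and `is_allowed` are hypothetical, and the precedence rule (longest matching prefix wins, Allow wins ties, no match means allowed) is an assumption borrowed from common robots.txt practice rather than something Spawning's documentation is known to guarantee.

```python
from typing import List, Tuple

# (directive, path_prefix); directive is "allow" or "disallow"
Rule = Tuple[str, str]


def parse_ai_txt(text: str) -> List[Rule]:
    """Collect Allow/Disallow rules from the 'User-Agent: *' group only."""
    rules: List[Rule] = []
    in_wildcard_group = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            in_wildcard_group = value == "*"
        elif field in ("allow", "disallow") and in_wildcard_group and value:
            rules.append((field, value))
    return rules


def is_allowed(rules: List[Rule], path: str) -> bool:
    """Assumed precedence: longest prefix wins, Allow wins ties, default allow."""
    best: Rule = ("allow", "")
    for directive, prefix in rules:
        if path.startswith(prefix):
            longer = len(prefix) > len(best[1])
            tie_for_allow = len(prefix) == len(best[1]) and directive == "allow"
            if longer or tie_for_allow:
                best = (directive, prefix)
    return best[0] == "allow"


SAMPLE = """\
User-Agent: *
Disallow: /private/
Allow: /
"""

rules = parse_ai_txt(SAMPLE)
print(is_allowed(rules, "/blog/post"))      # True
print(is_allowed(rules, "/private/draft"))  # False
```

Comment lines are skipped here, so any metadata carried in comments would need separate handling.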

ai.txt vs robots.txt vs llms.txt

Need                                | robots.txt               | llms.txt          | ai.txt
Block or allow crawling             | Yes                      | No                | No
List preferred AI reading order     | No                       | Yes               | No
Signal training permissions         | Partial (per-bot tokens) | No                | Yes
Signal citation / attribution rules | No                       | No                | Yes
Read at                             | Crawl time               | AI inference time | Media download / training time

Together the three files cover crawl control, AI navigation, and AI usage policy. None of them substitute for licensing terms or legal notices.

Adoption status

As of 2026, ai.txt remains an emerging, voluntary standard:

  • Spawning's ai.txt is honored by partners participating in Spawning's API ecosystem; major foundation-model labs have not publicly committed to honoring arbitrary ai.txt files.
  • Independent audits of related AI-facing files (for example, llms.txt) report that most large crawlers ignore them and rely instead on robots.txt user-agent tokens such as GPTBot and ClaudeBot.
  • The DSL-style ai.txt proposals are research-stage and not yet implemented by production crawlers.

Publishers should therefore treat ai.txt as a directional signal plus a defensible record of intent, not as an enforced control.

How to apply ai.txt

  1. Decide your default posture: open to AI training, restricted, or denied.
  2. Identify paths that need exceptions (for example, gated content, premium articles, or user-generated areas).
  3. Generate or hand-author an ai.txt using the Spawning generator or your own template.
  4. Deploy at https://yoursite.com/ai.txt and confirm it is reachable with curl https://yoursite.com/ai.txt.
  5. Mirror sensitive rules in robots.txt (User-Agent: GPTBot, User-Agent: ClaudeBot, etc.) so that crawlers which honor robots.txt also respect them.
  6. Re-review every quarter and after any major content-licensing change.
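Steps 1 to 3 and step 5 above can be sketched as a small template of your own. The `AiPolicy` class and both render functions below are hypothetical names, and the output follows the Spawning-style allow/disallow example shown earlier in this article; carrying the contact URL in a comment is one possible convention, not a specified field.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AiPolicy:
    """Hypothetical container for the decisions made in steps 1 and 2."""
    default_allow: bool  # step 1: True = open by default, False = denied by default
    exceptions: List[str] = field(default_factory=list)  # step 2: path prefixes
    contact: str = ""  # optional; written out as a comment line


def render_ai_txt(policy: AiPolicy) -> str:
    """Step 3: render a Spawning-style allow/disallow file."""
    lines = []
    if policy.contact:
        lines.append(f"# Contact: {policy.contact}")
    lines.append("User-Agent: *")
    # Exceptions take the opposite directive from the default posture.
    exception_directive = "Disallow" if policy.default_allow else "Allow"
    for prefix in policy.exceptions:
        lines.append(f"{exception_directive}: {prefix}")
    default_directive = "Allow" if policy.default_allow else "Disallow"
    lines.append(f"{default_directive}: /")
    return "\n".join(lines) + "\n"


def render_robots_mirror(policy: AiPolicy, agents: List[str]) -> str:
    """Step 5: repeat the same path rules per named AI crawler for robots.txt."""
    blocks = []
    for agent in agents:
        block = [f"User-agent: {agent}"]
        if policy.default_allow:
            block += [f"Disallow: {prefix}" for prefix in policy.exceptions]
        else:
            block += [f"Allow: {prefix}" for prefix in policy.exceptions]
            block.append("Disallow: /")
        blocks.append("\n".join(block))
    return "\n\n".join(blocks) + "\n"


policy = AiPolicy(
    default_allow=True,
    exceptions=["/premium/", "/members/"],
    contact="https://yoursite.com/contact",
)
print(render_ai_txt(policy))
print(render_robots_mirror(policy, ["GPTBot", "ClaudeBot"]))
```

A "restricted" posture maps onto either default with a list of exceptions; a "denied" posture is `default_allow=False` with no exceptions. The robots.txt groups would be appended to an existing robots.txt rather than replacing it.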

For a deeper structural pairing, see How to create llms.txt and the /technical hub for related AI-readiness standards.

Common misconceptions

  • "ai.txt is legally binding." It is not. Like robots.txt, it is a voluntary protocol; legal control over training data still depends on copyright and licensing terms.
  • "ai.txt replaces robots.txt." It does not. The two files cover different lifecycle stages (crawl vs. training/inference) and should be deployed together.
  • "All AI systems read ai.txt." Adoption is partial and concentrated around Spawning's ecosystem. Combine ai.txt with robots.txt user-agent rules for stronger coverage.

FAQ

Q: Is ai.txt an official W3C or IETF standard?

No. ai.txt is a proposed, community-driven standard. The most recognized variant is published by Spawning, with academic DSL extensions explored in research papers. Treat it as voluntary until a formal specification is ratified.

Q: Do major AI labs honor ai.txt today?

Adoption is early and uneven. Spawning's partners surface ai.txt-derived permissions, but independent audits show that most large foundation-model crawlers still rely on robots.txt user-agent tokens. Pair both files for the strongest signal.

Q: Where should ai.txt live?

At the domain root, served as plain text from https://yoursite.com/ai.txt over HTTPS, with Content-Type: text/plain and a 200 status. It must be reachable without authentication.
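The serving requirements above can be checked programmatically. The following is a minimal sketch using only the Python standard library; `check_ai_txt_response` and `fetch_and_check` are hypothetical helper names, and the 10-second timeout is an arbitrary choice.

```python
import urllib.error
import urllib.request
from typing import List, Optional
from urllib.parse import urlsplit


def check_ai_txt_response(url: str, status: int, content_type: Optional[str]) -> List[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    parts = urlsplit(url)
    if parts.scheme != "https":
        problems.append("not served over HTTPS")
    if parts.path != "/ai.txt":
        problems.append("not at the root path /ai.txt")
    if status != 200:
        problems.append(f"status is {status}, expected 200")
    # Ignore parameters such as "; charset=utf-8" when comparing media types.
    media_type = (content_type or "").split(";", 1)[0].strip().lower()
    if media_type != "text/plain":
        problems.append(f"Content-Type is {media_type or 'missing'}, expected text/plain")
    return problems


def fetch_and_check(url: str) -> List[str]:
    """Fetch the URL without credentials and run the checks on the final response."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return check_ai_txt_response(
                response.geturl(), response.status, response.headers.get("Content-Type")
            )
    except urllib.error.HTTPError as err:
        # 401/403/404 and similar arrive here; report them as a status problem.
        return check_ai_txt_response(url, err.code, err.headers.get("Content-Type"))


print(check_ai_txt_response("https://yoursite.com/ai.txt", 200, "text/plain; charset=utf-8"))  # []
print(check_ai_txt_response("http://yoursite.com/ai.txt", 404, "text/html"))
```

Calling `fetch_and_check("https://yoursite.com/ai.txt")` against a real deployment performs the network request; because no credentials are sent, a 401 or 403 response shows up as a status problem, which covers the "reachable without authentication" requirement.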

Q: How is ai.txt different from a license file or terms of service?

ai.txt is a machine-readable policy signal, not a contract. License files and terms of service remain the legal source of truth; ai.txt is the operational hint that AI systems can parse at scale.

Q: Should I use ai.txt if I want to block all AI training?

Yes — declare a deny-all posture in ai.txt and back it up with explicit robots.txt rules for known AI user agents (for example, GPTBot, ClaudeBot, PerplexityBot). The combination maximizes the chance that compliant crawlers respect your policy.
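As an illustration of the robots.txt half of that combination, the sketch below builds deny-all groups for the three user agents named above and confirms the result with Python's standard `urllib.robotparser`. The `deny_all_robots` helper is a hypothetical name, and the deny-all ai.txt string assumes the Spawning-style syntax used earlier in this article.

```python
from typing import List
from urllib.robotparser import RobotFileParser

AI_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot"]

# Assumed deny-all ai.txt in the Spawning-style allow/disallow syntax.
AI_TXT_DENY_ALL = "User-Agent: *\nDisallow: /\n"


def deny_all_robots(agents: List[str]) -> str:
    """Build one 'Disallow: /' group per named AI crawler."""
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in agents]
    return "\n\n".join(groups) + "\n"


robots_txt = deny_all_robots(AI_AGENTS)

# Parse the generated rules the way a robots.txt-compliant crawler might.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

for agent in AI_AGENTS:
    print(agent, parser.can_fetch(agent, "/any/page"))  # False for each agent

# Crawlers that are not listed remain unaffected by these groups.
print("Googlebot", parser.can_fetch("Googlebot", "/any/page"))  # True
```

Because the groups name specific agents, conventional search crawlers are not blocked; only crawlers that both identify with one of these tokens and honor robots.txt will comply.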
