ai.txt: AI Agent Access Policy Reference
ai.txt is a root-level text file that declares site-wide permissions for AI training, citation, and attribution. It complements robots.txt (crawl control) and llms.txt (AI content map) and remains a voluntary, early-adoption standard.
TL;DR. ai.txt is a plain-text file at the root of a domain that signals to AI systems whether your content can be used for training, cited in answers, or both, and on what attribution terms. The most widely promoted variant is Spawning's ai.txt, which uses an allow/disallow URL-prefix syntax modelled on robots.txt. Adoption is still early and enforcement is voluntary.
Definition
ai.txt is a root-level configuration file (https://yoursite.com/ai.txt) that publishes a site's policy for AI agents and model trainers. Where robots.txt tells crawlers where they may go, ai.txt tells AI systems how they may use what they retrieve — for training, for inference-time citation, or not at all.
The term covers two related approaches:
- Spawning's ai.txt: a permission file read at media-download time that uses allow/disallow URL prefixes to opt content into or out of commercial AI training. Spawning surfaces these signals to partners such as Hugging Face and Stability AI.
- DSL-style proposals: research efforts (for example, the paper "ai.txt: A Domain-Specific Language for Guiding AI Interactions with the Internet") that extend the file with element-level rules and natural-language directives.
Both treat ai.txt as a voluntary signal, not a legally binding contract.
Why ai.txt matters
LLM-powered search, AI assistants, and content-mining bots increasingly fetch web pages outside the conventional crawler model. robots.txt covers crawl-time access but says little about training-set inclusion, citation behavior, or attribution. ai.txt fills that gap by giving publishers a single, machine-readable place to express:
- whether content may be used to train commercial models,
- whether content may be cited in AI answers,
- how the publisher should be attributed when it is cited,
- which paths are out of scope for any AI use.
For sites investing in GEO (Generative Engine Optimization), ai.txt is the policy layer that pairs with llms.txt (the content map) and robots.txt (the crawl gate).
How it works
At a high level, an ai.txt file expresses four concerns:
| Concern | What it answers | Example signal |
|---|---|---|
| Identity | Who owns the policy? | site name, contact URL |
| Permission | Can content be used for training or citation? | allow / disallow per path |
| Attribution | How should the publisher be credited? | required / preferred / none |
| Scope | Which paths does the policy cover? | URL prefixes |
The most widely adopted concrete syntax today is Spawning's allow/disallow style:
# ai.txt (Spawning style)
User-Agent: *
Disallow: /private/
Allow: /
Additional metadata (contact email, attribution preference, citation policy) can be expressed as comments in the Spawning style or as fields in the DSL variant. Because the standard is still consolidating, treat any set of fields as a statement of policy intent rather than an enforced contract, and pair it with robots.txt directives for crawler-level controls.
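As a minimal sketch of how a consumer might evaluate Spawning-style rules like the ones above, the Python below parses the `User-Agent: *` group and checks a URL path against its prefixes. The precedence rule used here (longest matching prefix wins, Allow wins ties, unmatched paths are allowed) is borrowed from robots.txt conventions and is an assumption, not something the ai.txt sources confirm. `Rule`, `parse_ai_txt`, and `is_allowed` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    allow: bool   # True for an Allow line, False for a Disallow line
    prefix: str   # URL path prefix the rule applies to


def parse_ai_txt(text: str) -> list[Rule]:
    """Collect Allow/Disallow rules from the 'User-Agent: *' group."""
    rules: list[Rule] = []
    in_wildcard_group = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        name, value = (part.strip() for part in line.split(":", 1))
        name = name.lower()
        if name == "user-agent":
            in_wildcard_group = value == "*"
        elif in_wildcard_group and name in ("allow", "disallow") and value:
            rules.append(Rule(allow=(name == "allow"), prefix=value))
    return rules


def is_allowed(rules: list[Rule], path: str) -> bool:
    """Assumed precedence: longest matching prefix wins, Allow wins ties."""
    matches = [rule for rule in rules if path.startswith(rule.prefix)]
    if not matches:
        return True  # assumption: no matching rule means no restriction
    best = max(matches, key=lambda rule: (len(rule.prefix), rule.allow))
    return best.allow


EXAMPLE = """\
User-Agent: *
Disallow: /private/
Allow: /
"""

rules = parse_ai_txt(EXAMPLE)
print(is_allowed(rules, "/blog/post.html"))      # True
print(is_allowed(rules, "/private/report.pdf"))  # False
```

Because the rules are evaluated by prefix length rather than file order, the result does not depend on whether `Allow: /` is listed before or after the more specific `Disallow` line.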
ai.txt vs robots.txt vs llms.txt
| Need | robots.txt | llms.txt | ai.txt |
|---|---|---|---|
| Block or allow crawling | Yes | No | No |
| List preferred AI reading order | No | Yes | No |
| Signal training permissions | Partial (per-bot tokens) | No | Yes |
| Signal citation / attribution rules | No | No | Yes |
| Read at | Crawl time | AI inference time | Media download / training time |
Together the three files cover crawl control, AI navigation, and AI usage policy. None of them substitutes for licensing terms or legal notices.
Adoption status
As of 2026, ai.txt remains an emerging, voluntary standard:
- Spawning's ai.txt is honored by partners participating in Spawning's API ecosystem; major foundation-model labs have not publicly committed to honoring arbitrary ai.txt files.
- Independent audits of related AI-facing files (for example, llms.txt) report that most large crawlers ignore them and rely on robots.txt user-agent tokens such as GPTBot and ClaudeBot instead.
- The DSL-style ai.txt proposals are research-stage and not yet implemented by production crawlers.
Publishers should therefore treat ai.txt as a directional signal plus a defensible record of intent, not as an enforced control.
How to apply ai.txt
- Decide your default posture: open to AI training, restricted, or denied.
- Identify paths that need exceptions (for example, gated content, premium articles, or user-generated areas).
- Generate or hand-author an ai.txt using the Spawning generator or your own template.
- Deploy at https://yoursite.com/ai.txt and confirm it is reachable with curl https://yoursite.com/ai.txt.
- Mirror sensitive rules in robots.txt (User-Agent: GPTBot, User-Agent: ClaudeBot, etc.) so that crawlers which honor robots.txt but do not read ai.txt still receive them.
- Re-review every quarter and after any major content-licensing change.
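The first three steps above can be sketched as a small template of your own. The Python below is one possible approach, assuming the Spawning-style allow/disallow syntax shown earlier: it takes a default posture plus a list of exception prefixes and renders the file text. `AiTxtPolicy`, `render_ai_txt`, and the contact URL are hypothetical placeholders, and emitting the contact as a comment is just one way to carry metadata.

```python
from dataclasses import dataclass, field


@dataclass
class AiTxtPolicy:
    default_allow: bool                                  # True = open, False = denied
    exceptions: list[str] = field(default_factory=list)  # prefixes that invert the default
    contact: str = ""                                    # optional, written as a comment


def render_ai_txt(policy: AiTxtPolicy) -> str:
    """Render a Spawning-style ai.txt body from a policy object."""
    lines = []
    if policy.contact:
        lines.append(f"# Contact: {policy.contact}")
    lines.append("User-Agent: *")
    # Exceptions get the opposite directive from the site-wide default.
    exception_name = "Disallow" if policy.default_allow else "Allow"
    default_name = "Allow" if policy.default_allow else "Disallow"
    for prefix in sorted(policy.exceptions):
        lines.append(f"{exception_name}: {prefix}")
    lines.append(f"{default_name}: /")
    return "\n".join(lines) + "\n"


# "Restricted" posture: open by default, with gated paths carved out.
policy = AiTxtPolicy(
    default_allow=True,
    exceptions=["/premium/", "/private/"],
    contact="https://yoursite.com/contact",  # placeholder URL
)
print(render_ai_txt(policy), end="")
```

Keeping the policy in a structured object makes the quarterly review easier, since the posture and its exceptions live in one place and the file can be regenerated after each change.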
For a deeper structural pairing, see How to create llms.txt and the /technical hub for related AI-readiness standards.
Common misconceptions
- "ai.txt is legally binding." It is not. Like robots.txt, it is a voluntary protocol; legal control over training data still depends on copyright and licensing terms.
- "ai.txt replaces robots.txt." It does not. The two files cover different lifecycle stages (crawl vs. training/inference) and should be deployed together.
- "All AI systems read ai.txt." Adoption is partial and concentrated around Spawning's ecosystem. Combine ai.txt with robots.txt user-agent rules for stronger coverage.
FAQ
Q: Is ai.txt an official W3C or IETF standard?
No. ai.txt is a proposed, community-driven standard. The most recognized variant is published by Spawning, with academic DSL extensions explored in research papers. Treat it as voluntary until a formal specification is ratified.
Q: Do major AI labs honor ai.txt today?
Adoption is early and uneven. Spawning's partners surface ai.txt-derived permissions, but independent audits show that most large foundation-model crawlers still rely on robots.txt user-agent tokens. Pair both files for the strongest signal.
Q: Where should ai.txt live?
At the domain root, served as plain text from https://yoursite.com/ai.txt over HTTPS, with Content-Type: text/plain and a 200 status. It must be reachable without authentication.
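A rough way to automate those checks, assuming only the Python standard library, is sketched below. `check_ai_txt_response` and `check_ai_txt` are hypothetical helper names. Note that `urllib.request.urlopen` follows redirects, so this sketch checks the final response rather than the first hop.

```python
import urllib.error
import urllib.request


def check_ai_txt_response(status: int, content_type: str) -> list[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if status != 200:
        problems.append(f"expected status 200, got {status}")
    # Ignore parameters such as '; charset=utf-8' when comparing media types.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type != "text/plain":
        problems.append(f"expected text/plain, got {media_type or 'no Content-Type'}")
    return problems


def check_ai_txt(url: str) -> list[str]:
    """Fetch the URL without credentials and run the checks above."""
    if not url.startswith("https://"):
        return ["URL is not HTTPS"]
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            content_type = response.headers.get("Content-Type", "")
            return check_ai_txt_response(response.status, content_type)
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses (including 401/403 from an auth wall) land here.
        return check_ai_txt_response(err.code, err.headers.get("Content-Type", ""))
    except urllib.error.URLError as err:
        return [f"request failed: {err.reason}"]


print(check_ai_txt_response(200, "text/plain; charset=utf-8"))  # []
print(check_ai_txt_response(404, "text/html"))
```

To check a live deployment, call `check_ai_txt("https://yoursite.com/ai.txt")` with your own domain; an empty list means the file answered with a 200 status and a text/plain media type.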
Q: How is ai.txt different from a license file or terms of service?
ai.txt is a machine-readable policy signal, not a contract. License files and terms of service remain the legal source of truth; ai.txt is the operational hint that AI systems can parse at scale.
Q: Should I use ai.txt if I want to block all AI training?
Yes — declare a deny-all posture in ai.txt and back it up with explicit robots.txt rules for known AI user agents (for example, GPTBot, ClaudeBot, PerplexityBot). The combination maximizes the chance that compliant crawlers respect your policy.
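One way to sketch that combination, assuming the Spawning-style syntax for ai.txt and the three user-agent tokens named above, is shown below. The robots.txt text is checked with Python's standard `urllib.robotparser`. `render_deny_all_robots` is a hypothetical helper, and the agent list is illustrative rather than complete.

```python
from urllib.robotparser import RobotFileParser

# Deny-all ai.txt body in the Spawning-style syntax (assumed).
AI_TXT_DENY_ALL = "User-Agent: *\nDisallow: /\n"

# User-agent tokens mentioned in the text; extend with others as needed.
AI_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot"]


def render_deny_all_robots(agents: list[str]) -> str:
    """Build one robots.txt group per agent, each disallowing every path."""
    groups = [f"User-agent: {agent}\nDisallow: /\n" for agent in agents]
    return "\n".join(groups)  # blank line between groups


robots_txt = render_deny_all_robots(AI_AGENTS)

# Parse the generated text locally to confirm it says what was intended.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

page = "https://yoursite.com/any/page"
for agent in AI_AGENTS:
    print(agent, parser.can_fetch(agent, page))        # False for each listed agent
print("OtherBot", parser.can_fetch("OtherBot", page))  # True: no group applies
```

The last line shows the limit of per-agent rules: a crawler whose token is not listed is unaffected, which is why the list needs periodic review.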
Related articles
What Is GEO? Generative Engine Optimization Defined
GEO (Generative Engine Optimization) is the practice of structuring content so AI search engines retrieve, understand, synthesize, and cite it in generated answers.
How to Create llms.txt: Step-by-Step Tutorial for AI Search
Step-by-step tutorial for creating, deploying, and validating an llms.txt file so AI systems and LLMs can discover your site's most important content.
llms.txt Reference: Specification, Format, and Examples
llms.txt is a proposed root-level Markdown file that gives LLMs a curated, machine-readable index of a site. Reference for spec, format, and adoption.