TDMRep (TDM Reservation Protocol) Spec for AI Crawlers
TDMRep (TDM Reservation Protocol) is a W3C Community Group Report that defines a machine-readable way for publishers to signal text-and-data-mining permissions to AI crawlers via three transports — a /.well-known/tdmrep.json file, a TDM-Reservation HTTP response header, and an HTML meta tag — providing legal grounding for AI-training opt-outs aligned with EU CDSM Directive 2019/790 Article 4.
TL;DR
- TDMRep (TDM Reservation Protocol) is a W3C Community Group Report defining machine-readable AI text-and-data-mining (TDM) reservations.
- Three binding transports: /.well-known/tdmrep.json, the TDM-Reservation HTTP response header, and an HTML meta tag.
- tdm-reservation: 0 = open / 1 = reserved; when reserved, tdm-policy URL points to licensing terms.
- Scope is training and data mining (distinct from robots.txt indexing scope); aligned with EU CDSM Directive 2019/790 Article 4 opt-out rights.
Definition
The TDM Reservation Protocol (TDMRep) is a W3C Community Group Report from the W3C TDM Community Group that defines how publishers can signal text-and-data-mining reservations in a machine-readable form (W3C TDM CG, 2024). The protocol gives publishers a structured way to say "this content is reserved from text-and-data mining" or "this content is open for TDM" to any compliant AI crawler or training pipeline.
TDMRep is policy-layer infrastructure, not access control. A reserved resource is still reachable by HTTP fetch; the reservation tells well-behaved AI crawlers that the publisher does not consent to text-and-data mining of the content. Compliant crawlers (and downstream training pipelines) honor the reservation; non-compliant ones ignore it. The reservation provides a legal anchor for enforcement, not a technical block.
The protocol scope is TDM — the act of automatically extracting and analyzing text and data — which is distinct from indexing for search. A page can be open for indexing (no robots.txt block) and reserved for TDM (tdmrep). The two policies live at different layers and AI crawlers are increasingly expected to read both.
Why this matters
TDMRep matters for three reasons.
First, it provides legal grounding. EU CDSM Directive 2019/790 Article 4 created a TDM exception that rightsholders can opt out of, and for content made publicly available online the reservation must be expressed "in an appropriate manner, such as machine-readable means" (EU CDSM Directive 2019/790). TDMRep is a purpose-built machine-readable expression of that reservation. Publishers without a machine-readable reservation have a harder time showing they exercised the Article 4 opt-out; publishers with a TDMRep deployment have a documentary trail.
Second, it disambiguates AI crawler intent. robots.txt and noai meta directives mix concerns: indexing, training, snippet generation, and retrieval. TDMRep explicitly addresses TDM, including training, which is the use case that concerns rightsholders most. AI crawlers that honor TDMRep can train on permitted corpora with far less legal ambiguity.
Third, it converges with adjacent IETF work. The IETF AI-Preferences working group is developing a complementary protocol that aligns with TDMRep semantics (IETF AI-Preferences WG). Publishers who deploy TDMRep now position themselves for the cross-protocol layer that is consolidating around the same opt-out semantics.
How it works
TDMRep defines three transports for the same reservation signal. Publishers can use one or all three; AI crawlers MAY check any of them and MUST honor the most restrictive value found.
```mermaid
flowchart LR
A["AI crawler request"] --> B{"tdmrep.json / TDM-Reservation header / meta?"}
B -- "absent or 0" --> C["Train freely"]
B -- "1 (reserved)" --> D["Fetch tdm-policy URL"]
D --> E{"License granted?"}
E -- "yes" --> C
E -- "no" --> F["Skip / log refusal"]
```

Transport 1: /.well-known/tdmrep.json
A JSON document at /.well-known/tdmrep.json describes the reservations for the entire site or for path-scoped subsets. Example:
```json
[
  {
    "location": "/blog",
    "tdm-reservation": 0
  },
  {
    "location": "/",
    "tdm-reservation": 1,
    "tdm-policy": "https://example.com/ai-licensing"
  }
]
```

The array allows per-path reservations; the most specific matching location wins. List entries from most specific to least specific, as in this example, so the result does not depend on evaluation order. tdm-reservation: 1 means reserved (TDM not consented); tdm-reservation: 0 means open. When reserved, the optional tdm-policy URL points to the licensing terms a TDM consumer must follow to obtain consent.
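A crawler-side lookup against this document can be sketched in Python. This is a minimal sketch, not the spec's matching algorithm: it assumes a location matches when it equals the request path or is a prefix ending on a path-segment boundary (with a trailing * treated as "any suffix"), and that the longest matching location is the most specific. The names location_matches and lookup are hypothetical.

```python
from __future__ import annotations

import json
from typing import Optional

# The example document, with the more specific /blog entry listed first.
TDMREP_JSON = """
[
  {"location": "/blog", "tdm-reservation": 0},
  {"location": "/", "tdm-reservation": 1,
   "tdm-policy": "https://example.com/ai-licensing"}
]
"""


def location_matches(location: str, path: str) -> bool:
    """Assumed matching rule: exact path, segment-boundary prefix, or trailing '*'."""
    if location.endswith("*"):
        return path.startswith(location[:-1])
    if location == "/" or path == location:
        return True
    return path.startswith(location.rstrip("/") + "/")


def lookup(rules: list[dict], path: str) -> Optional[dict]:
    """Return the matching entry with the longest location, or None if nothing matches."""
    candidates = [rule for rule in rules if location_matches(rule["location"], path)]
    if not candidates:
        return None
    return max(candidates, key=lambda rule: len(rule["location"]))


rules = json.loads(TDMREP_JSON)
print(lookup(rules, "/blog/post-1")["tdm-reservation"])     # 0
print(lookup(rules, "/premium/report")["tdm-reservation"])  # 1
print(lookup(rules, "/blogroll")["tdm-reservation"])        # 1
```

With this rule, /blogroll falls through to the reserved / entry because it does not sit under /blog/.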
Transport 2: TDM-Reservation HTTP response header
A response header carries the same value at the resource level:
```http
TDM-Reservation: 1
TDM-Policy: https://example.com/ai-licensing
```
The header takes precedence over the JSON document for the specific resource it accompanies. Use it when reservations vary at the resource level rather than the path level.
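On the crawler side, reading the header transport is a case-insensitive lookup, since HTTP field names are case-insensitive. A minimal Python sketch using only the standard library; reservation_from_headers and fetch_headers are hypothetical helpers, and treating values other than "0" or "1" as absent is an assumption rather than spec text.

```python
from __future__ import annotations

import urllib.request
from typing import Mapping, Optional, Tuple


def reservation_from_headers(headers: Mapping[str, str]) -> Tuple[Optional[int], Optional[str]]:
    """Extract (tdm-reservation, tdm-policy) from response headers.

    Names are lowercased first because HTTP field names are case-insensitive.
    Values other than "0" or "1" are treated as absent (an assumption).
    """
    lowered = {name.lower(): value.strip() for name, value in headers.items()}
    raw = lowered.get("tdm-reservation")
    reservation = int(raw) if raw in ("0", "1") else None
    return reservation, lowered.get("tdm-policy")


def fetch_headers(url: str) -> dict:
    """HEAD the resource and return its headers (duplicate names collapse to one)."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        return dict(response.headers.items())


print(reservation_from_headers({"TDM-Reservation": "1",
                                "TDM-Policy": "https://example.com/ai-licensing"}))
# (1, 'https://example.com/ai-licensing')
```

Some servers answer HEAD differently from GET, so a crawler may prefer to read these headers from the GET response it already fetches.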
Transport 3: HTML meta tag
For pages where header injection is impractical, an HTML meta tag carries the same signal:
```html
<meta name="tdm-reservation" content="1">
<meta name="tdm-policy" content="https://example.com/ai-licensing">
```

This transport is convenient for static-site generators that cannot easily set response headers but can emit per-page meta tags.
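A crawler that only has the HTML body can read the same signal with Python's standard-library html.parser. A minimal sketch; TdmMetaParser and reservation_from_html are hypothetical names, and keeping only the first occurrence of each tag is an assumption, not a rule from the spec.

```python
from __future__ import annotations

from html.parser import HTMLParser
from typing import Optional, Tuple


class TdmMetaParser(HTMLParser):
    """Collect the content of the first tdm-reservation and tdm-policy meta tags."""

    def __init__(self) -> None:
        super().__init__()
        self.values: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; attribute values keep their case.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        meta_name = attributes.get("name", "").lower()
        if meta_name in ("tdm-reservation", "tdm-policy") and meta_name not in self.values:
            self.values[meta_name] = attributes.get("content", "").strip()


def reservation_from_html(html: str) -> Tuple[Optional[int], Optional[str]]:
    parser = TdmMetaParser()
    parser.feed(html)
    parser.close()
    raw = parser.values.get("tdm-reservation")
    reservation = int(raw) if raw in ("0", "1") else None
    return reservation, parser.values.get("tdm-policy")


PAGE = """<!doctype html>
<html><head>
<meta name="tdm-reservation" content="1">
<meta name="tdm-policy" content="https://example.com/ai-licensing">
</head><body>...</body></html>"""

print(reservation_from_html(PAGE))  # (1, 'https://example.com/ai-licensing')
```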
Resolution order
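When multiple transports are present, compliant crawlers consult them in this priority order: header > meta > JSON document, so resource-level signals override site-level ones. In the typical conflict, where a resource-level source reserves content that the JSON document leaves open, this priority order and the most-restrictive rule from How it works give the same result: a TDM-Reservation: 1 header on a resource overrides a tdm-reservation: 0 JSON entry for the same path.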
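The resolution rules and the flowchart in How it works can be combined in a short Python sketch. It takes (value, policy) pairs already extracted from the header, the meta tag, and the matching JSON entry, and applies the most-restrictive rule to the reservation value. As an assumption not stated in this article, it takes tdm-policy from the highest-priority source that supplies one. The license_granted callback is hypothetical and stands in for whatever licensing check a crawler operator uses.

```python
from __future__ import annotations

from typing import Callable, Optional, Tuple

# (tdm-reservation, tdm-policy) as read from one transport; None means "not present".
Signal = Tuple[Optional[int], Optional[str]]


def resolve(header: Signal, meta: Signal, json_entry: Signal) -> Signal:
    """Combine the three transports, listed in priority order header > meta > JSON."""
    sources = [header, meta, json_entry]
    values = [value for value, _ in sources if value is not None]
    # Most restrictive wins: any 1 means reserved; no signal at all counts as open.
    reservation = 1 if 1 in values else 0
    # Assumption: take tdm-policy from the highest-priority source that supplies one.
    policy = next((url for _, url in sources if url), None)
    return reservation, policy


def decide(signal: Signal, license_granted: Callable[[str], bool]) -> str:
    """Follow the flowchart: open -> train; reserved -> train only if a license is granted."""
    reservation, policy = signal
    if reservation == 0:
        return "train"
    if policy and license_granted(policy):
        return "train"
    return "skip"


combined = resolve((1, None), (None, None), (0, "https://example.com/ai-licensing"))
print(combined)                             # (1, 'https://example.com/ai-licensing')
print(decide(combined, lambda url: False))  # skip
```

A reserved resource with no tdm-policy URL resolves to "skip", matching the point that such a resource is effectively closed to TDM.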
Practical application
Deployment patterns by site type:
Static site (Next.js, Astro, Hugo, Jekyll) — publish /.well-known/tdmrep.json from the public asset folder. Per-path reservations are easiest to express in the JSON document because static hosting often cannot set per-resource headers without host-specific header rules or an edge-runtime layer. Add the HTML meta tag to the layout for redundancy.
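For static sites, generating the file from a single configuration list keeps per-path reservations in one reviewable place. A minimal Python prebuild sketch; RESERVATIONS, render_tdmrep, and write_tdmrep are hypothetical names, and the output folder depends on the generator (public/ in Next.js and Astro, static/ in Hugo).

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import List, Union

# Hypothetical per-path configuration, most specific locations first.
RESERVATIONS = [
    {"location": "/premium", "tdm-reservation": 1,
     "tdm-policy": "https://example.com/ai-licensing"},
    {"location": "/blog", "tdm-reservation": 0},
    {"location": "/", "tdm-reservation": 1,
     "tdm-policy": "https://example.com/ai-licensing"},
]


def render_tdmrep(entries: List[dict]) -> str:
    """Serialize the entries as a pretty-printed tdmrep.json document."""
    return json.dumps(entries, indent=2) + "\n"


def write_tdmrep(public_dir: Union[str, Path], entries: List[dict] = RESERVATIONS) -> Path:
    """Write <public_dir>/.well-known/tdmrep.json and return its path."""
    target = Path(public_dir) / ".well-known" / "tdmrep.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(render_tdmrep(entries), encoding="utf-8")
    return target
```

Calling write_tdmrep("public") as a prebuild step places the file where the generator copies it through to the site root.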
CDN-fronted site (Cloudflare, Fastly, Vercel) — set the TDM-Reservation and TDM-Policy response headers at the edge, scoped per path or per content type. Cloudflare Workers, Fastly VCL, or Vercel Edge Middleware are the natural injection points.
Dynamic site (Rails, Django, Express) — attach the response headers in your application middleware. Read the per-path reservation from a configuration file or database so editorial teams can update reservations without redeploying.
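For Python stacks, the header injection can live in WSGI middleware that wraps the application. A minimal sketch, assuming a hypothetical in-code rule list that would in practice be loaded from a configuration file or database; TdmReservationMiddleware and its rule format are illustrative, not a library API. Rules are checked in order, so the list must run from most specific to least specific.

```python
from __future__ import annotations

from typing import Callable, List, Optional, Tuple

# Hypothetical rules: (path prefix, tdm-reservation, tdm-policy), most specific first.
# In production, load these from a config file or database instead.
Rule = Tuple[str, str, Optional[str]]
PATH_RESERVATIONS: List[Rule] = [
    ("/premium", "1", "https://example.com/ai-licensing"),
    ("/blog", "0", None),
]
DEFAULT: Tuple[str, Optional[str]] = ("1", "https://example.com/ai-licensing")


class TdmReservationMiddleware:
    """WSGI middleware that adds TDM-Reservation / TDM-Policy response headers."""

    def __init__(self, app: Callable, rules: List[Rule] = PATH_RESERVATIONS,
                 default: Tuple[str, Optional[str]] = DEFAULT) -> None:
        self.app = app
        self.rules = rules
        self.default = default

    def _lookup(self, path: str) -> Tuple[str, Optional[str]]:
        # First matching prefix wins, so the rule list must be ordered most specific first.
        for prefix, value, policy in self.rules:
            if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
                return value, policy
        return self.default

    def __call__(self, environ, start_response):
        value, policy = self._lookup(environ.get("PATH_INFO", "/"))

        def start_with_tdm(status, headers, exc_info=None):
            headers = list(headers) + [("TDM-Reservation", value)]
            if policy:
                headers.append(("TDM-Policy", policy))
            return start_response(status, headers, exc_info)

        return self.app(environ, start_with_tdm)
```

In a Django project, wsgi.py could end with application = TdmReservationMiddleware(get_wsgi_application()).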
Path scoping — the location field accepts URL path patterns. Use it to reserve specific sections (/premium, /research) while opening others (/blog, /docs). Rule of thumb: reserve content that is costly to produce; open content that serves acquisition or top-of-funnel goals.
Validation — fetch your /.well-known/tdmrep.json and the TDM-Reservation header on a representative URL with curl -i. Confirm that the JSON parses, that every entry carries a location and a tdm-reservation value of 0 or 1, and that the header appears on the responses you expect before relying on the deployment.
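Beyond reading curl -i output, the structure of tdmrep.json can be checked with a small script. A minimal Python sketch; the checks (location starts with /, tdm-reservation is the integer 0 or 1, tdm-policy is an https URL) follow this article's description rather than a published validator, and validate_tdmrep and fetch_and_validate are hypothetical names.

```python
from __future__ import annotations

import json
import urllib.request
from typing import Any, List
from urllib.parse import urlparse


def validate_tdmrep(doc: Any) -> List[str]:
    """Return a list of structural problems; an empty list means none were found."""
    if not isinstance(doc, list):
        return ["top level must be a JSON array"]
    problems = []
    for i, entry in enumerate(doc):
        if not isinstance(entry, dict):
            problems.append(f"entry {i}: not a JSON object")
            continue
        location = entry.get("location")
        if not isinstance(location, str) or not location.startswith("/"):
            problems.append(f"entry {i}: location must be a path starting with '/'")
        value = entry.get("tdm-reservation")
        # bool is a subclass of int in Python, so reject true/false explicitly.
        if isinstance(value, bool) or value not in (0, 1):
            problems.append(f"entry {i}: tdm-reservation must be 0 or 1")
        policy = entry.get("tdm-policy")
        if policy is not None and urlparse(str(policy)).scheme != "https":
            problems.append(f"entry {i}: tdm-policy should be an https URL")
    return problems


def fetch_and_validate(origin: str) -> List[str]:
    """Fetch <origin>/.well-known/tdmrep.json and validate it."""
    url = origin.rstrip("/") + "/.well-known/tdmrep.json"
    with urllib.request.urlopen(url, timeout=10) as response:
        return validate_tdmrep(json.load(response))
```

Against a live site, fetch_and_validate("https://example.com") raises on HTTP errors or invalid JSON and returns structural problems as strings.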
Common mistakes
- Conflating TDMRep with robots.txt. robots.txt is about crawl access; TDMRep is about TDM consent. A page can be crawlable and TDM-reserved simultaneously, and AI crawlers are increasingly expected to read both.
- Reserved without tdm-policy URL. Reserving without naming a licensing path leaves the AI crawler with no way to obtain consent. Always include tdm-policy if you would license the content under any terms.
- Path-pattern collisions. Conflicting location patterns produce undefined behavior. Order entries from most specific to least specific and validate with a representative URL set.
- Forgetting the meta tag on dynamic pages. Cached pages served from a static layer may miss the response header. Backstop with the HTML meta tag.
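- Treating it as a technical block. TDMRep is a policy signal, not access control, and non-compliant crawlers ignore it. robots.txt and X-Robots-Tag noai are voluntary signals too, so where you need actual access control, block the relevant user agents or IP ranges at the server, CDN, or firewall level.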
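Some of these mistakes can be caught mechanically: reserved entries without tdm-policy, duplicate locations, and a broader location listed before a narrower one. A minimal Python sketch, assuming prefix-style location matching and a reader that may stop at the first match; covers and lint_tdmrep are hypothetical helpers.

```python
from __future__ import annotations

from typing import List


def covers(broader: str, location: str) -> bool:
    """True if `broader` is a path prefix of `location` on a segment boundary."""
    return broader == "/" or location.startswith(broader.rstrip("/") + "/")


def lint_tdmrep(entries: List[dict]) -> List[str]:
    """Flag reserved entries without tdm-policy, duplicate locations, and shadowed entries."""
    warnings = []
    seen = set()
    for i, entry in enumerate(entries):
        location = str(entry.get("location", ""))
        if entry.get("tdm-reservation") == 1 and not entry.get("tdm-policy"):
            warnings.append(f"{location}: reserved without tdm-policy")
        if location in seen:
            warnings.append(f"{location}: duplicate location")
        seen.add(location)
        # A broader location listed earlier shadows this one for first-match readers.
        for earlier in entries[:i]:
            broader = str(earlier.get("location", ""))
            if broader != location and covers(broader, location):
                warnings.append(f"{location}: listed after broader location {broader}")
    return warnings


broad_first = [
    {"location": "/", "tdm-reservation": 1, "tdm-policy": "https://example.com/ai-licensing"},
    {"location": "/blog", "tdm-reservation": 0},
]
print(lint_tdmrep(broad_first))  # ['/blog: listed after broader location /']
```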
FAQ
Q: Is TDMRep legally binding?
TDMRep itself is a W3C Community Group Report, not a binding standard (W3C TDM CG). The legal force comes from regional copyright law that recognizes machine-readable TDM opt-outs — most prominently EU CDSM Directive 2019/790 Article 4 (EU CDSM Directive 2019/790). In jurisdictions that recognize TDM opt-outs, a TDMRep deployment can be the documentary form that exercises the right.
Q: Do GPTBot, ClaudeBot, and PerplexityBot honor TDMRep?
Support varies by vendor and changes over time, so publishers should monitor each crawler's policy disclosures rather than assume universal compliance (OpenAI bots reference). For maximum coverage, deploy TDMRep alongside robots.txt User-Agent blocks and X-Robots-Tag noai; that combination reaches compliant crawlers that read different signals, while non-compliant crawlers can only be stopped by server- or CDN-level blocking.
Q: How is TDMRep different from robots.txt and ai.txt?
robots.txt addresses crawl access ("may you fetch this page"). ai.txt is a proposed manifest covering AI training and licensing at the site level. TDMRep specifically addresses TDM consent in a machine-readable form aligned with copyright-law opt-out regimes. The three are complementary and a complete deployment uses all of them.
Q: How do I deploy tdmrep.json on a static site?
Place the JSON file at .well-known/tdmrep.json inside your public asset folder so the build output serves it at /.well-known/tdmrep.json. Next.js and Astro copy public/ through unchanged, and Hugo does the same for static/; Jekyll skips dot-prefixed directories by default, so add .well-known to the include list in _config.yml. Validate the served file with curl -i https://example.com/.well-known/tdmrep.json.
Q: Can TDMRep set per-path reservations via the location pattern?
Yes — the location field in each JSON entry accepts a URL path pattern, and entries are evaluated most-specific-first. Reserve premium or paid-content paths and open marketing or top-of-funnel paths in the same document. For per-resource reservations, prefer the response header transport.
Q: What does the tdm-policy URL contain?
The terms under which TDM is permitted. The TDMRep specification defines this TDM Policy as a machine-readable document (JSON-LD based on the W3C ODRL vocabulary), which can sit alongside human-readable licensing terms and contact information. The URL gives a TDM consumer a path to obtain consent without negotiating per-publisher terms by hand. Without it, a reserved resource is effectively closed to TDM.
Q: Does TDMRep require HTTPS?
TDMRep does not strictly require HTTPS, but without TLS none of the three transports is protected in transit: a tdmrep.json file, response header, or meta tag served over plain HTTP can be altered before it reaches the crawler. Always serve over HTTPS.
Q: How does TDMRep interact with X-Robots-Tag noai and the IETF AI-Preferences draft?
X-Robots-Tag noai is a non-standard per-resource directive asking AI systems not to use the content; TDMRep is a TDM consent signal aligned with copyright-law opt-outs. The IETF AI-Preferences working group is developing a unified protocol that aligns with TDMRep semantics (IETF AI-Preferences WG). Deploying TDMRep now positions a publisher for that convergence; in the meantime, layer TDMRep, X-Robots-Tag noai, and robots.txt for maximum compliant-crawler coverage.