AI Crawler Prefetch Hints Specification
AI crawlers (GPTBot, ClaudeBot, PerplexityBot) do not run browser speculation pipelines, so client-side resource-hint markup is mostly invisible to them. The hints they actually consume are server-side: HTTP Link headers, 103 Early Hints responses, sitemaps with priority signals, and llms.txt. This spec defines which hint goes where and how to measure impact.
TL;DR
Resource Hints are a user-agent mechanism: the browser sees a <link rel="..."> hint and acts on it. AI crawlers fetch HTML and rarely run a full browser, so most browser-side hints are no-ops for them. What does help: Link HTTP headers exposing related URLs, 103 Early Hints for critical sub-resources, a high-quality sitemap, and a curated llms.txt. Use this spec to split your hint surface into "for browsers" and "for crawlers" without doubling complexity.
Why most browser hints don't help AI crawlers
The W3C Resource Hints specification defines dns-prefetch, preconnect, prefetch, and prerender as relationships of the HTML <link> element that "enable the developer… to assist the user agent in the decision process… to improve page performance" (W3C; the spec was discontinued at the W3C and rolled into the WHATWG HTML Living Standard). Every primitive in that spec presumes a user agent that (a) parses HTML, (b) fetches sub-resources speculatively, and (c) navigates between pages. AI crawlers usually only do (a). They fetch the HTML, extract the text, and move on — they don't navigate, so prefetch for the next page is wasted markup.
The Speculation Rules API "allows users to benefit from a performance boost by either prefetching or prerendering future page navigations" (Chrome Developers) and is even more browser-specific. Treat it as zero help for AI crawlers.
Specification scope
This spec defines hints for two consumers:
- Browser hints — unchanged from web-perf best practice; included for completeness.
- Crawler hints — server-side mechanisms AI crawlers actually consume.
Non-goals: HTTP/2 server push (deprecated in favour of Early Hints), browser preconnect tuning, CDN-level cache behaviour.
1. Browser hints (unchanged)
For human users you keep the standard web-perf surface:
<link rel="preconnect" href="https://cdn.example.com">
<link rel="dns-prefetch" href="https://cdn.example.com">
<link rel="preload" href="/fonts/main.woff2" as="font" type="font/woff2" crossorigin>
<link rel="prefetch" href="/articles/next-likely">
Keep these in the <head>; they cost nothing for crawlers (just bytes) and meaningfully improve LCP/INP for browsers, which feeds Core Web Vitals signals that AI ranking systems consume indirectly.
2. Crawler hints
2.1 Link HTTP headers for related URLs
Serve a Link header listing canonical, alternate-language, and related-article URLs:
Link: <https://example.com/articles/foo>; rel="canonical"
Link: <https://example.com/fr/articles/foo>; rel="alternate"; hreflang="fr"
Link: <https://example.com/articles/bar>; rel="related"
Crawlers that don't render JS still read response headers. Link headers expose your relationship graph at HTTP-level, which crawlers can index without a render pass.
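How you attach the header depends on your server framework, but the value itself is easy to assemble. A minimal sketch (the helper name and URLs are illustrative placeholders, not from any framework):

```python
# Sketch: serialize (url, params) pairs into one RFC 8288 Link header
# value. The function name and URLs are illustrative, not a real API.
def build_link_header(relations):
    parts = []
    for url, params in relations:
        attrs = "; ".join(f'{k}="{v}"' for k, v in params.items())
        parts.append(f"<{url}>; {attrs}")
    return ", ".join(parts)

print(build_link_header([
    ("https://example.com/articles/foo", {"rel": "canonical"}),
    ("https://example.com/fr/articles/foo", {"rel": "alternate", "hreflang": "fr"}),
]))
```

Emit the result as a single Link response header; multiple Link headers are equivalent but one comma-joined value keeps the response compact.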
2.2 103 Early Hints for sub-resource preconnect/preload
RFC 8297 introduced status code 103, "that can be used to convey hints that help a client make preparations for processing the final response". The status was promoted from Experimental to Proposed Standard in 2025 (IETF). Per MDN, 103 "may be sent by a server while it is still preparing a response, with hints about the sites and resources that the server expects the final response will link to".
For browsers, Early Hints accelerate LCP by letting the client preconnect/preload before the final 200 arrives. For crawlers, Early Hints serve a different purpose: a JS-light crawler that does support 103 (some do, including upstream CDN crawlers) gets your dependency graph cheaply.
Canonical 103 response:
HTTP/1.1 103 Early Hints
Link: </styles/main.css>; rel=preload; as=style
Link: </fonts/main.woff2>; rel=preload; as=font; crossorigin
Link: <https://cdn.example.com>; rel=preconnect
Support landed in NGINX (with explicit feature gating; see the NGINX blog), Cloudflare, Fastly, and most modern reverse proxies. Note the WHATWG Fetch issue suggesting Early Hints be restricted to HTTP/2+ (WHATWG Fetch #1698) — in practice, ship Early Hints only on HTTP/2 or HTTP/3 connections.
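The HTTP/2+ restriction is easy to enforce at the application layer. A hedged sketch (protocol detection varies by server; the function name and Link values are illustrative):

```python
# Sketch: gate 103 Early Hints on the negotiated HTTP version, per the
# WHATWG guidance. Link values here are illustrative placeholders.
EARLY_HINT_LINKS = [
    "</styles/main.css>; rel=preload; as=style",
    "</fonts/main.woff2>; rel=preload; as=font; crossorigin",
    "<https://cdn.example.com>; rel=preconnect",
]

def early_hint_links(http_version):
    # Some HTTP/1.1 clients mishandle interim responses, so return
    # nothing unless the connection is HTTP/2 or HTTP/3.
    if http_version not in ("2", "3"):
        return []
    return EARLY_HINT_LINKS
```

Your server then emits one 103 interim response carrying those Link values before the final 200.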
2.3 Sitemap with priority + lastmod
The sitemap is the single most reliable AI-crawler discovery hint. Include:
- <loc> for every canonical URL.
- <lastmod> matching your content fingerprint (see the content-fingerprinting spec).
- <priority> only as a relative ranking; AI crawlers use it as a soft signal.
- A separate sitemap index for high-priority documents (Tier 1 articles, hub pages).
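Generating those fields is a few lines with the standard library. A sketch following the sitemaps.org protocol (the entry data is illustrative; the fingerprint-to-lastmod mapping is assumed from the content-fingerprinting spec):

```python
# Sketch: build a minimal sitemap with loc/lastmod/priority per the
# sitemaps.org protocol. Entries here are illustrative placeholders.
from xml.etree import ElementTree as ET

def build_sitemap(entries):
    """entries: iterable of (loc, lastmod_iso_date, priority) tuples."""
    urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for loc, lastmod, priority in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
        ET.SubElement(url, "priority").text = f"{priority:.1f}"
    return ET.tostring(urlset, encoding="unicode")

xml = build_sitemap([("https://example.com/articles/foo", "2025-01-15", 0.8)])
```

Serve the result at /sitemap.xml and reference it from robots.txt so crawlers find it without guessing.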
2.4 llms.txt with priority paths
llms.txt is the AI-native equivalent of robots.txt + sitemap. List the URLs you most want cited and the URLs you do not want indexed. AI crawlers that consume llms.txt (a growing list) treat it as a curated allow-list with semantic context. See How to create llms.txt.
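A minimal llms.txt shape, following the llmstxt.org proposal (the site name, section headings, and URLs below are illustrative):

```markdown
# Example Site

> Technical articles on web performance and AI search.

## Priority articles
- [AI crawler prefetch hints](https://example.com/articles/prefetch-hints): server-side hints AI crawlers consume
- [Content negotiation for AI crawlers](https://example.com/articles/content-negotiation): serving LLM-friendly variants

## Optional
- [Archive](https://example.com/archive): older material, lower priority
```

Keep each description to one line; the per-link annotations are what give crawlers semantic context a bare sitemap lacks.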
2.5 Link: rel="alternate"; type="application/ld+json"
Expose a structured-data alternate so crawlers that prefer JSON-LD can fetch it directly:
Link: </articles/foo.jsonld>; rel="alternate"; type="application/ld+json"
This pairs with content negotiation (see AI crawler content negotiation spec).
3. Decision matrix
| Goal | Browser hint | Crawler hint |
|---|---|---|
| Speed up font load | preload | n/a |
| Speed up CSS / JS | preload + Early Hints | Early Hints only |
| Establish CDN connection | preconnect | n/a |
| Reveal canonical URL | <link rel="canonical"> | Link header |
| Reveal language alternates | <link rel="alternate" hreflang> | Link header |
| Reveal next-page navigation | <link rel="prefetch"> | Sitemap pagination + Link rel=next |
| Reveal high-priority URLs | n/a | sitemap priority + llms.txt |
| Reveal JSON-LD alternate | inline <script type="application/ld+json"> | Link rel=alternate type=application/ld+json |
4. Measurement
To know whether crawler hints help:
- Server log diff. Compare AI-bot fetches in the 7 days before and after enabling Early Hints / Link headers. Expect higher fetch rate per URL and lower time-to-fetch on linked sub-resources.
- Citation tracking. Tools like Profound, ZipTie, and Rankscale report which URLs get cited; measure citation rate change after the change.
- Search Console + Core Web Vitals. Browser-side hint impact still feeds Core Web Vitals; large LCP/INP regressions undo any AI-search benefit.
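The server-log diff can be a short script. A sketch assuming you have already parsed access logs into (date, user-agent, URL) tuples (the record format and cutoff are assumptions, not from any specific log tool):

```python
# Sketch: count AI-bot fetches per URL before and after a rollout date.
# Input records are assumed pre-parsed from your access logs.
from collections import Counter
from datetime import date

AI_BOTS = ("GPTBot", "ClaudeBot", "PerplexityBot")

def fetch_rates(records, cutoff):
    """records: (date, user_agent, url) tuples -> (before, after) Counters."""
    before, after = Counter(), Counter()
    for day, ua, url in records:
        if any(bot in ua for bot in AI_BOTS):  # ignore human traffic
            (after if day >= cutoff else before)[url] += 1
    return before, after

before, after = fetch_rates(
    [(date(2025, 1, 1), "GPTBot/1.0", "/a"),
     (date(2025, 1, 9), "ClaudeBot/1.0", "/a"),
     (date(2025, 1, 9), "Mozilla/5.0", "/a")],
    cutoff=date(2025, 1, 8),
)
```

Comparing the two Counters per URL gives the before/after fetch-rate delta the first bullet asks for.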
Common mistakes
- Sending 103 Early Hints over HTTP/1.1. Several clients mishandle informational responses on HTTP/1.1; restrict to HTTP/2+ as the WHATWG Fetch group suggests.
- Treating <link rel="prefetch"> as a crawler hint. It is browser-only. Use sitemaps and llms.txt to guide crawlers.
- Cross-origin redirect during 103. Browsers discard early hints if the final response cross-origin redirects. Ensure your 103 origin matches the final 200 origin.
- No lastmod in sitemap. Without lastmod, crawlers either over-fetch or under-fetch; both hurt citation freshness.
- Dumping every URL into llms.txt. It's a curated list, not a full sitemap. Quality over completeness.
FAQ
Q: Do GPTBot or ClaudeBot honour <link rel="prefetch">?
Not by default. They typically fetch HTML without running browser speculation. Use sitemaps, Link headers, and llms.txt to surface URLs you want them to fetch.
Q: Is HTTP/2 Server Push still relevant?
No. Chrome removed Server Push in 2022 and the IETF community has moved to Early Hints. Don't rely on Server Push for either browsers or crawlers.
Q: Should I add Link headers and <link> tags for the same relationship?
Yes. Browsers prefer the inline tag; some crawlers prefer headers. Supporting both is cheap.
Q: Can I use 103 Early Hints for content negotiation?
No. 103 is for resource hints only. Use the standard request/response negotiation (see AI crawler content negotiation spec).
Q: How does Speculation Rules API affect AI search?
It's a browser-only navigation accelerator; AI crawlers don't see it. It can indirectly help — faster human navigation improves engagement metrics, which feed into rankings — but it's not a crawler hint.
Related Articles
AI Crawler Content Negotiation Specification
HTTP content negotiation (Accept, Accept-Language, Vary) for AI crawlers — serve LLM-friendly variants without cloaking penalties or cache poisoning.
How to Create llms.txt: Step-by-Step Tutorial for AI Search
Step-by-step tutorial for creating, deploying, and validating an llms.txt file so AI systems and LLMs can discover your site's most important content.
Lazy Loading Patterns for AI Crawlers
Lazy loading patterns that keep AI crawlers (GPTBot, ClaudeBot, PerplexityBot) able to extract citable content while preserving Core Web Vitals performance.