Geodocs.dev

AI Crawler Prefetch Hints Specification



AI crawlers (GPTBot, ClaudeBot, PerplexityBot) do not run browser speculation pipelines, so client-side speculation markup is mostly invisible to them. The hints they actually consume are server-side: HTTP Link headers, 103 Early Hints responses, sitemaps with priority signals, and llms.txt. This spec defines which hint goes where and how to measure impact.

TL;DR

Resource Hints are a user-agent mechanism. The browser sees and acts. AI crawlers fetch HTML and rarely run a full browser, so most browser-side hints are no-ops for them. What does help: Link HTTP headers exposing related URLs, 103 Early Hints for critical sub-resources, a high-quality sitemap, and a curated llms.txt. Use this spec to split your hint surface into "for browsers" and "for crawlers" without doubling complexity.

Why most browser hints don't help AI crawlers

The W3C Resource Hints specification defines dns-prefetch, preconnect, prefetch, and prerender as relationships of the HTML <link> element that "enable the developer… to assist the user agent in the decision process… to improve page performance" W3C (the spec was discontinued at W3C and rolled into the WHATWG HTML Living Standard). Every primitive in that spec presumes a user agent that (a) parses HTML, (b) fetches sub-resources speculatively, and (c) navigates between pages. AI crawlers usually only do (a). They fetch the HTML, extract the text, and move on — they don't navigate, so prefetch for the next page is wasted markup.

The Speculation Rules API "allows users to benefit from a performance boost by either prefetching or prerendering future page navigations" Chrome Developers and is even more browser-specific. Treat it as zero help for AI crawlers.

Specification scope

This spec defines hints for two consumers:

  1. Browser hints — unchanged from web-perf best practice; included for completeness.
  2. Crawler hints — server-side mechanisms AI crawlers actually consume.

Non-goals: HTTP/2 server push (deprecated in favour of Early Hints), browser preconnect tuning, CDN-level cache behaviour.

1. Browser hints (unchanged)

For human users you keep the standard web-perf surface:

<link rel="preconnect" href="https://cdn.example.com">
<link rel="dns-prefetch" href="https://cdn.example.com">
<link rel="preload" href="/fonts/main.woff2" as="font" type="font/woff2" crossorigin>
<link rel="prefetch" href="/articles/next-likely">

Keep these in the <head>; they cost crawlers nothing (just bytes) and meaningfully improve LCP/INP for browsers, which feeds Core Web Vitals signals that AI ranking systems consume indirectly.

2. Crawler hints

2.1 HTTP Link headers for relationship discovery

Serve a Link header listing canonical, alternate-language, and related-article URLs:

Link: <https://example.com/articles/prefetch-hints>; rel="canonical"
Link: <https://example.com/de/artikel/prefetch-hints>; rel="alternate"; hreflang="de"
Link: <https://example.com/articles/early-hints>; rel="related"

Crawlers that don't render JS still read response headers. Link headers expose your relationship graph at HTTP-level, which crawlers can index without a render pass.
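To make the relationship graph concrete, here is a minimal sketch of how a crawler-side consumer might pull those relationships out of a Link header. It handles only the comma-separated `<uri>; key="value"` shape used above, not the full RFC 8288 grammar; the header value is an illustrative example.

```python
import re

def parse_link_header(value):
    """Parse an HTTP Link header into (target, params) pairs.

    Handles the common case: comma-separated <uri>; key="value"
    entries. Not a full RFC 8288 parser.
    """
    links = []
    # Each entry starts with a <uri>, followed by zero or more
    # ;-separated parameters up to the next comma or entry.
    for match in re.finditer(r'<([^>]*)>((?:\s*;\s*[^,<]+)*)', value):
        target, raw_params = match.group(1), match.group(2)
        params = {}
        for param in raw_params.split(';'):
            param = param.strip()
            if '=' in param:
                key, _, val = param.partition('=')
                params[key.strip()] = val.strip().strip('"')
        links.append((target, params))
    return links

header = ('<https://example.com/a>; rel="canonical", '
          '<https://example.com/de/a>; rel="alternate"; hreflang="de"')
for target, params in parse_link_header(header):
    print(target, params)
```

A crawler that reads only headers can index this graph from a cheap HEAD request, without fetching or rendering the body.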

2.2 103 Early Hints for sub-resource preconnect/preload

RFC 8297 introduced status code 103 "that can be used to convey hints that help a client make preparations for processing the final response" RFC 8297. The status was promoted from Experimental to Proposed Standard in 2025 IETF. Per MDN, 103 "may be sent by a server while it is still preparing a response, with hints about the sites and resources that the server expects the final response will link to" MDN.

For browsers, Early Hints accelerate LCP by letting the client preconnect/preload before the final 200 arrives. For crawlers, Early Hints serve a different purpose: a JS-light crawler that does support 103 (some do, including upstream CDN crawlers) gets your dependency graph cheaply.

Canonical 103 response:

HTTP/1.1 103 Early Hints
Link: </styles/main.css>; rel=preload; as=style
Link: </fonts/main.woff2>; rel=preload; as=font; crossorigin
Link: <https://cdn.example.com>; rel=preconnect

Support landed in NGINX (with explicit feature gating) NGINX blog, Cloudflare, Fastly, and most modern reverse proxies. Note the WHATWG Fetch issue suggesting Early Hints be restricted to HTTP/2+ [WHATWG Fetch #1698] — in practice ship Early Hints only on HTTP/2 or HTTP/3 connections.
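The interim-plus-final sequence can be sketched as follows. This is an illustration of the wire shape, not a production server: it is written over HTTP/1.1 for readability even though, per the note above, you would only ship 103 on HTTP/2 or HTTP/3, where the framing differs.

```python
def early_hints_exchange(hints, body):
    """Build the interim 103 + final 200 byte sequence a server emits.

    `hints` is a list of Link header values (without the "Link: "
    prefix). Illustrative HTTP/1.1 framing only.
    """
    interim = "HTTP/1.1 103 Early Hints\r\n"
    interim += "".join(f"Link: {h}\r\n" for h in hints)
    interim += "\r\n"
    final = (
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: text/html\r\n"
        f"Content-Length: {len(body)}\r\n"
        # Repeat the hints on the final response so clients that
        # ignored the interim response still see them.
        + "".join(f"Link: {h}\r\n" for h in hints)
        + "\r\n"
    )
    return (interim + final).encode() + body

wire = early_hints_exchange(
    ['</styles/main.css>; rel=preload; as=style',
     '<https://cdn.example.com>; rel=preconnect'],
    b"<html>...</html>",
)
print(wire.decode().splitlines()[0])  # HTTP/1.1 103 Early Hints
```

Note the hints are sent twice: once early, once on the final response. That redundancy is what makes 103 safe for clients that discard interim responses.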

2.3 Sitemap with priority + lastmod

The sitemap is the single most reliable AI-crawler discovery hint. Include:

  • <loc> for every canonical URL.
  • <lastmod> matching your content fingerprint (see content-fingerprinting spec).
  • <priority> only as relative ranking; AI crawlers use it as a soft signal.
  • A separate sitemap index for high-priority documents (Tier 1 articles, hub pages).
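The elements above can be rendered with nothing but the standard library. A minimal sketch, with illustrative URLs and priority values:

```python
import xml.etree.ElementTree as ET
from datetime import date

def build_sitemap(entries):
    """Render a minimal sitemap with loc, lastmod, and priority.

    `entries` is a list of (url, lastmod, priority) tuples.
    """
    ns = "http://www.sitemaps.org/schemas/sitemap/0.9"
    urlset = ET.Element("urlset", xmlns=ns)
    for url, lastmod, priority in entries:
        node = ET.SubElement(urlset, "url")
        ET.SubElement(node, "loc").text = url
        ET.SubElement(node, "lastmod").text = lastmod.isoformat()
        # priority is a soft, relative signal between 0.0 and 1.0
        ET.SubElement(node, "priority").text = f"{priority:.1f}"
    return ET.tostring(urlset, encoding="unicode")

xml = build_sitemap([
    ("https://example.com/articles/tier-1", date(2025, 6, 1), 1.0),
    ("https://example.com/articles/archive", date(2024, 1, 15), 0.3),
])
print(xml)
```

Keeping lastmod tied to a real content fingerprint, rather than the deploy timestamp, is what makes the signal trustworthy to crawlers.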

2.4 llms.txt with priority paths

llms.txt is the AI-native equivalent of robots.txt + sitemap. List the URLs you most want cited and the URLs you do not want indexed. AI crawlers that consume llms.txt (a growing list) treat it as a curated allow-list with semantic context. See How to create llms.txt.
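A minimal generator for that curated list, following the proposed llms.txt markdown shape (an H1 title, a one-line blockquote summary, then H2 sections of bullet links); the site name, URLs, and section names here are placeholders:

```python
def build_llms_txt(site, summary, sections):
    """Assemble a minimal llms.txt: H1 title, blockquote summary,
    then H2 sections whose bullet links are the URLs you most
    want cited, each with a short semantic note.
    """
    lines = [f"# {site}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            lines.append(f"- [{title}]({url}): {note}")
        lines.append("")
    return "\n".join(lines)

doc = build_llms_txt(
    "Example Docs",
    "Technical specs for AI-crawler-friendly publishing.",
    {"Specs": [
        ("Prefetch hints", "https://example.com/specs/prefetch",
         "which hints crawlers actually read"),
    ]},
)
print(doc)
```

The per-link note is the part robots.txt and sitemaps cannot express: it gives the crawler semantic context for why the URL matters.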

Expose a structured-data alternate so crawlers that prefer JSON-LD can fetch it directly:

Link: </articles/prefetch-hints.jsonld>; rel="alternate"; type="application/ld+json"

This pairs with content negotiation (see AI crawler content negotiation spec).

3. Decision matrix

Goal                        | Browser hint                                | Crawler hint
Speed up font load          | preload                                     | n/a
Speed up CSS / JS           | preload + Early Hints                       | Early Hints only
Establish CDN connection    | preconnect                                  | n/a
Reveal canonical URL        | <link rel="canonical">                      | Link header
Reveal language alternates  | <link rel="alternate" hreflang>             | Link header
Reveal next-page navigation | prefetch                                    | Sitemap pagination + Link rel=next
Reveal high-priority URLs   | n/a                                         | sitemap priority + llms.txt
Reveal JSON-LD alternate    | inline <script type="application/ld+json"> | Link rel="alternate" header