Geodocs.dev

HTTP 103 Early Hints for AI Crawlers Specification

HTTP 103 Early Hints (RFC 8297) is an interim HTTP response that lets origin servers send Link preload/preconnect hints before the final 200. On hydrated SPA pages it can speed up AI crawler page loads and improve crawl-budget utilization across Googlebot, GPTBot, ClaudeBot, and PerplexityBot.

TL;DR

  • HTTP 103 Early Hints, defined in RFC 8297, is a 1xx interim status that ships Link preload/preconnect headers before the origin sends the final response.
  • Parallelizing preconnect/preload work with origin compute typically reduces perceived TTFB on hydrated SPA pages, with the gain proportional to how many third-party origins the crawler has not already warmed.
  • Cloudflare, Fastly, Vercel, Akamai, and nginx 1.29.0+ now support 103. Cloudflare replays hints cached from earlier responses, while origins and explicit-emit platforms send them before the final response is flushed.
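  • Crawlers that do not process 1xx responses fall through to the eventual 200. HTTP semantics (RFC 9110 §15.2) require clients to parse unexpected 1xx responses and allow them to ignore those responses, so 103 is safe to ship to mixed crawler fleets over HTTP/2 and HTTP/3.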

Definition

HTTP 103 Early Hints, defined in RFC 8297 (IETF, 2017), is an informational 1xx HTTP status. It allows an origin server to emit one or more Link headers, typically rel=preload or rel=preconnect, before the final response (for example 200, 3xx, or 4xx) is ready. Because 103 is an interim response, the exchange is not complete when it arrives, and a final response must still follow. HTTP/2 Server Push streams resource bytes proactively. 103 only hints, and the receiver decides whether to act on each Link. For AI crawlers fetching modern JS-rendered pages, 103 lets a supporting crawler open TLS connections to image CDNs, schema endpoints, and font origins while the application server is still computing the HTML body. Multiple 103 responses are permitted in a single exchange. General HTTP semantics (RFC 9110 §15.2) require a proxy to forward 1xx responses unless the proxy itself requested them. Support ships in nginx 1.29.0, which forwards 103 responses from proxied backends, and across Cloudflare, Fastly, Vercel Edge Functions, and Akamai.
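On the receiving side, a crawler splits the Link header from a 103 into individual links and keeps only the preconnect and preload hints it might act on. The sketch below shows that step. It assumes simplified Link syntax, with no commas or semicolons inside URLs or quoted parameter values. A production parser should follow RFC 8288 (Web Linking). The function names are hypothetical.

```js
// Minimal Link header parser. Assumes simplified RFC 8288 syntax: no commas or
// semicolons inside URLs or quoted parameter values. Returns one object per link.
function parseLinkHeader(value) {
  const links = [];
  for (const part of value.split(',')) {
    const match = part.trim().match(/^<([^>]*)>\s*(.*)$/);
    if (!match) continue; // skip malformed entries instead of rejecting the header
    const [, target, rest] = match;
    const params = {};
    for (const param of rest.split(';')) {
      const trimmed = param.trim();
      if (!trimmed) continue;
      const eq = trimmed.indexOf('=');
      if (eq === -1) {
        // Valueless parameters such as "crossorigin" are stored as true.
        params[trimmed.toLowerCase()] = true;
      } else {
        const key = trimmed.slice(0, eq).trim().toLowerCase();
        params[key] = trimmed.slice(eq + 1).trim().replace(/^"(.*)"$/, '$1');
      }
    }
    links.push({ target, params });
  }
  return links;
}

// Keep only hint types a crawler might act on; everything else is ignored.
function actionableHints(value) {
  return parseLinkHeader(value).filter((link) => {
    const rels = String(link.params.rel || '').toLowerCase().split(/\s+/);
    return rels.includes('preconnect') || rels.includes('preload');
  });
}
```

For example, `actionableHints('</fonts/inter.woff2>; rel=preload; as=font; crossorigin, </feed.xml>; rel=alternate')` keeps the font preload and drops the alternate link. This mirrors the point that 103 only hints and the client makes the decision.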

Why this matters

AI crawlers, including Googlebot, GPTBot, PerplexityBot, ClaudeBot, and Bingbot, operate under per-domain crawl budgets. On hydrated SPA or partial-prerender pages, server-side data fetching and stream rendering dominate time-to-first-byte (TTFB). During that wait the crawler has not yet seen the HTML. It cannot discover or connect to the page's sub-resource origins, so connection setup only starts once the body arrives. Early Hints closes that gap by giving crawlers a head start on parallel preconnects to:

  • Image CDNs referenced in og:image and Article schema.
  • API origins serving JSON-LD or Article body chunks.
  • Font and CSS origins required to render text content correctly during snapshot capture.
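One way to choose preconnect targets for the three categories above follows three steps:

1. Collect the absolute URLs a page references (og:image, JSON-LD endpoints, font and CSS files).
2. Reduce them to distinct origins other than the page's own.
3. Emit one rel=preconnect entry per origin.

This sketch assumes the URL list has already been extracted from the rendered page. The function name and the default cap of four origins are hypothetical choices, not values from any specification.

```js
// Build a Link header value with one rel=preconnect entry per distinct origin
// other than the page's own (the crawler is already connected to that one).
function buildPreconnectHeader(pageUrl, resourceUrls, maxOrigins = 4) {
  const pageOrigin = new URL(pageUrl).origin;
  const origins = [];
  for (const raw of resourceUrls) {
    let origin;
    try {
      origin = new URL(raw, pageUrl).origin; // resolves relative URLs against the page
    } catch {
      continue; // ignore unparseable URLs
    }
    // Skip same-origin URLs, opaque origins (e.g. data: URLs), and duplicates.
    if (origin === pageOrigin || origin === 'null' || origins.includes(origin)) continue;
    origins.push(origin);
    if (origins.length >= maxOrigins) break; // keep the hint list short
  }
  return origins.map((o) => `<${o}>; rel=preconnect`).join(', ');
}
```

Capping the list reflects the later point that unused hints only add connections on the crawler side. The exact limit is a judgment call.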

When perceived TTFB drops, more of the per-domain crawl budget goes to actual content fetch rather than connection setup. That raises the likelihood that a content URL is fully snapshotted within the per-page timeout. For sites with thousands of long-tail URLs, that compounding effect can determine whether new content reaches AI search indices in the same crawl cycle as a Googlebot pass. Early Hints also fills the gap left by HTTP/2 Server Push, which Chromium has removed. It is now the main standardized way to signal sub-resources before the final response, which makes it the natural mechanism for crawler latency optimization in 2026.
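A deliberately simplified model shows where the head start comes from:

- Without hints, the crawler cannot connect to a sub-resource origin until the HTML arrives, so server think time and connection setup add up.
- With a 103, setup overlaps server think time.

The model ignores transfer time and connection reuse. The inputs in the usage note are hypothetical.

```js
// Simplified model: when can a request to a not-yet-connected origin be sent?
// Times are relative to the moment the origin receives the page request; the
// one-way network delay is omitted because it shifts both cases equally.
//   serverThinkMs: origin work before the final response starts
//   connectMs:     DNS + TCP + TLS setup time to the sub-resource origin
function subresourceStartMs(serverThinkMs, connectMs, withEarlyHints) {
  if (withEarlyHints) {
    // Setup starts when the 103 arrives and runs in parallel with server work.
    return Math.max(serverThinkMs, connectMs);
  }
  // The origin is discovered only after the HTML arrives, so setup runs afterwards.
  return serverThinkMs + connectMs;
}

function earlyHintsSavingMs(serverThinkMs, connectMs) {
  return (
    subresourceStartMs(serverThinkMs, connectMs, false) -
    subresourceStartMs(serverThinkMs, connectMs, true)
  );
}
```

Take a hypothetical 300 ms of server work and 120 ms of connection setup. `earlyHintsSavingMs(300, 120)` returns 120, so the entire setup hides behind server work. The saving is min(think, setup). Gains are therefore largest on slow-rendering pages that reference origins the crawler has not already warmed.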

How it works

The exchange is short but order-sensitive. On HTTP/2, the origin first sends a HEADERS frame with :status 103 carrying one or more Link headers. It later sends the final response on the same stream. Crawlers that support 1xx interim responses parse the Link headers immediately, dispatch preconnects, and keep waiting for the final response. Crawlers that do not act on 103 ignore it and process only the final response. HTTP semantics (RFC 9110 §15.2) require clients to tolerate unexpected 1xx responses, and that requirement makes this fallback safe.

```mermaid
sequenceDiagram
  participant Crawler as AI Crawler
  participant Edge as Edge or CDN
  participant Origin as Origin App
  Crawler->>Edge: GET /article HTTP/2
  Edge->>Origin: GET /article
  Origin-->>Edge: 103 Early Hints + Link preload preconnect
  Edge-->>Crawler: 103 Early Hints + Link headers
  Origin-->>Edge: 200 OK + HTML body
  Edge-->>Crawler: 200 OK + HTML body
```
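The origin-to-client part of this exchange can be reproduced locally with Node.js's built-in http2 module. Node 18.11 or later is required, since that is where `response.writeEarlyHints()` is available. This is a minimal sketch under stated assumptions:

- It uses cleartext HTTP/2 (h2c) on 127.0.0.1 instead of TLS.
- A 50 ms timer stands in for server-side rendering.
- The /article path and the hint values are hypothetical.

```js
// Minimal HTTP/2 origin (cleartext h2c, localhost only) that sends
// 103 Early Hints before the final 200, plus a client that observes both.
const http2 = require('node:http2');

// Hypothetical hint targets, matching the examples in this article.
const LINK_HINTS = [
  '</fonts/inter.woff2>; rel=preload; as=font; crossorigin',
  '<https://images.example.com>; rel=preconnect',
];

function createOrigin() {
  return http2.createServer((req, res) => {
    // Interim response: a HEADERS frame with :status 103 on the same stream.
    res.writeEarlyHints({ link: LINK_HINTS });
    // Stand-in for server-side data fetching and rendering.
    setTimeout(() => {
      res.writeHead(200, { 'content-type': 'text/html; charset=utf-8' });
      res.end('<!doctype html><title>Article</title>');
    }, 50);
  });
}

function fetchWithHints(port) {
  return new Promise((resolve, reject) => {
    const client = http2.connect(`http://127.0.0.1:${port}`);
    client.on('error', reject);
    const req = client.request({ ':path': '/article' });
    const result = { hints: [], finalStatus: null, body: '' };
    // 'headers' fires for informational (1xx) header blocks such as 103.
    req.on('headers', (headers) => result.hints.push(headers));
    // 'response' fires once, for the final (non-1xx) response headers.
    req.on('response', (headers) => {
      result.finalStatus = headers[':status'];
    });
    req.setEncoding('utf8');
    req.on('data', (chunk) => {
      result.body += chunk;
    });
    req.on('end', () => {
      client.close();
      resolve(result);
    });
    req.on('error', reject);
    req.end();
  });
}

async function demo() {
  const server = createOrigin();
  await new Promise((resolve) => server.listen(0, '127.0.0.1', resolve));
  try {
    return await fetchWithHints(server.address().port);
  } finally {
    server.close();
  }
}
```

`demo()` resolves with one 103 header block whose link field carries both hints, followed by a final status of 200. Outside a sketch, the hints would come from the page's actual dependencies rather than a constant.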

A few subtleties matter at the edge layer. CDNs that buffer responses must forward 103 before buffering the body, or the latency benefit is erased. Cloudflare's Early Hints documentation describes a cache-and-replay optimization. Cloudflare caches the preload and preconnect Link headers from a final 200 and replays them as a 103 on later requests for the same URL, so subsequent visitors get hints before the origin has responded. Fastly and Vercel Edge Functions expose explicit 103 emission. nginx 1.29.0 added the early_hints directive, which controls when 103 responses received from a proxied backend are passed through to the client.
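The cache-and-replay pattern can be modeled in a few lines. Record the preload and preconnect Link entries seen on a final 200. Then return them as a 103 payload the next time the same URL is requested, before the origin has answered. This is a simplified, hypothetical model: `EarlyHintsCache` is not a Cloudflare API, and the model ignores cache expiry, Vary, and per-client variation.

```js
// Simplified model of edge cache-and-replay for Early Hints. Not a CDN API.
class EarlyHintsCache {
  constructor() {
    this.byPath = new Map(); // path -> array of Link entries worth replaying
  }

  // Called when the origin's final response passes through the edge.
  recordFinalResponse(path, status, linkHeader) {
    if (status !== 200 || !linkHeader) return;
    const hints = linkHeader
      .split(',')
      .map((entry) => entry.trim())
      // Only preload/preconnect entries are meaningful as early hints.
      .filter((entry) => /;\s*rel="?(preload|preconnect)"?/i.test(entry));
    if (hints.length > 0) this.byPath.set(path, hints);
  }

  // Called when a new request arrives, before forwarding it to the origin.
  // Returns headers for a 103 response, or null if nothing is cached yet.
  hintsFor(path) {
    const hints = this.byPath.get(path);
    return hints ? { ':status': 103, link: hints.join(', ') } : null;
  }
}
```

Under this model, the first request for a URL gets no 103 because nothing has been cached yet. Replay therefore mainly benefits repeat crawls of the same URLs.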

Practical application

Production deployments fall into three patterns, and each requires different config:

- Edge-replay: Cloudflare-style automatic 103.
- Explicit emit: Fastly VCL, Vercel Edge Functions.
- Origin-native: nginx 1.29.0+ passing through hints sent by the application, or application frameworks that send 103 themselves.

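```nginx
# nginx 1.29.0+: forward 103 Early Hints sent by the upstream application,
# which emits e.g. Link: </fonts/inter.woff2>; rel=preload; as=font; crossorigin
#             and Link: <https://images.example.com>; rel=preconnect
location / {
  # $http2/$http3 are non-empty only on HTTP/2 or HTTP/3 connections,
  # so 103 is not passed through to HTTP/1.1 clients.
  early_hints $http2$http3;
  proxy_pass http://127.0.0.1:3000;  # hypothetical upstream address
}
```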

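```js
// Vercel Edge Function (illustrative)
export const config = { runtime: 'edge' };

// Hypothetical upstream origin; replace with your application server.
const ORIGIN_URL = 'https://origin.example.com';

export default async function handler(request) {
  // Assumption: 103 emission is enabled on the platform (e.g. via an
  // experimental.earlyHints flag); this handler only sets the Link header
  // on the final response, and the platform promotes it to Early Hints.
  const { pathname, search } = new URL(request.url);
  const upstream = await fetch(new URL(pathname + search, ORIGIN_URL));

  // Headers on a fetch() Response are immutable, so copy them before editing.
  const headers = new Headers(upstream.headers);
  headers.set(
    'Link',
    '</fonts/inter.woff2>; rel=preload; as=font; crossorigin, ' +
      '<https://images.example.com>; rel=preconnect'
  );

  return new Response(upstream.body, {
    status: upstream.status,
    statusText: upstream.statusText,
    headers,
  });
}
```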

Test the deployment with curl --http2 -v https://example.com and look for the < HTTP/2 103 interim line. For Cloudflare, check the cf-cache-status and cf-early-hints response headers to confirm the hint was replayed from cache. Once verified, instrument the AI-crawler logs (parse the User-Agent for GPTBot, ClaudeBot, PerplexityBot) and chart per-bot TTFB before and after enabling 103. Practitioners report the largest gain on bot-heavy long-tail content where the page references third-party origins the crawler has not connected to recently.
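The per-bot comparison can be sketched as follows. It assumes access-log records are available as objects with `userAgent`, `ttfbMs`, and an `earlyHints` boolean marking requests served after 103 was enabled. All three field names are hypothetical, so adapt them to your log format. The sketch matches the bot tokens named above by substring and reports the median TTFB per bot, before and after.

```js
// Bot tokens as they appear in User-Agent strings (substring match).
const AI_BOTS = ['GPTBot', 'ClaudeBot', 'PerplexityBot'];

function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// entries: [{ userAgent, ttfbMs, earlyHints }] (hypothetical log shape).
// Returns e.g. { GPTBot: { before: <median ms or null>, after: <median ms or null> } }
function medianTtfbByBot(entries) {
  const buckets = {};
  for (const { userAgent, ttfbMs, earlyHints } of entries) {
    const bot = AI_BOTS.find((token) => userAgent.includes(token));
    if (!bot) continue; // not one of the tracked AI crawlers
    buckets[bot] ??= { before: [], after: [] };
    buckets[bot][earlyHints ? 'after' : 'before'].push(ttfbMs);
  }
  const report = {};
  for (const [bot, { before, after }] of Object.entries(buckets)) {
    report[bot] = {
      before: before.length ? median(before) : null,
      after: after.length ? median(after) : null,
    };
  }
  return report;
}
```

Medians are used rather than means so a few slow outliers per bot do not dominate the comparison. User-Agent strings can be spoofed, so verifying crawler IPs against each operator's published ranges makes the numbers more trustworthy.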

Common mistakes

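  • Emitting 103 over HTTP/1.1: RFC 8297 §3 warns about HTTP/1.1 clients that treat an informational response as the final one. Such a client can attach later responses on the same connection to the wrong request, a potential cross-origin information disclosure. Servers should therefore avoid sending 103 over HTTP/1.1 unless the client is known to handle it. Chromium acts on Early Hints only over HTTP/2 and HTTP/3, and many CDNs do not send 103 to HTTP/1.1 clients at all.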
  • Buffering at the edge: any reverse proxy that holds the full origin response before flushing will collapse the latency benefit. Audit proxy_buffering (nginx) and equivalents on each layer.
  • Hinting resources the crawler cannot fetch: paywalled CDN URLs, region-locked image origins, or auth-protected APIs are wasted hints and inflate connection counts on the crawler side.
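  • Forgetting crossorigin on font preloads: fonts are always fetched in CORS mode. In Chromium-based crawler engines, a font preload without crossorigin therefore does not match the later font request, so the font is fetched twice and the hint is wasted for those bots.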
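Two of these mistakes can be caught mechanically before deploy. The sketch below checks a list of Link entries for two problems:

- Font preloads that lack crossorigin.
- Hint targets on hosts you have flagged as unreachable for crawlers (paywalled, region-locked, or auth-protected).

`lintEarlyHints` and the `blockedHosts` list are hypothetical. The host list would come from your own configuration.

```js
// Lint Link entries intended for 103 Early Hints; returns human-readable warnings.
// Assumes simple entries of the form "<url>; rel=...; as=...; crossorigin".
function lintEarlyHints(linkEntries, pageUrl, blockedHosts = []) {
  const warnings = [];
  for (const entry of linkEntries) {
    const match = entry.match(/^<([^>]*)>(.*)$/);
    if (!match) {
      warnings.push(`malformed Link entry: ${entry}`);
      continue;
    }
    const [, target, paramText] = match;
    // Normalize parameters: trim, lowercase, and drop quotes around values.
    const params = paramText
      .split(';')
      .map((p) => p.trim().toLowerCase().replace(/"/g, ''))
      .filter(Boolean);
    const isFontPreload = params.includes('rel=preload') && params.includes('as=font');
    // Fonts are fetched in CORS mode, so a preload without crossorigin is not reused.
    if (isFontPreload && !params.some((p) => p.startsWith('crossorigin'))) {
      warnings.push(`font preload without crossorigin: ${target}`);
    }
    const host = new URL(target, pageUrl).hostname;
    if (blockedHosts.includes(host)) {
      warnings.push(`hint points at a host crawlers cannot fetch: ${host}`);
    }
  }
  return warnings;
}
```

Running this in CI against the Link values a template emits catches regressions before they reach crawler traffic.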

FAQ

Q: Which AI crawlers support HTTP 103 Early Hints?

Major Chromium-derived crawlers (Googlebot, Bingbot, and AI fetchers running on headless Chrome) can process 103 because Chromium shipped Early Hints support in 2022 (Chrome Platform Status). GPTBot, ClaudeBot, and PerplexityBot do not publish parser internals. Practitioner reports and Cloudflare's Early Hints docs describe successful preconnect dispatch for those user-agents on real traffic. Emission is safe regardless. HTTP semantics (RFC 9110 §15.2) require every client to parse unexpected 1xx responses and allow it to ignore them, so a crawler that does not act on 103 simply waits for the final response.

Q: How does HTTP 103 differ from HTTP/2 Server Push?

Server Push (HTTP/2 PUSH_PROMISE) actively streams resource bytes from origin to client without a request, while 103 only hints at URLs to fetch. Chromium deprecated and removed Server Push because resources were often pushed redundantly, wasting bandwidth (Chrome Platform Status). 103 avoids that problem by leaving the fetch decision to the client, which respects existing cache state. When Chromium removed Server Push, it pointed to 103 Early Hints as the replacement for the preconnect/preload use case.

Q: Can HTTP 103 Early Hints break older crawlers?

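Not when served over HTTP/2 or HTTP/3. HTTP semantics (RFC 9110 §15.2, and RFC 7231 before it) require a conformant client to parse 1xx responses it does not expect and allow it to ignore them while it waits for the final response. A crawler that does not recognize 103 therefore sees only the eventual 200 and processes the page as it does today. The known risk is HTTP/1.1. RFC 8297 §3 notes that a client that mistakes an informational response for the final one can misframe later responses on the same connection, which is why 103 is usually sent only over HTTP/2 and HTTP/3. There is no documented case of emitting 103 causing downstream parser failures across major crawler fleets.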

