
Accept-Encoding (Brotli, Gzip) for AI Crawlers

AI crawlers negotiate compression with the standard Accept-Encoding header defined in RFC 9110. Most send gzip, a growing subset accept br (Brotli), and zstd support is rare — so a safe origin policy is br > gzip > identity with Vary: Accept-Encoding on cacheable responses.

TL;DR

Serve Brotli when an AI crawler advertises br, gzip when it advertises gzip, and uncompressed (identity) when no Accept-Encoding header is sent. Always emit Vary: Accept-Encoding on cached responses so CDN edges do not return the wrong representation to a bot.

Scope

This specification applies to:

  • HTML and JSON responses served to AI crawler user agents (GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot, Claude-User, PerplexityBot, Googlebot, Google-Extended, Applebot-Extended, CCBot, Bytespider, and similar).
  • Static and dynamically generated documents under 10 MB; binary assets larger than 10 MB should follow the asset-delivery guidance, not this spec.
  • Responses cached at a CDN edge (Cloudflare, Fastly, Akamai, CloudFront, Vercel) where Vary correctness matters.

Normative requirements

  1. The origin MUST parse Accept-Encoding per RFC 9110 §12.5.3, including q-values.
  2. The origin MUST select the highest-q acceptable encoding from br, gzip, deflate, zstd, identity (see the selection sketch after this list).
  3. The origin MUST emit Content-Encoding matching the selected encoding when compressing.
  4. The origin MUST emit Vary: Accept-Encoding on any response cached by intermediate caches.
  5. The origin MUST NOT compress when the request omits Accept-Encoding (treat as identity only).
  6. The origin SHOULD fall back to gzip if Brotli is unavailable for the requested resource (precompressed asset missing).
  7. The origin SHOULD NOT apply Brotli at quality > 6 to dynamically generated bodies; the CPU cost outweighs the bandwidth saving for one-shot bot responses.
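
Requirements 1-5 reduce to a small negotiation routine. The following is a minimal sketch assuming a Node.js/TypeScript origin; it restricts selection to the codings this spec recommends serving (br, gzip, identity), and negotiateEncoding is an illustrative helper name, not a framework API.

// Minimal Accept-Encoding negotiation sketch (illustrative, not a library API).
// Covers requirements 1-5: parse q-values, pick the highest-q supported coding,
// and treat a missing header as identity with no compression.

type Encoding = "br" | "gzip" | "identity";

// Preference order for equal q-values: br > gzip > identity.
const SUPPORTED: Encoding[] = ["br", "gzip", "identity"];

export function negotiateEncoding(acceptEncoding: string | undefined): Encoding {
  // Requirement 5: no Accept-Encoding header means identity only.
  if (!acceptEncoding || acceptEncoding.trim() === "") return "identity";

  // Parse e.g. "gzip;q=0.8, br, identity;q=0" into a coding -> q map.
  const prefs = new Map<string, number>();
  for (const part of acceptEncoding.split(",")) {
    const [rawCoding, ...params] = part.trim().split(";");
    const coding = rawCoding.trim().toLowerCase();
    if (!coding) continue;
    let q = 1.0;
    for (const param of params) {
      const [key, value] = param.trim().split("=");
      if (key === "q" && value !== undefined) q = Number.parseFloat(value) || 0;
    }
    prefs.set(coding, q);
  }

  // Requirement 2: choose the highest-q acceptable coding we support.
  // "*" covers codings not listed explicitly; identity defaults to acceptable.
  const wildcard = prefs.get("*");
  let best: Encoding = "identity";
  let bestQ = -1;
  for (const coding of SUPPORTED) {
    const q =
      prefs.get(coding) ??
      (coding === "identity" ? wildcard ?? 1 : wildcard ?? 0);
    if (q > bestQ) {
      best = coding;
      bestQ = q;
    }
  }

  // If every supported coding is excluded (q=0), fall back to identity here;
  // a strict origin may return 406 instead (see the validation checklist).
  return bestQ > 0 ? best : "identity";
}

Ties in q resolve in SUPPORTED order, which encodes the br > gzip > identity policy from the TL;DR; explicit exclusions such as gzip;q=0 are honored by the final q > 0 guard.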

Encoding catalog

Encoding    | RFC      | Typical compression ratio (HTML) | CPU cost               | Crawler support
identity    | RFC 9110 | 1.00x                            | None                   | Universal
gzip        | RFC 1952 | 3-5x                             | Low                    | Universal across major AI bots
br (Brotli) | RFC 7932 | 4-6x                             | Medium (high at q=11)  | Common in Chromium-based agents (Googlebot, browser agents); variable in fetch-only bots
deflate     | RFC 1951 | 3-4x                             | Low                    | Legacy; not recommended (interop bugs)
zstd        | RFC 8478 | 4-6x                             | Low                    | Rare; mostly modern browsers, not bots

Crawler support matrix

User agent               | Sends Accept-Encoding? | Advertised codecs (typical)
Googlebot                | Yes                    | gzip, deflate, br
GPTBot                   | Yes                    | gzip (Brotli reported by some operators)
OAI-SearchBot            | Yes                    | gzip
ChatGPT-User             | Yes                    | gzip, deflate, br (Chromium-derived)
ClaudeBot / anthropic-ai | Yes                    | gzip
Claude-User              | Yes                    | gzip, br
PerplexityBot            | Yes                    | gzip
Perplexity-User          | Yes                    | gzip, br
Bytespider               | Inconsistent           | gzip when sent
CCBot                    | Yes                    | gzip

This matrix reflects practitioner observation as of 2026; vendors do not publish formal codec commitments. Treat absent Accept-Encoding as a signal to serve identity-encoded bytes (per requirement 5).

Vary handling

A cached response that varies by encoding MUST include Vary: Accept-Encoding. Without it, a CDN edge can return a Brotli body to a bot that advertised only gzip, which fails to decode and produces a parse error on the bot side.

HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
Content-Encoding: br
Vary: Accept-Encoding
Cache-Control: public, max-age=300

When a bot omits Accept-Encoding, return identity:

HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
Vary: Accept-Encoding
Cache-Control: public, max-age=300

Note: security middleware that strips Accept-Encoding from bot traffic (e.g., F5 ASM Bot Defense, K70083507) will cause the origin to serve identity, which is correct but wastes bandwidth. Audit edge rules before assuming low compression rates indicate bot misbehavior.
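
One defensive pattern at the edge is to bucket incoming Accept-Encoding values into the three variants this spec serves before the request reaches the shared cache, which caps the number of cached representations per URL and makes a Vary mismatch harder to produce. A sketch for a Worker-style request-rewrite hook follows; normalizeAcceptEncoding is an illustrative helper, not a platform API.

// Sketch: bucket Accept-Encoding into the variants this spec serves before the
// request hits a shared cache. Fewer distinct header values means fewer cached
// representations per URL and less room for Vary mistakes. Illustrative only;
// wire it into your CDN's request-rewrite mechanism. Note: this coarse check
// does not honor q=0 exclusions, so the origin must still run full negotiation.

export function normalizeAcceptEncoding(raw: string | null): string {
  if (!raw) return "";                         // absent header stays absent -> identity
  const value = raw.toLowerCase();
  if (/\bbr\b/.test(value)) return "br, gzip"; // Brotli-capable bucket
  if (/\bgzip\b/.test(value)) return "gzip";   // gzip-only bucket
  return "";                                   // unknown codings only -> identity
}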

Edge precompression pattern

For static HTML, precompress at build time and serve the precompressed file matching the request's Accept-Encoding:

/dist/page.html             (identity)
/dist/page.html.gz          (gzip)
/dist/page.html.br          (Brotli, quality 11)

Server pseudo-config (NGINX-style):

location / {
    # gzip_static needs ngx_http_gzip_static_module; brotli_static needs the
    # third-party ngx_brotli module compiled into or loaded by this NGINX build.
    gzip_static   on;
    brotli_static on;
    add_header    Vary Accept-Encoding;
}

This pattern pays the CPU cost of Brotli quality 11 once at build time and amortizes it across all crawls; it is the recommended baseline for sites that publish large libraries of canonical content.
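
If the web server cannot generate the siblings itself, the same artifacts can be produced in a build step. A minimal sketch using only Node.js built-ins; the precompress helper name and the dist paths are illustrative.

// Build-step sketch: emit .gz and .br siblings next to each static HTML file
// so the server (gzip_static / brotli_static above) can serve them directly.
// Uses only Node.js built-ins; quality choices mirror this spec.

import { readFileSync, writeFileSync } from "node:fs";
import { constants, brotliCompressSync, gzipSync } from "node:zlib";

export function precompress(htmlPath: string): void {
  const body = readFileSync(htmlPath);

  // gzip level 9: maximum ratio, still cheap enough for a build step.
  writeFileSync(`${htmlPath}.gz`, gzipSync(body, { level: 9 }));

  // Brotli quality 11: expensive, but paid once at build time, not per request.
  writeFileSync(
    `${htmlPath}.br`,
    brotliCompressSync(body, {
      params: {
        [constants.BROTLI_PARAM_QUALITY]: 11,
        [constants.BROTLI_PARAM_SIZE_HINT]: body.length,
      },
    }),
  );
}

// Example: precompress("dist/page.html") produces dist/page.html.gz and .br.

Run it over every emitted HTML file; the server config above then picks the matching sibling per request.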

Dynamic response pattern

For dynamically rendered HTML (SSR, edge functions), use Brotli quality 4-6 or gzip level 6. Higher Brotli qualities consume too much CPU per request and risk timing out bot fetches.
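
A runtime sketch of that trade-off, assuming a Node.js SSR handler and reusing the negotiateEncoding helper sketched under Normative requirements (the module path and handler wiring are hypothetical):

// Runtime sketch for dynamically rendered bodies: negotiate, then compress at
// a moderate quality (Brotli 5 / gzip 6) so CPU per request stays bounded.

import { constants, brotliCompressSync, gzipSync } from "node:zlib";
import { negotiateEncoding } from "./negotiate"; // hypothetical module path

export function compressForBot(
  html: string,
  acceptEncoding: string | undefined,
): { body: Buffer; headers: Record<string, string> } {
  const headers: Record<string, string> = {
    "Content-Type": "text/html; charset=utf-8",
    "Vary": "Accept-Encoding", // requirement 4
  };
  const raw = Buffer.from(html, "utf-8");

  switch (negotiateEncoding(acceptEncoding)) {
    case "br":
      headers["Content-Encoding"] = "br";
      return {
        body: brotliCompressSync(raw, {
          params: { [constants.BROTLI_PARAM_QUALITY]: 5 }, // quality 4-6 per this spec
        }),
        headers,
      };
    case "gzip":
      headers["Content-Encoding"] = "gzip";
      return { body: gzipSync(raw, { level: 6 }), headers };
    default:
      return { body: raw, headers }; // identity: no Content-Encoding
  }
}

For very large dynamic bodies, prefer the streaming variants (createBrotliCompress / createGzip) so time-to-first-byte is not gated on compressing the full body.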

Failure modes

Symptom                                 | Likely cause                              | Remediation
Bot logs show parse error / empty body  | Content-Encoding mismatch or missing Vary | Verify negotiation; add Vary: Accept-Encoding
CDN serves Brotli to a gzip-only bot    | Missing or wrong Vary                     | Audit cache key; include encoding in cache key
Origin CPU spikes during bot waves      | Brotli q=11 on dynamic responses          | Drop dynamic Brotli to q=4-6 or precompress
Bot omits Accept-Encoding               | Middleware strip or older bot client      | Serve identity (per requirement 5)
Compression slows time-to-first-byte    | Streaming compression with high q         | Use chunked transfer with q=4-6 for SSR

Validation checklist

  • [ ] Request with Accept-Encoding: br, gzip returns Content-Encoding: br and a Vary header.
  • [ ] Request with Accept-Encoding: gzip returns Content-Encoding: gzip.
  • [ ] Request with no Accept-Encoding header returns identity bytes (no Content-Encoding).
  • [ ] Request with Accept-Encoding: identity;q=0 returns 406 Not Acceptable only if no acceptable encoding exists; otherwise honor the q-value list.
  • [ ] CDN cache key includes encoding (or response is uncacheable when varying).
  • [ ] Brotli quality is ≤ 6 for dynamic responses.
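
The checklist can be exercised against a live origin with a small probe script. The sketch below uses Node's https module so the raw, still-compressed response stays visible; the target URL is a placeholder.

// Probe sketch for the validation checklist: send each Accept-Encoding variant
// and report the Content-Encoding and Vary headers the origin returns.
// Uses node:https directly so no automatic decompression hides the negotiation.

import { request } from "node:https";

const TARGET = "https://example.com/"; // placeholder URL

const cases: Array<{ name: string; acceptEncoding?: string }> = [
  { name: "br+gzip", acceptEncoding: "br, gzip" },
  { name: "gzip only", acceptEncoding: "gzip" },
  { name: "no header" }, // expect identity: no Content-Encoding
];

for (const c of cases) {
  const url = new URL(TARGET);
  const req = request(
    {
      hostname: url.hostname,
      path: url.pathname,
      method: "GET",
      headers: c.acceptEncoding ? { "Accept-Encoding": c.acceptEncoding } : {},
    },
    (res) => {
      console.log(
        `${c.name}: status=${res.statusCode}`,
        `content-encoding=${res.headers["content-encoding"] ?? "(none)"}`,
        `vary=${res.headers["vary"] ?? "(none)"}`,
      );
      res.resume(); // drain the body; only the headers matter here
    },
  );
  req.end();
}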

FAQ

Q: Should I always serve Brotli when a bot accepts it?

Yes for static, precompressed content. For dynamically generated responses, prefer Brotli quality 4-6 or gzip; quality 11 Brotli on dynamic SSR consumes more CPU than the bandwidth saving recovers.

Q: What happens if I send Brotli to a bot that does not accept it?

The bot fails to decode the body and either retries with Accept-Encoding: identity, treats the body as empty, or marks the URL as broken. All three outcomes hurt citation eligibility. Always honor the negotiated encoding.

Q: Do I need Vary: Accept-Encoding if I do not cache?

No. Vary only affects shared caches. If the response is Cache-Control: private or no-store, the header has no effect. Add it anyway as a defensive default — CDNs in front of your origin may still cache.

Q: Why do some bots omit Accept-Encoding?

Reasons include older bot clients, stripped headers from intermediate proxies, or security middleware. The correct origin response is identity bytes; do not attempt to compress speculatively when the header is absent.

Q: Is zstd worth supporting for AI crawlers?

Not yet. As of 2026, only modern browsers and a small number of internal Google fetchers advertise zstd. The codec offers Brotli-class ratios with lower CPU, but bot adoption is too thin to justify operational complexity.

Related Articles

  • AI Crawler Content Negotiation Specification: HTTP content negotiation (Accept, Accept-Language, Vary) for AI crawlers; serve LLM-friendly variants without cloaking penalties or cache poisoning.
  • DNS Prefetch and Preconnect for AI Crawlers: reference for using dns-prefetch and preconnect resource hints with AI crawlers and browser agents, covering semantics, ordering, and impact on render-stage crawls.
  • HTTP Cache Headers for AI Crawlers: reference for HTTP cache headers (ETag, Cache-Control, Last-Modified, Vary) and how AI crawlers use them for citation freshness.
