
Accept-Encoding (Brotli, Gzip) for AI Crawlers

AI crawlers negotiate compression with the standard Accept-Encoding header defined in RFC 9110. Most send gzip, a growing subset accept br (Brotli), and zstd support is rare — so a safe origin policy is br > gzip > identity with Vary: Accept-Encoding on cacheable responses.

TL;DR

Serve Brotli when an AI crawler advertises br, gzip when it advertises gzip, and uncompressed (identity) when no Accept-Encoding header is sent. Always emit Vary: Accept-Encoding on cached responses so CDN edges do not return the wrong representation to a bot.

Scope

This specification applies to:

  • HTML and JSON responses served to AI crawler user agents (GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot, Claude-User, PerplexityBot, Googlebot, Google-Extended, Applebot-Extended, CCBot, Bytespider, and similar).
  • Static and dynamically generated documents under 10 MB; binary assets larger than 10 MB should follow the asset-delivery guidance, not this spec.
  • Responses cached at a CDN edge (Cloudflare, Fastly, Akamai, CloudFront, Vercel) where Vary correctness matters.

Normative requirements

  1. The origin MUST parse Accept-Encoding per RFC 9110 §12.5.3, including q-values.
  2. The origin MUST select the highest-q acceptable encoding from br, gzip, deflate, zstd, identity (see the selection sketch after this list).
  3. The origin MUST emit Content-Encoding matching the selected encoding when compressing.
  4. The origin MUST emit Vary: Accept-Encoding on any response cached by intermediate caches.
  5. The origin MUST NOT compress when the request omits Accept-Encoding (treat as identity only).
  6. The origin SHOULD fall back to gzip if Brotli is unavailable for the requested resource (precompressed asset missing).
  7. The origin SHOULD NOT apply Brotli at quality > 6 to dynamically generated bodies; the CPU cost outweighs the bandwidth saving for one-shot bot responses.
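
Requirements 1-5 reduce to a small negotiation routine. The following is a minimal sketch assuming a Node.js/TypeScript origin; it restricts selection to the codings this spec recommends serving (br, gzip, identity), and negotiateEncoding is an illustrative helper name, not a framework API.

// Minimal Accept-Encoding negotiation sketch (illustrative, not a library API).
// Covers requirements 1-5: parse q-values, pick the highest-q supported coding,
// and treat a missing header as identity with no compression.

type Encoding = "br" | "gzip" | "identity";

// Preference order for equal q-values: br > gzip > identity.
const SUPPORTED: Encoding[] = ["br", "gzip", "identity"];

export function negotiateEncoding(acceptEncoding: string | undefined): Encoding {
  // Requirement 5: no Accept-Encoding header means identity only.
  if (!acceptEncoding || acceptEncoding.trim() === "") return "identity";

  // Parse e.g. "gzip;q=0.8, br, identity;q=0" into a coding -> q map.
  const prefs = new Map<string, number>();
  for (const part of acceptEncoding.split(",")) {
    const [rawCoding, ...params] = part.trim().split(";");
    const coding = rawCoding.trim().toLowerCase();
    if (!coding) continue;
    let q = 1.0;
    for (const param of params) {
      const [key, value] = param.trim().split("=");
      if (key === "q" && value !== undefined) q = Number.parseFloat(value) || 0;
    }
    prefs.set(coding, q);
  }

  // Requirement 2: choose the highest-q acceptable coding we support.
  // "*" covers codings not listed explicitly; identity defaults to acceptable.
  const wildcard = prefs.get("*");
  let best: Encoding = "identity";
  let bestQ = -1;
  for (const coding of SUPPORTED) {
    const q =
      prefs.get(coding) ??
      (coding === "identity" ? wildcard ?? 1 : wildcard ?? 0);
    if (q > bestQ) {
      best = coding;
      bestQ = q;
    }
  }

  // If every supported coding is excluded (q=0), fall back to identity here;
  // a strict origin may return 406 instead (see the validation checklist).
  return bestQ > 0 ? best : "identity";
}

Ties in q resolve in SUPPORTED order, which encodes the br > gzip > identity policy from the TL;DR; explicit exclusions such as gzip;q=0 are honored by the final q > 0 guard.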

Encoding catalog

Encoding    | RFC      | Typical compression ratio (HTML) | CPU cost               | Crawler support
identity    | RFC 9110 | 1.00x                            | None                   | Universal
gzip        | RFC 1952 | 3-5x                             | Low                    | Universal across major AI bots
br (Brotli) | RFC 7932 | 4-6x                             | Medium (high at q=11)  | Common in Chromium-based agents (Googlebot, browser agents); variable in fetch-only bots
deflate     | RFC 1951 | 3-4x                             | Low                    | Legacy; not recommended (interop bugs)
zstd        | RFC 8478 | 4-6x                             | Low                    | Rare; mostly modern browsers, not bots

Crawler support matrix

User agent               | Sends Accept-Encoding? | Advertised codecs (typical)
Googlebot                | Yes                    | gzip, deflate, br
GPTBot                   | Yes                    | gzip (Brotli reported by some operators)
OAI-SearchBot            | Yes                    | gzip
ChatGPT-User             | Yes                    | gzip, deflate, br (Chromium-derived)
ClaudeBot / anthropic-ai | Yes                    | gzip
Claude-User              | Yes                    | gzip, br
PerplexityBot            | Yes                    | gzip
Perplexity-User          | Yes                    | gzip, br
Bytespider               | Inconsistent           | gzip when sent
CCBot                    | Yes                    | gzip

This matrix reflects practitioner observation as of 2026; vendors do not publish formal codec commitments. Treat absent Accept-Encoding as a signal to serve identity-encoded bytes (per requirement 5).

Vary handling

A cached response that varies by encoding MUST include Vary: Accept-Encoding. Without it, a CDN edge can return a Brotli body to a bot that advertised only gzip, which fails to decode and produces a parse error on the bot side.

HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
Content-Encoding: br
Vary: Accept-Encoding
Cache-Control: public, max-age=300

When a bot omits Accept-Encoding, return identity:

HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
Vary: Accept-Encoding
Cache-Control: public, max-age=300

Note: security middleware that strips Accept-Encoding from bot traffic (e.g., F5 ASM Bot Defense, K70083507) will cause the origin to serve identity, which is correct but wastes bandwidth. Audit edge rules before assuming low compression rates indicate bot misbehavior.
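
One defensive pattern at the edge is to bucket incoming Accept-Encoding values into the three variants this spec serves before the request reaches the shared cache, which caps the number of cached representations per URL and makes a Vary mismatch harder to produce. A sketch for a Worker-style request-rewrite hook follows; normalizeAcceptEncoding is an illustrative helper, not a platform API.

// Sketch: bucket Accept-Encoding into the variants this spec serves before the
// request hits a shared cache. Fewer distinct header values means fewer cached
// representations per URL and less room for Vary mistakes. Illustrative only;
// wire it into your CDN's request-rewrite mechanism. Note: this coarse check
// does not honor q=0 exclusions, so the origin must still run full negotiation.

export function normalizeAcceptEncoding(raw: string | null): string {
  if (!raw) return "";                         // absent header stays absent -> identity
  const value = raw.toLowerCase();
  if (/\bbr\b/.test(value)) return "br, gzip"; // Brotli-capable bucket
  if (/\bgzip\b/.test(value)) return "gzip";   // gzip-only bucket
  return "";                                   // unknown codings only -> identity
}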

Edge precompression pattern

For static HTML, precompress at build time and serve the precompressed file matching the request's Accept-Encoding:

/dist/page.html             (identity)
/dist/page.html.gz          (gzip)
/dist/page.html.br          (Brotli, quality 11)

Server pseudo-config (NGINX-style):

location / {
    # gzip_static needs ngx_http_gzip_static_module; brotli_static needs the
    # third-party ngx_brotli module compiled into or loaded by this NGINX build.
    gzip_static   on;
    brotli_static on;
    add_header    Vary Accept-Encoding;
}

This pattern pays the CPU cost of Brotli quality 11 once at build time and amortizes it across all crawls; it is the recommended baseline for sites that publish large libraries of canonical content.
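
If the web server cannot generate the siblings itself, the same artifacts can be produced in a build step. A minimal sketch using only Node.js built-ins; the precompress helper name and the dist paths are illustrative.

// Build-step sketch: emit .gz and .br siblings next to each static HTML file
// so the server (gzip_static / brotli_static above) can serve them directly.
// Uses only Node.js built-ins; quality choices mirror this spec.

import { readFileSync, writeFileSync } from "node:fs";
import { constants, brotliCompressSync, gzipSync } from "node:zlib";

export function precompress(htmlPath: string): void {
  const body = readFileSync(htmlPath);

  // gzip level 9: maximum ratio, still cheap enough for a build step.
  writeFileSync(`${htmlPath}.gz`, gzipSync(body, { level: 9 }));

  // Brotli quality 11: expensive, but paid once at build time, not per request.
  writeFileSync(
    `${htmlPath}.br`,
    brotliCompressSync(body, {
      params: {
        [constants.BROTLI_PARAM_QUALITY]: 11,
        [constants.BROTLI_PARAM_SIZE_HINT]: body.length,
      },
    }),
  );
}

// Example: precompress("dist/page.html") produces dist/page.html.gz and .br.

Run it over every emitted HTML file; the server config above then picks the matching sibling per request.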

Dynamic response pattern

For dynamically rendered HTML (SSR, edge functions), use Brotli quality 4-6 or gzip level 6. Higher Brotli qualities consume too much CPU per request and risk timing out bot fetches.
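
A runtime sketch of that trade-off, assuming a Node.js SSR handler and reusing the negotiateEncoding helper sketched under Normative requirements (the module path and handler wiring are hypothetical):

// Runtime sketch for dynamically rendered bodies: negotiate, then compress at
// a moderate quality (Brotli 5 / gzip 6) so CPU per request stays bounded.

import { constants, brotliCompressSync, gzipSync } from "node:zlib";
import { negotiateEncoding } from "./negotiate"; // hypothetical module path

export function compressForBot(
  html: string,
  acceptEncoding: string | undefined,
): { body: Buffer; headers: Record<string, string> } {
  const headers: Record<string, string> = {
    "Content-Type": "text/html; charset=utf-8",
    "Vary": "Accept-Encoding", // requirement 4
  };
  const raw = Buffer.from(html, "utf-8");

  switch (negotiateEncoding(acceptEncoding)) {
    case "br":
      headers["Content-Encoding"] = "br";
      return {
        body: brotliCompressSync(raw, {
          params: { [constants.BROTLI_PARAM_QUALITY]: 5 }, // quality 4-6 per this spec
        }),
        headers,
      };
    case "gzip":
      headers["Content-Encoding"] = "gzip";
      return { body: gzipSync(raw, { level: 6 }), headers };
    default:
      return { body: raw, headers }; // identity: no Content-Encoding
  }
}

For very large dynamic bodies, prefer the streaming variants (createBrotliCompress / createGzip) so time-to-first-byte is not gated on compressing the full body.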

Failure modes

Symptom                                 | Likely cause                              | Remediation
Bot logs show parse error / empty body  | Content-Encoding mismatch or missing Vary | Verify negotiation; add Vary: Accept-Encoding
CDN serves Brotli to a gzip-only bot    | Missing or wrong Vary                     | Audit cache key; include encoding in cache key
Origin CPU spikes during bot waves      | Brotli q=11 on dynamic responses          | Drop dynamic Brotli to q=4-6 or precompress
Bot omits Accept-Encoding               | Middleware strip or older bot client      | Serve identity (per requirement 5)
Compression slows time-to-first-byte    | Streaming compression with high q         | Use chunked transfer with q=4-6 for SSR

Validation checklist

  • [ ] Request with Accept-Encoding: br, gzip returns Content-Encoding: br and a Vary header.
  • [ ] Request with Accept-Encoding: gzip returns Content-Encoding: gzip.
  • [ ] Request with no Accept-Encoding header returns identity bytes (no Content-Encoding).
  • [ ] Request with Accept-Encoding: identity;q=0 returns 406 Not Acceptable only if no acceptable encoding exists; otherwise honor the q-value list.
  • [ ] CDN cache key includes encoding (or response is uncacheable when varying).
  • [ ] Brotli quality is ≤ 6 for dynamic responses.
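
The checklist can be exercised against a live origin with a small probe script. The sketch below uses Node's https module so the raw, still-compressed response stays visible; the target URL is a placeholder.

// Probe sketch for the validation checklist: send each Accept-Encoding variant
// and report the Content-Encoding and Vary headers the origin returns.
// Uses node:https directly so no automatic decompression hides the negotiation.

import { request } from "node:https";

const TARGET = "https://example.com/"; // placeholder URL

const cases: Array<{ name: string; acceptEncoding?: string }> = [
  { name: "br+gzip", acceptEncoding: "br, gzip" },
  { name: "gzip only", acceptEncoding: "gzip" },
  { name: "no header" }, // expect identity: no Content-Encoding
];

for (const c of cases) {
  const url = new URL(TARGET);
  const req = request(
    {
      hostname: url.hostname,
      path: url.pathname,
      method: "GET",
      headers: c.acceptEncoding ? { "Accept-Encoding": c.acceptEncoding } : {},
    },
    (res) => {
      console.log(
        `${c.name}: status=${res.statusCode}`,
        `content-encoding=${res.headers["content-encoding"] ?? "(none)"}`,
        `vary=${res.headers["vary"] ?? "(none)"}`,
      );
      res.resume(); // drain the body; only the headers matter here
    },
  );
  req.end();
}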

FAQ

Q: Should I always serve Brotli when a bot accepts it?

Yes for static, precompressed content. For dynamically generated responses, prefer Brotli quality 4-6 or gzip; quality 11 Brotli on dynamic SSR consumes more CPU than the bandwidth saving recovers.

Q: What happens if I send Brotli to a bot that does not accept it?

The bot fails to decode the body and either retries with Accept-Encoding: identity, treats the body as empty, or marks the URL as broken. All three outcomes hurt citation eligibility. Always honor the negotiated encoding.

Q: Do I need Vary: Accept-Encoding if I do not cache?

No. Vary only affects shared caches. If the response is Cache-Control: private or no-store, the header has no effect. Add it anyway as a defensive default — CDNs in front of your origin may still cache.

Q: Why do some bots omit Accept-Encoding?

Reasons include older bot clients, stripped headers from intermediate proxies, or security middleware. The correct origin response is identity bytes; do not attempt to compress speculatively when the header is absent.

Q: Is zstd worth supporting for AI crawlers?

Not yet. As of 2026, only modern browsers and a small number of internal Google fetchers advertise zstd. The codec offers Brotli-class ratios with lower CPU, but bot adoption is too thin to justify operational complexity.

Related Articles

  • AI Crawler Content Negotiation Specification: HTTP content negotiation (Accept, Accept-Language, Vary) for AI crawlers; serve LLM-friendly variants without cloaking penalties or cache poisoning.
  • DNS Prefetch and Preconnect for AI Crawlers: reference for using dns-prefetch and preconnect resource hints with AI crawlers and browser agents, covering semantics, ordering, and impact on render-stage crawls.
  • HTTP Cache Headers for AI Crawlers: reference for HTTP cache headers (ETag, Cache-Control, Last-Modified, Vary) and how AI crawlers use them for citation freshness.
