Accept-Encoding (Brotli, Gzip) for AI Crawlers
AI crawlers negotiate compression with the standard Accept-Encoding header defined in RFC 9110. Most send gzip, a growing subset accept br (Brotli), and zstd support is rare — so a safe origin policy is br > gzip > identity with Vary: Accept-Encoding on cacheable responses.
TL;DR
Serve Brotli when an AI crawler advertises br, gzip when it advertises gzip, and uncompressed (identity) when no Accept-Encoding header is sent. Always emit Vary: Accept-Encoding on cached responses so CDN edges do not return the wrong representation to a bot.
Scope
This specification applies to:
- HTML and JSON responses served to AI crawler user agents (GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot, Claude-User, PerplexityBot, Googlebot, Google-Extended, Applebot-Extended, CCBot, Bytespider, and similar).
- Static and dynamically generated documents under 10 MB; binary assets larger than 10 MB should follow the asset-delivery guidance, not this spec.
- Responses cached at a CDN edge (Cloudflare, Fastly, Akamai, CloudFront, Vercel) where Vary correctness matters.
Normative requirements
- The origin MUST parse Accept-Encoding per RFC 9110 §12.5.3, including q-values.
- The origin MUST select the highest-q acceptable encoding from br, gzip, deflate, zstd, identity, breaking q-value ties in that order of server preference.
- The origin MUST emit Content-Encoding matching the selected encoding when compressing.
- The origin MUST emit Vary: Accept-Encoding on any response cached by intermediate caches.
- The origin MUST NOT compress when the request omits Accept-Encoding (treat as identity only).
- The origin SHOULD fall back to gzip if Brotli is unavailable for the requested resource (precompressed asset missing).
- The origin SHOULD NOT apply Brotli at quality > 6 to dynamically generated bodies; the CPU cost outweighs the bandwidth saving for one-shot bot responses.
Encoding catalog
| Encoding | RFC | Typical compression ratio (HTML) | CPU cost | Crawler support |
|---|---|---|---|---|
| identity | RFC 9110 | 1.00x | None | Universal |
| gzip | RFC 1952 | 3-5x | Low | Universal across major AI bots |
| br (Brotli) | RFC 7932 | 4-6x | Medium (high at q=11) | Common in Chromium-based agents (Googlebot, browser agents); variable in fetch-only bots |
| deflate | RFC 1951 | 3-4x | Low | Legacy; not recommended (interop bugs) |
| zstd | RFC 8878 | 4-6x | Low | Rare; mostly modern browsers, not bots |
Crawler support matrix
| User agent | Sends Accept-Encoding? | Advertised codecs (typical) |
|---|---|---|
| Googlebot | Yes | gzip, deflate, br |
| GPTBot | Yes | gzip (Brotli reported by some operators) |
| OAI-SearchBot | Yes | gzip |
| ChatGPT-User | Yes | gzip, deflate, br (Chromium-derived) |
| ClaudeBot / anthropic-ai | Yes | gzip |
| Claude-User | Yes | gzip, br |
| PerplexityBot | Yes | gzip |
| Perplexity-User | Yes | gzip, br |
| Bytespider | Inconsistent | gzip when sent |
| CCBot | Yes | gzip |
This matrix reflects practitioner observation as of 2026; vendors do not publish formal codec commitments. Treat absent Accept-Encoding as a signal to serve identity-encoded bytes (per requirement 5).
Vary handling
A cached response that varies by encoding MUST include Vary: Accept-Encoding. Without it, a CDN edge can return a Brotli body to a bot that advertised only gzip, which fails to decode and produces a parse error on the bot side.
```http
HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
Content-Encoding: br
Vary: Accept-Encoding
Cache-Control: public, max-age=300
```

When a bot omits Accept-Encoding, return identity:

```http
HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
Vary: Accept-Encoding
Cache-Control: public, max-age=300
```

Note: security middleware that strips Accept-Encoding from bot traffic (e.g., F5 ASM Bot Defense, article K70083507) will cause the origin to serve identity. That behavior is correct but bandwidth-wasteful; audit edge rules before assuming low compression rates indicate bot misbehavior.
Edge precompression pattern
For static HTML, precompress at build time and serve the precompressed file matching the request's Accept-Encoding:
```
/dist/page.html     (identity)
/dist/page.html.gz  (gzip)
/dist/page.html.br  (Brotli, quality 11)
```

Server pseudo-config (NGINX-style):

```nginx
location / {
    gzip_static on;
    brotli_static on;
    add_header Vary Accept-Encoding;
}
```

This pattern amortizes the CPU cost of Brotli at quality 11 across all crawls and is the recommended baseline for sites that publish large libraries of canonical content.
Dynamic response pattern
For dynamically rendered HTML (SSR, edge functions), use Brotli quality 4-6 or gzip level 6. Higher Brotli qualities consume too much CPU per request and risk timing out bot fetches.
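A minimal sketch of the dynamic path, under the same assumptions as above (stdlib gzip, optional third-party `brotli` package; `compress_dynamic` is an illustrative name):

```python
import gzip

try:
    import brotli  # optional third-party package
except ImportError:
    brotli = None

def compress_dynamic(body: bytes, encoding: str) -> bytes:
    """Compress a dynamically rendered body at bot-friendly settings."""
    if encoding == "br" and brotli is not None:
        # quality 5: near-gzip CPU cost with a better ratio;
        # never quality 11 on a per-request basis
        return brotli.compress(body, quality=5)
    if encoding == "gzip":
        # level 6 is zlib's default speed/ratio balance
        return gzip.compress(body, compresslevel=6)
    return body  # identity
```

The `encoding` argument is whatever the negotiation step selected; falling through to identity keeps the function safe when Brotli is requested but the library is absent (per requirement 6, with gzip as the preferred fallback at the negotiation layer).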
Failure modes
| Symptom | Likely cause | Remediation |
|---|---|---|
| Bot logs show parse error / empty body | Content-Encoding mismatch or missing Vary | Verify negotiation; add Vary: Accept-Encoding |
| CDN serves Brotli to a gzip-only bot | Missing or wrong Vary | Audit cache key; include encoding in cache key |
| Origin CPU spikes during bot waves | Brotli q=11 on dynamic responses | Drop dynamic Brotli to q=4-6 or precompress |
| Bot omits Accept-Encoding | Middleware strip or older bot client | Serve identity (per requirement 5) |
| Compression slows time-to-first-byte | Streaming compression with high q | Use chunked transfer with q=4-6 for SSR |
Validation checklist
- [ ] Request with Accept-Encoding: br, gzip returns Content-Encoding: br and a Vary header.
- [ ] Request with Accept-Encoding: gzip returns Content-Encoding: gzip.
- [ ] Request with no Accept-Encoding header returns identity bytes (no Content-Encoding).
- [ ] Request with Accept-Encoding: identity;q=0 is honored per the q-value list; 406 Not Acceptable is returned only when no listed encoding is available.
- [ ] CDN cache key includes encoding (or response is uncacheable when varying).
- [ ] Brotli quality is ≤ 6 for dynamic responses.
FAQ
Q: Should I always serve Brotli when a bot accepts it?
Yes for static, precompressed content. For dynamically generated responses, prefer Brotli quality 4-6 or gzip; quality 11 Brotli on dynamic SSR consumes more CPU than the bandwidth saving recovers.
Q: What happens if I send Brotli to a bot that does not accept it?
The bot fails to decode the body and either retries with Accept-Encoding: identity, treats the body as empty, or marks the URL as broken. All three outcomes hurt citation eligibility. Always honor the negotiated encoding.
Q: Do I need Vary: Accept-Encoding if I do not cache?
No. Vary only affects shared caches. If the response is Cache-Control: private or no-store, the header has no effect. Add it anyway as a defensive default — CDNs in front of your origin may still cache.
Q: Why do some bots omit Accept-Encoding?
Reasons include older bot clients, stripped headers from intermediate proxies, or security middleware. The correct origin response is identity bytes; do not attempt to compress speculatively when the header is absent.
Q: Is zstd worth supporting for AI crawlers?
Not yet. As of 2026, only modern browsers and a small number of internal Google fetchers advertise zstd. The codec offers Brotli-class ratios with lower CPU, but bot adoption is too thin to justify operational complexity.
Related Articles
AI Crawler Content Negotiation Specification
HTTP content negotiation (Accept, Accept-Language, Vary) for AI crawlers — serve LLM-friendly variants without cloaking penalties or cache poisoning.
DNS Prefetch and Preconnect for AI Crawlers
Reference for using dns-prefetch and preconnect resource hints with AI crawlers and browser agents: semantics, ordering, and impact on render-stage crawls.
HTTP Cache Headers for AI Crawlers
Reference for HTTP cache headers (ETag, Cache-Control, Last-Modified, Vary) and how AI crawlers use them for citation freshness.