Conditional GET and ETag Handling for AI Crawlers
Conditional GET pairs a validator on the response (ETag or Last-Modified) with a precondition on the next request (If-None-Match or If-Modified-Since). When the resource has not changed, the server returns 304 Not Modified with no body, saving bandwidth and crawl budget. The mechanism is defined in RFC 9110 (conditional requests) and RFC 9111 (caching); Google documents support for it, and the major AI crawlers are observed to honor it.
TL;DR
Emit an ETag on every cacheable response. Compare the incoming If-None-Match against the current ETag; if they match, return 304 Not Modified with no body. Use strong ETags (byte-identical) for static assets and weak ETags (W/"...") for HTML where minor formatting variations are acceptable. Keep ETags stable across deploys when content is unchanged; otherwise every deploy forces a full re-download and the mechanism saves no bandwidth.
What conditional GET does
Conditional GET lets a client (browser, CDN, AI crawler) revalidate a previously-cached response. The flow:
```mermaid
flowchart LR
    A["Crawler"] -->|"GET /article<br/>If-None-Match: #quot;abc123#quot;"| B["Server"]
    B --> C{"ETag matches?"}
    C -->|"Yes"| D["304 Not Modified<br/>(no body)"]
    C -->|"No"| E["200 OK<br/>+ new ETag<br/>+ body"]
```

The two header pairs:
- ETag ↔ If-None-Match — strongest mechanism. Server returns ETag on responses; client echoes it in If-None-Match on revalidation.
- Last-Modified ↔ If-Modified-Since — timestamp-based. Less precise than ETag (1-second granularity).
Google's crawling infrastructure officially supports both per the HTTP Caching standard (RFC 9111) (Google Search Central, 2024). GPTBot, ClaudeBot, and PerplexityBot are observed to honor them in practice.
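The server-side decision behind the two header pairs can be sketched as a single function. This is a minimal illustration, not a full RFC 9110 implementation: `is_not_modified` is a hypothetical helper, the headers are assumed to arrive in a plain dict with canonical casing, and the entity-tag list is split naively on commas.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def is_not_modified(request_headers: dict[str, str], etag: str, last_modified: datetime) -> bool:
    """Return True when the server may answer 304 Not Modified (hypothetical helper)."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # If-None-Match takes precedence: If-Modified-Since is ignored when it is present.
        if if_none_match.strip() == "*":
            return True
        # Weak comparison: drop the W/ prefix on both sides before comparing.
        opaque = etag.removeprefix("W/")
        candidates = [tag.strip().removeprefix("W/") for tag in if_none_match.split(",")]
        return opaque in candidates

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return False  # unparseable date: ignore the header and send a full 200
        if since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
        # HTTP dates have 1-second granularity, so drop microseconds before comparing.
        return last_modified.replace(microsecond=0) <= since

    return False  # unconditional request
```

The function expects `last_modified` to be a timezone-aware datetime. Note how the timestamp branch has to truncate to whole seconds, which is the precision limit mentioned above.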
Strong vs weak ETags
| Type | Syntax | Meaning | Use for |
|---|---|---|---|
| Strong | "abc123" | Byte-for-byte identical | Static assets (images, JS, CSS) |
| Weak | W/"abc123" | Semantically equivalent | HTML with minor whitespace/formatting variations |
Strong ETags must change when any byte changes. Weak ETags can stay stable across cosmetic differences (e.g., gzip vs brotli encodings, formatter changes). For HTML, weak ETags are usually correct.
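RFC 9110 defines two comparison functions for entity tags, and If-None-Match is evaluated with the weak one. The sketch below shows both; the function names are illustrative and the inputs are assumed to be well-formed tags.

```python
def _split(tag: str) -> tuple[bool, str]:
    """Return (is_weak, opaque_tag) for an entity tag such as W/"abc123"."""
    if tag.startswith("W/"):
        return True, tag[2:]
    return False, tag


def strong_match(a: str, b: str) -> bool:
    """Strong comparison: both tags must be strong and character-for-character identical."""
    weak_a, opaque_a = _split(a)
    weak_b, opaque_b = _split(b)
    return not weak_a and not weak_b and opaque_a == opaque_b


def weak_match(a: str, b: str) -> bool:
    """Weak comparison: the opaque values must match; the W/ prefix is ignored."""
    return _split(a)[1] == _split(b)[1]
```

Under these rules `W/"abc123"` and `"abc123"` match weakly but not strongly, which is why a server that switches between strong and weak tags for the same content can still answer 304 to If-None-Match.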
Generation strategies
- Hash of content — SHA-1 or xxHash of the response body. Most accurate. Cost: hashing every response.
- Version + last-modified — e.g., "v3-2026-04-15T10:00:00". Cheap; tied to deploy + content metadata.
- Resource ID + version — e.g., "article-1234-rev-7". Stable across deploys when content unchanged.
- CMS-driven — use the CMS's content hash or revision number directly.
Avoid strategies whose output changes on every request (e.g., embedding the current time at second precision): the client's stored ETag will never match, so every revalidation returns a full 200.
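Two of the strategies above, sketched under a few assumptions: the helper names are hypothetical, SHA-256 truncated to 16 hex characters stands in for whichever hash you prefer, and the revision number is assumed to come from your CMS.

```python
import hashlib


def etag_from_body(body: bytes, weak: bool = False) -> str:
    """Content-hash strategy: identical bytes always produce the same tag."""
    digest = hashlib.sha256(body).hexdigest()[:16]
    tag = f'"{digest}"'
    return f"W/{tag}" if weak else tag


def etag_from_revision(resource_id: str, revision: int) -> str:
    """Resource ID + version strategy: no hashing, stable until the revision changes."""
    return f'"{resource_id}-rev-{revision}"'
```

Neither function reads the clock or any per-request state, so repeated requests for unchanged content yield the same validator.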
Implementation patterns
Static asset (Express)
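```js
const express = require('express');
const app = express();

// A one-year immutable lifetime is only safe for fingerprinted filenames.
app.use(express.static('public', {
  etag: true,          // emit an ETag for each file
  lastModified: true,  // also emit Last-Modified
  maxAge: '1y',
  immutable: true,
}));
```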
Dynamic HTML (FastAPI)
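```python
from hashlib import sha256

from fastapi import FastAPI, Header, Response
from fastapi.responses import HTMLResponse

app = FastAPI()

@app.get('/article/{slug}')
async def article(slug: str, if_none_match: str | None = Header(default=None)):
    body = render_article(slug)  # application-specific renderer, defined elsewhere
    etag = f'W/"{sha256(body.encode()).hexdigest()[:16]}"'
    headers = {
        'ETag': etag,
        'Cache-Control': 'public, s-maxage=3600, stale-while-revalidate=86400',
    }
    # If-None-Match may list several validators; compare weakly (ignore W/).
    candidates = [t.strip().removeprefix('W/') for t in (if_none_match or '').split(',')]
    if etag.removeprefix('W/') in candidates or '*' in candidates:
        return Response(status_code=304, headers=headers)  # 304 carries no body
    return HTMLResponse(content=body, headers=headers)
```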
Edge worker (Cloudflare)
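```js
// Placeholder origin handler; replace with your own rendering or fetch logic.
const handle = (request) => fetch(request);

// Weak comparison: ignore the W/ prefix on both sides.
const opaque = (tag) => (tag || '').trim().replace(/^W\//, '');

export default {
  async fetch(request) {
    const cache = caches.default;
    const cached = await cache.match(request);
    const etag = cached && cached.headers.get('ETag');
    const candidates = (request.headers.get('If-None-Match') || '').split(',').map(opaque);
    if (etag && candidates.includes(opaque(etag))) {
      // 304 must not carry a body; echo the validator so the client keeps its copy.
      return new Response(null, { status: 304, headers: { ETag: etag } });
    }
    return cached || handle(request);
  }
};
```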
Quoting and format rules
- Always quote ETags: ETag: "abc123". Unquoted values are non-conformant and rejected by some clients.
- Weak ETags use the W/ prefix outside the quotes: ETag: W/"abc123".
- The client echoes the entire value verbatim in If-None-Match, including the W/ prefix and quotes.
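- Multiple ETags can be sent in If-None-Match as a comma-separated list, e.g., If-None-Match: "abc123", W/"def456". The special value * matches any current representation.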
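The format rules above translate into a small formatter and parser. This is a sketch with hypothetical helper names; the regular expression is slightly more lenient than the RFC 9110 grammar and silently skips anything that is not a quoted tag.

```python
import re

# entity-tag = [ "W/" ] DQUOTE *etagc DQUOTE (RFC 9110); this pattern is a lenient approximation.
_ENTITY_TAG = re.compile(r'(?:W/)?"[^"]*"')


def format_etag(opaque: str, weak: bool = False) -> str:
    """Quote an opaque value and add the W/ prefix outside the quotes when weak."""
    if '"' in opaque:
        raise ValueError("opaque value must not contain a double quote")
    return f'W/"{opaque}"' if weak else f'"{opaque}"'


def parse_if_none_match(value: str) -> list[str]:
    """Return the entity tags in an If-None-Match value, or ["*"] for the wildcard."""
    value = value.strip()
    if value == "*":
        return ["*"]
    return _ENTITY_TAG.findall(value)
```

An unquoted value such as `abc123` yields an empty list here, so it can never match and the server falls through to a full 200.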
Cache stability across deploys
The single biggest mistake is generating a new ETag on every deploy even when content has not changed. Patterns that preserve stability:
- Hash on content body, not deploy time.
- Exclude environment-only differences from the hash (build IDs, request IDs).
- For frameworks that emit deploy-tied ETags, override with content-derived hashes.
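One way to keep the hash independent of the deploy is to strip environment-only markers before hashing. The sketch below assumes two hypothetical markers (a build-id meta tag and a request-id comment); substitute whatever your build pipeline actually injects.

```python
import hashlib
import re

# Hypothetical markers: adjust these patterns to what your pipeline really emits.
_VOLATILE = [
    re.compile(rb'<meta name="build-id" content="[^"]*">'),
    re.compile(rb"<!-- request-id: [^>]*-->"),
]


def stable_etag(html: bytes) -> str:
    """Hash the body with deploy- and request-specific markers removed."""
    normalized = html
    for pattern in _VOLATILE:
        normalized = pattern.sub(b"", normalized)
    # Weak tag: the bytes on the wire still differ between deploys, only the content is equivalent.
    return f'W/"{hashlib.sha256(normalized).hexdigest()[:16]}"'
```

The tag is deliberately weak. Two responses that differ only in the stripped markers are not byte-identical, so presenting the shared tag as strong would be incorrect.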
Conditional GET vs cache-control
- Cache-Control decides how long a cache copy is fresh.
- ETag / Last-Modified decide how to revalidate when freshness expires.
Use both together: aggressive s-maxage for fresh windows, plus ETag for revalidation when the freshness window expires.
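The division of labor can be seen from the cache's side. The following sketch uses hypothetical names (`CachedEntry`, `next_action`) and reduces freshness to a single max-age comparison, which ignores directives such as stale-while-revalidate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedEntry:
    etag: Optional[str]     # validator stored with the response, if any
    age_seconds: int        # time since the response was stored
    max_age_seconds: int    # freshness lifetime from Cache-Control


def next_action(entry: Optional[CachedEntry]) -> tuple[str, dict[str, str]]:
    """Return the action a cache takes and the extra request headers it sends."""
    if entry is None:
        return "fetch", {}  # nothing stored: plain GET
    if entry.age_seconds < entry.max_age_seconds:
        return "serve_cached", {}  # Cache-Control says the copy is still fresh
    if entry.etag is not None:
        # Stale but validatable: ask the origin whether the copy is still good.
        return "revalidate", {"If-None-Match": entry.etag}
    return "fetch", {}  # stale and no validator: full re-download
```

Without an ETag the last branch is the only option once freshness expires, which is the bandwidth cost conditional GET is meant to avoid.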
When NOT to send ETag
- Personalized responses (user-specific content): ETag stability is impossible and revalidation cannot be safely shared.
- Streaming responses: chunked encoding makes ETag generation expensive and often pointless.
- Resources that genuinely change every request (real-time data feeds).
Common mistakes
- ETag changes on every request (e.g., timestamps in the hash).
- Forgetting to quote the ETag value.
- Returning 304 with a body (against spec; some clients ignore the body, others fail).
- Including session cookies or headers in the hash.
- Reusing one strong ETag across content codings (gzip/brotli). The bytes differ, so either vary the strong ETag per encoding or use a weak ETag.
FAQ
Q: Do GPTBot and ClaudeBot honor If-None-Match?
Vendors do not publish formal documentation, but server logs widely show AI crawlers sending conditional requests and accepting 304 responses, which reduces repeat downloads of unchanged URLs. Treat them like any RFC-compliant client.
Q: Strong or weak ETag for HTML?
Weak (W/"..."). HTML can vary slightly across encodings (whitespace, gzip variants) without semantic change. Strong ETag would over-invalidate.
Q: Can I use both ETag and Last-Modified?
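Yes. RFC 9110 permits sending both. When a request carries both If-None-Match and If-Modified-Since, the server must evaluate If-None-Match and ignore If-Modified-Since, so ETag takes precedence. Last-Modified remains a fallback for clients that ignore ETag.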
Q: Does 304 count toward crawl budget?
304 is a much cheaper response (no body) and reduces bandwidth, but it still counts as a request. The benefit is in throughput — the same crawl budget covers more URLs.
Q: How does ETag interact with edge cache?
The edge stores both the response and its ETag. On revalidation, the edge can return 304 from cache without hitting origin if the requester's If-None-Match matches the cached ETag.
Q: What if my ETag generation is expensive?
Use a cached hash keyed on the underlying source (CMS revision ID, file mtime). Avoid recomputing a hash over the full body on every request unless responses are small.