HTTP Status Code Reference for AI Crawlers
AI search crawlers (Googlebot, GPTBot, ClaudeBot, PerplexityBot) honor standard HTTP semantics. Use 301/308 for permanent moves, 302/307 for temporary, 410 for intentional removals, and 429/503 with Retry-After for graceful throttling without losing index inclusion.
TL;DR
Return the most specific status code for each situation. Permanent move → 301 or 308. Temporary → 302 or 307. Permanently gone → 410. Soft 404s (200 with empty content) confuse every crawler and should be eliminated. Use 429 + Retry-After for rate limiting, 503 + Retry-After for maintenance. Avoid generic 500s and 200-with-error-body patterns.
Status code reference matrix
| Code | Meaning | AI-crawler effect | Use when |
|---|---|---|---|
| 200 OK | Success | Indexed | Page exists with content |
| 201 Created | Created | Not generally crawled | API responses |
| 204 No Content | Empty | Treated like 200 with no body | Avoid for content URLs |
| 301 Moved Permanently | Permanent move | Canonical replaced; signals consolidate | Permanent URL change |
| 302 Found | Temporary | Original URL kept indexed; no consolidation | Temporary detour |
| 303 See Other | Redirect after POST | Treated as 302 by crawlers | API patterns |
| 304 Not Modified | Conditional cache hit | Crawler reuses cached copy | Conditional GET (see ETag) |
| 307 Temporary Redirect | Temporary; method preserved | Treated as 302 for indexing | Temporary move where the request method must be preserved |
| 308 Permanent Redirect | Permanent; method preserved | Treated as 301 for indexing | Permanent move (POST safe) |
| 400 Bad Request | Client error | Page dropped if persistent | Truly bad input |
| 401 Unauthorized | Auth required | Page treated as inaccessible | Truly auth-only content |
| 403 Forbidden | Forbidden | Persistent 403 leads to drop | Block path you do not want indexed |
| 404 Not Found | Missing | Slow decay; rechecked occasionally | Page does not exist |
| 410 Gone | Intentionally removed | Faster removal than 404 | Permanently retired URLs |
| 429 Too Many Requests | Rate limited | Crawler backs off; honors Retry-After | Protect origin |
| 451 Unavailable For Legal Reasons | Legal block | Indexed status frozen | Compliance removals |
| 500 Internal Server Error | Server bug | Crawler retries; persistent 500 hurts | Avoid — fix upstream |
| 502 Bad Gateway | Upstream failed | Crawler retries; persistent harms ranking | Transient gateway issue |
| 503 Service Unavailable | Maintenance/overload | Honors Retry-After; safe for short windows | Planned maintenance |
| 504 Gateway Timeout | Upstream timeout | Crawler retries | Upstream timeout |
Redirect rules
- Use 301 or 308 when a URL change is permanent. Both signal canonical replacement to Google and AI crawlers (Google Search Central).
- Use 302 or 307 for temporary detours. The original URL stays indexed; PageRank-style signals do not consolidate.
- Avoid redirect chains longer than one hop; AI crawlers may abandon long chains.
- Avoid mixed schemes inside a chain (HTTP to HTTPS to canonical) — collapse to a single hop.
- JavaScript and meta-refresh redirects work for Googlebot but are unreliable for non-Google AI crawlers. Prefer server-side 301/308.
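The rules above can be sketched as a small server-side lookup. This is a minimal illustration, not a production router; `REDIRECT_MAP` and its entries are hypothetical, and it collapses any chain in the map to a single 308 hop.

```python
# Hypothetical redirect map maintained alongside the routing table.
REDIRECT_MAP = {
    "/old-pricing": "/pricing",            # illustrative retired URL
    "/blog/2019/launch": "/blog/launch",
}

def resolve_redirect(path):
    """Return (status, location). Chains collapse to one 308 hop."""
    target = REDIRECT_MAP.get(path)
    if target is None:
        return (200, None)                 # no redirect; serve normally
    # Follow the map to its final destination so crawlers see one hop.
    seen = {path}
    while target in REDIRECT_MAP and target not in seen:
        seen.add(target)
        target = REDIRECT_MAP[target]
    return (308, target)
```

Using 308 here keeps the method intact; swap in 301 if your CDN tooling handles it better.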
Removal rules: 404 vs 410
- 404 Not Found — the URL is missing but might come back. Crawlers re-check on a slow cadence.
- 410 Gone — the URL is intentionally and permanently retired. Removal from the index is faster than 404.
Use 410 when you are certain the URL will not return (deleted product, sunset article). Reserve 404 for content that is genuinely missing but might return.
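The 404/410 decision reduces to a membership check against a record of deliberate removals. A minimal sketch, assuming a hypothetical `RETIRED_URLS` set populated whenever content is intentionally deleted:

```python
# Hypothetical record of deliberately retired URLs.
RETIRED_URLS = {"/products/widget-v1", "/articles/2020-promo"}

def missing_page_status(path):
    """410 for intentional, permanent removals; 404 otherwise."""
    return 410 if path in RETIRED_URLS else 404
```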
Throttling: 429 and 503
Both signal "come back later." Always pair with the Retry-After header.
- 429 Too Many Requests — the requester (specific IP / user-agent) exceeded a rate limit. AI crawlers respect this and reduce concurrency.
- 503 Service Unavailable — the service itself is down or overloaded. Use for short, planned maintenance windows.
```http
HTTP/1.1 503 Service Unavailable
Retry-After: 600
Content-Type: text/html
```

If 503 persists for many hours, crawlers eventually drop the URL from the index. Communicate clearly via Retry-After and aim to recover within minutes, not hours.
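A 429 with Retry-After can be produced by a simple fixed-window limiter. This is a sketch under assumed numbers: the 60-requests-per-minute budget and per-user-agent keying are illustrative, not published crawler quotas.

```python
import time

WINDOW = 60   # seconds per window (assumed)
LIMIT = 60    # requests per window (assumed)
_buckets = {} # key -> (window_start, count)

def check_rate(key, now=None):
    """Return (status, headers): 200 if allowed, 429 + Retry-After if not."""
    now = time.time() if now is None else now
    start, count = _buckets.get(key, (now, 0))
    if now - start >= WINDOW:
        start, count = now, 0          # new window
    count += 1
    _buckets[key] = (start, count)
    if count > LIMIT:
        retry_after = int(start + WINDOW - now) + 1
        return (429, {"Retry-After": str(retry_after)})
    return (200, {})
```

The important part is the header: a bare 429 tells the crawler to back off, but Retry-After tells it when to come back.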
Soft errors to avoid
- 200 OK with empty body for missing content. Crawlers may eventually classify as soft-404; signals are unreliable in the meantime.
- 200 OK with "page not found" text — same problem. Return real 404 or 410.
- 301 to homepage for missing pages — dilutes the homepage signal and confuses indexing.
- 302 used for permanent moves — leaves the old URL indexed indefinitely.
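A response guard can catch these soft-404 patterns before they reach crawlers. The phrase list below is a hypothetical heuristic, not a standard; tune it to your templates.

```python
# Hypothetical error-body phrases that signal a soft 404.
NOT_FOUND_PHRASES = ("page not found", "no longer available")

def harden_status(status, body):
    """Replace 200-with-error-body and empty-200 with a real 404."""
    if status != 200:
        return status
    text = body.strip().lower()
    if not text or any(p in text for p in NOT_FOUND_PHRASES):
        return 404
    return status
```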
AI-crawler-specific notes
- GPTBot, ClaudeBot, and PerplexityBot do not publish detailed retry policies; they appear to honor Retry-After and standard HTTP semantics in practice.
- Aggressive 429 from generic edge rules can starve AI crawlers. Tune rate limits per-bot rather than blanket-throttle.
- 401/403 to AI bots while pages are public is a frequent self-inflicted issue; verify in your WAF or bot-management rules.
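Per-bot tuning can be as simple as a user-agent keyed limit table with a stricter default for unidentified clients. The bot names are real crawler tokens; the numeric limits are assumptions for illustration only.

```python
# Illustrative per-bot limits (requests/min); numbers are assumptions,
# not published crawler quotas.
BOT_LIMITS = {
    "GPTBot": 120,
    "ClaudeBot": 120,
    "PerplexityBot": 60,
}
DEFAULT_LIMIT = 30  # stricter budget for unidentified clients

def limit_for(user_agent):
    """Match known crawler tokens in the user-agent string."""
    ua = user_agent.lower()
    for bot, limit in BOT_LIMITS.items():
        if bot.lower() in ua:
            return limit
    return DEFAULT_LIMIT
```

Pair this with reverse-DNS or published IP-range verification where available, since user-agent strings are trivially spoofed.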
Common mistakes
- Returning 200 for soft errors.
- Using 302 for permanent moves.
- Long redirect chains.
- Returning 5xx without Retry-After.
- Blocking AI bots with 403 while debugging and forgetting to remove the rule.
FAQ
Q: Should I prefer 301 or 308 for permanent redirects?
Functionally equivalent for SEO. Use 308 if you must preserve the request method (e.g., POST). 301 is more widely understood and is the safer default for browsers and CDN tooling.
Q: Does 410 hurt rankings?
No. It signals deliberate removal. The URL leaves the index faster, but you do not lose ranking on remaining URLs.
Q: How long can I serve 503 before crawlers de-index?
There is no public threshold. Practitioner observations suggest minutes are safe, hours risk de-indexing for the affected URLs. Always include Retry-After.
Q: Do AI bots like GPTBot honor Retry-After?
They appear to in practice. Without official policies, treat their behavior like Googlebot's: standard HTTP semantics with conservative back-off.
Q: Can I use 451 for AI-rights blocking?
451 is for legal blocks. For AI training/usage opt-out, prefer robots.txt directives and TDM signaling. 451 is an emergency tool.
Q: What about 200 with noindex?
200 + noindex keeps content fetchable but instructs crawlers not to index. AI crawlers vary in their support; for hard removal, 404 or 410 is more reliable.
Related Articles
Conditional GET and ETag Handling for AI Crawlers
Conditional GET and ETag handling for AI crawlers: ETag generation, If-None-Match, If-Modified-Since, 304 Not Modified, and bandwidth-saving patterns.
Edge Rendering Strategy for AI Citation Optimization
Edge rendering strategy for AI citation: Cloudflare Workers vs Vercel Edge vs Netlify Edge, latency targets, cache-key strategy, and content parity rules.
JavaScript SPA Hydration Patterns for AI Crawlers
JavaScript SPA hydration patterns for AI crawlers: rendering modes, mismatch fixes, and framework-specific strategies for GPTBot, ClaudeBot, PerplexityBot.