HTTP Status Code Reference for AI Crawlers
AI search crawlers (Googlebot, GPTBot, ClaudeBot, PerplexityBot) honor standard HTTP semantics. Use 301/308 for permanent moves, 302/307 for temporary, 410 for intentional removals, and 429/503 with Retry-After for graceful throttling without losing index inclusion.
TL;DR
Return the most specific status code for each situation. Permanent move → 301 or 308. Temporary → 302 or 307. Permanently gone → 410. Soft 404s (200 with empty content) confuse every crawler and should be eliminated. Use 429 + Retry-After for rate limiting, 503 + Retry-After for maintenance. Avoid generic 500s and 200-with-error-body patterns.
Status code reference matrix
| Code | Meaning | AI-crawler effect | Use when |
|---|---|---|---|
| 200 OK | Success | Indexed | Page exists with content |
| 201 Created | Created | Not generally crawled | API responses |
| 204 No Content | Empty | Treated like 200 with no body | Avoid for content URLs |
| 301 Moved Permanently | Permanent move | Canonical replaced; signals consolidate | Permanent URL change |
| 302 Found | Temporary | Original URL kept indexed; no consolidation | Temporary detour |
| 303 See Other | Redirect after POST | Treated as 302 by crawlers | API patterns |
| 304 Not Modified | Conditional cache hit | Crawler reuses cached copy | Conditional GET (see ETag) |
| 307 Temporary Redirect | Temporary; method preserved | Treated as 302 for indexing | Temporary move where the request method must be preserved |
| 308 Permanent Redirect | Permanent; method preserved | Treated as 301 for indexing | Permanent move (POST safe) |
| 400 Bad Request | Client error | Page dropped if persistent | Truly bad input |
| 401 Unauthorized | Auth required | Page treated as inaccessible | Truly auth-only content |
| 403 Forbidden | Forbidden | Persistent 403 leads to drop | Block path you do not want indexed |
| 404 Not Found | Missing | Slow decay; rechecked occasionally | Page does not exist |
| 410 Gone | Intentionally removed | Faster removal than 404 | Permanently retired URLs |
| 429 Too Many Requests | Rate limited | Crawler backs off; honors Retry-After | Protect origin |
| 451 Unavailable For Legal Reasons | Legal block | Indexed status frozen | Compliance removals |
| 500 Internal Server Error | Server bug | Crawler retries; persistent 500 hurts | Avoid — fix upstream |
| 502 Bad Gateway | Upstream failed | Crawler retries; persistent harms ranking | Transient gateway issue |
| 503 Service Unavailable | Maintenance/overload | Honors Retry-After; safe for short windows | Planned maintenance |
| 504 Gateway Timeout | Upstream timeout | Crawler retries | Upstream timeout |
Redirect rules
- Use 301 or 308 when a URL change is permanent. Both signal canonical replacement to Google and AI crawlers (Google Search Central).
- Use 302 or 307 for temporary detours. The original URL stays indexed; PageRank-style signals do not consolidate.
- Avoid redirect chains longer than one hop; AI crawlers may abandon long chains.
- Avoid mixed schemes inside a chain (HTTP to HTTPS to canonical) — collapse to a single hop.
- JavaScript and meta-refresh redirects work for Googlebot but are unreliable for non-Google AI crawlers. Prefer server-side 301/308.
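The rules above can be sketched as a small server-side lookup. This is a minimal illustration, not a production router; `REDIRECT_MAP` and its entries are hypothetical, and it collapses any chain in the map to a single 308 hop.

```python
# Hypothetical redirect map maintained alongside the routing table.
REDIRECT_MAP = {
    "/old-pricing": "/pricing",            # illustrative retired URL
    "/blog/2019/launch": "/blog/launch",
}

def resolve_redirect(path):
    """Return (status, location). Chains collapse to one 308 hop."""
    target = REDIRECT_MAP.get(path)
    if target is None:
        return (200, None)                 # no redirect; serve normally
    # Follow the map to its final destination so crawlers see one hop.
    seen = {path}
    while target in REDIRECT_MAP and target not in seen:
        seen.add(target)
        target = REDIRECT_MAP[target]
    return (308, target)
```

Using 308 here keeps the method intact; swap in 301 if your CDN tooling handles it better.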
Removal rules: 404 vs 410
- 404 Not Found — the URL is missing but might come back. Crawlers re-check on a slow cadence.
- 410 Gone — the URL is intentionally and permanently retired. Removal from the index is faster than 404.
Use 410 when you are certain the URL will not return (deleted product, sunset article). Reserve 404 for content that is genuinely missing but might return.
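The 404/410 decision reduces to a membership check against a record of deliberate removals. A minimal sketch, assuming a hypothetical `RETIRED_URLS` set populated whenever content is intentionally deleted:

```python
# Hypothetical record of deliberately retired URLs.
RETIRED_URLS = {"/products/widget-v1", "/articles/2020-promo"}

def missing_page_status(path):
    """410 for intentional, permanent removals; 404 otherwise."""
    return 410 if path in RETIRED_URLS else 404
```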
Throttling: 429 and 503
Both signal "come back later." Always pair with the Retry-After header.
- 429 Too Many Requests — the requester (specific IP / user-agent) exceeded a rate limit. AI crawlers respect this and reduce concurrency.
- 503 Service Unavailable — the service itself is down or overloaded. Use for short, planned maintenance windows.
```http
HTTP/1.1 503 Service Unavailable
Retry-After: 600
Content-Type: text/html
```

If 503 persists for many hours, crawlers eventually drop the URL from the index. Communicate clearly via Retry-After and aim to recover within minutes, not hours.
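A 429 with Retry-After can be produced by a simple fixed-window limiter. This is a sketch under assumed numbers: the 60-requests-per-minute budget and per-user-agent keying are illustrative, not published crawler quotas.

```python
import time

WINDOW = 60   # seconds per window (assumed)
LIMIT = 60    # requests per window (assumed)
_buckets = {} # key -> (window_start, count)

def check_rate(key, now=None):
    """Return (status, headers): 200 if allowed, 429 + Retry-After if not."""
    now = time.time() if now is None else now
    start, count = _buckets.get(key, (now, 0))
    if now - start >= WINDOW:
        start, count = now, 0          # new window
    count += 1
    _buckets[key] = (start, count)
    if count > LIMIT:
        retry_after = int(start + WINDOW - now) + 1
        return (429, {"Retry-After": str(retry_after)})
    return (200, {})
```

The important part is the header: a bare 429 tells the crawler to back off, but Retry-After tells it when to come back.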
Soft errors to avoid
- 200 OK with empty body for missing content. Crawlers may eventually classify as soft-404; signals are unreliable in the meantime.
- 200 OK with "page not found" text — same problem. Return real 404 or 410.
- 301 to homepage for missing pages — dilutes the homepage signal and confuses indexing.
- 302 used for permanent moves — leaves the old URL indexed indefinitely.
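A response guard can catch these soft-404 patterns before they reach crawlers. The phrase list below is a hypothetical heuristic, not a standard; tune it to your templates.

```python
# Hypothetical error-body phrases that signal a soft 404.
NOT_FOUND_PHRASES = ("page not found", "no longer available")

def harden_status(status, body):
    """Replace 200-with-error-body and empty-200 with a real 404."""
    if status != 200:
        return status
    text = body.strip().lower()
    if not text or any(p in text for p in NOT_FOUND_PHRASES):
        return 404
    return status
```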
AI-crawler-specific notes
- GPTBot, ClaudeBot, and PerplexityBot do not publish detailed retry policies; they appear to honor Retry-After and standard HTTP semantics in practice.
- Aggressive 429 from generic edge rules can starve AI crawlers. Tune rate limits per-bot rather than blanket-throttle.
- 401/403 to AI bots while pages are public is a frequent self-inflicted issue; verify in your WAF or bot-management rules.
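Per-bot tuning can be as simple as a user-agent keyed limit table with a stricter default for unidentified clients. The bot names are real crawler tokens; the numeric limits are assumptions for illustration only.

```python
# Illustrative per-bot limits (requests/min); numbers are assumptions,
# not published crawler quotas.
BOT_LIMITS = {
    "GPTBot": 120,
    "ClaudeBot": 120,
    "PerplexityBot": 60,
}
DEFAULT_LIMIT = 30  # stricter budget for unidentified clients

def limit_for(user_agent):
    """Match known crawler tokens in the user-agent string."""
    ua = user_agent.lower()
    for bot, limit in BOT_LIMITS.items():
        if bot.lower() in ua:
            return limit
    return DEFAULT_LIMIT
```

Pair this with reverse-DNS or published IP-range verification where available, since user-agent strings are trivially spoofed.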
Common mistakes
- Returning 200 for soft errors.
- Using 302 for permanent moves.
- Long redirect chains.
- Returning 5xx without Retry-After.
- Blocking AI bots with 403 while debugging and forgetting to remove the rule.
FAQ
Q: Should I prefer 301 or 308 for permanent redirects?
Functionally equivalent for SEO. Use 308 if you must preserve the request method (e.g., POST). 301 is more widely understood and is the safer default for browsers and CDN tooling.
Q: Does 410 hurt rankings?
No. It signals deliberate removal. The URL leaves the index faster, but you do not lose ranking on remaining URLs.
Q: How long can I serve 503 before crawlers de-index?
There is no public threshold. Practitioner observations suggest minutes are safe, hours risk de-indexing for the affected URLs. Always include Retry-After.
Q: Do AI bots like GPTBot honor Retry-After?
They appear to in practice. Without official policies, treat their behavior like Googlebot's: standard HTTP semantics with conservative back-off.
Q: Can I use 451 for AI-rights blocking?
451 is for legal blocks. For AI training/usage opt-out, prefer robots.txt directives and TDM signaling. 451 is an emergency tool.
Q: What about 200 with noindex?
200 + noindex keeps content fetchable but instructs crawlers not to index. AI crawlers vary in their support; for hard removal, 404 or 410 is more reliable.
Related Articles
Conditional GET and ETag Handling for AI Crawlers
Conditional GET and ETag handling for AI crawlers: ETag generation, If-None-Match, If-Modified-Since, 304 Not Modified, and bandwidth-saving patterns.
Edge Rendering Strategy for AI Citation Optimization
Edge rendering strategy for AI citation: Cloudflare Workers vs Vercel Edge vs Netlify Edge, latency targets, cache-key strategy, and content parity rules.
JavaScript SPA Hydration Patterns for AI Crawlers
JavaScript SPA hydration patterns for AI crawlers: rendering modes, mismatch fixes, and framework-specific strategies for GPTBot, ClaudeBot, PerplexityBot.