Geodocs.dev

HTTP Status Code Reference for AI Crawlers



AI search crawlers (Googlebot, GPTBot, ClaudeBot, PerplexityBot) honor standard HTTP semantics. Use 301/308 for permanent moves, 302/307 for temporary, 410 for intentional removals, and 429/503 with Retry-After for graceful throttling without losing index inclusion.

TL;DR

Return the most specific status code for each situation. Permanent move → 301 or 308. Temporary → 302 or 307. Permanently gone → 410. Soft 404s (200 with empty content) confuse every crawler and should be eliminated. Use 429 + Retry-After for rate limiting, 503 + Retry-After for maintenance. Avoid generic 500s and 200-with-error-body patterns.

Status code reference matrix

| Code | Meaning | AI-crawler effect | Use when |
| --- | --- | --- | --- |
| 200 OK | Success | Indexed | Page exists with content |
| 201 Created | Created | Not generally crawled | API responses |
| 204 No Content | Empty | Treated like 200 with no body | Avoid for content URLs |
| 301 Moved Permanently | Permanent move | Canonical replaced; signals consolidate | Permanent URL change |
| 302 Found | Temporary | Original URL kept indexed; no consolidation | Temporary detour |
| 303 See Other | Redirect after POST | Treated as 302 by crawlers | API patterns |
| 304 Not Modified | Conditional cache hit | Crawler reuses cached copy | Conditional GET (see ETag) |
| 307 Temporary Redirect | Temporary; method preserved | Treated as 302 for indexing | Temporary move where the method must be preserved |
| 308 Permanent Redirect | Permanent; method preserved | Treated as 301 for indexing | Permanent move (POST safe) |
| 400 Bad Request | Client error | Page dropped if persistent | Truly bad input |
| 401 Unauthorized | Auth required | Page treated as inaccessible | Truly auth-only content |
| 403 Forbidden | Forbidden | Persistent 403 leads to drop | Paths you do not want indexed |
| 404 Not Found | Missing | Slow decay; rechecked occasionally | Page does not exist |
| 410 Gone | Intentionally removed | Faster removal than 404 | Permanently retired URLs |
| 429 Too Many Requests | Rate limited | Crawler backs off; honors Retry-After | Protecting the origin |
| 451 Unavailable For Legal Reasons | Legal block | Indexed status frozen | Compliance removals |
| 500 Internal Server Error | Server bug | Crawler retries; persistent 500 hurts | Avoid; fix upstream |
| 502 Bad Gateway | Upstream failed | Crawler retries; persistence harms ranking | Transient gateway issues |
| 503 Service Unavailable | Maintenance/overload | Honors Retry-After; safe for short windows | Planned maintenance |
| 504 Gateway Timeout | Upstream timeout | Crawler retries | Upstream timeouts |

Redirect rules

  • Use 301 or 308 when a URL change is permanent. Both signal canonical replacement to Google and AI crawlers (Google Search Central).
  • Use 302 or 307 for temporary detours. The original URL stays indexed; PageRank-style signals do not consolidate.
  • Avoid chaining redirects more than once; AI crawlers may abandon long chains.
  • Avoid mixed schemes inside a chain (HTTP to HTTPS to canonical) — collapse to a single hop.
  • JavaScript and meta-refresh redirects work for Googlebot but are unreliable for non-Google AI crawlers. Prefer server-side 301/308.
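The single-hop, server-side rule above can be sketched in a few lines of Python. The path mapping, the WSGI wiring, and the `redirect_for` helper are illustrative assumptions, not a prescribed API:

```python
# Illustrative mapping of retired paths to their permanent replacements.
PERMANENT_MOVES = {"/old-pricing": "/pricing"}

def redirect_for(path):
    """Return (status, location) for a single-hop permanent redirect, else None."""
    target = PERMANENT_MOVES.get(path)
    if target:
        # 308 preserves the request method; 301 is the classic, widely cached choice.
        return (308, target)
    return None

def app(environ, start_response):
    """Minimal WSGI sketch: one hop, server-side, no chains or mixed schemes."""
    redirect = redirect_for(environ.get("PATH_INFO", "/"))
    if redirect:
        _, location = redirect
        start_response("308 Permanent Redirect", [("Location", location)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b"<h1>ok</h1>"]
```

Because the redirect is issued by the server, every crawler sees it on the first request, with no JavaScript execution required.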

Removal rules: 404 vs 410

  • 404 Not Found — the URL is missing but might come back. Crawlers re-check on a slow cadence.
  • 410 Gone — the URL is intentionally and permanently retired. Removal from the index is faster than 404.

Use 410 when you are sure the URL will not return (deleted product, sunset article). Reserve 404 for pages that are genuinely missing or might come back.
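One way to implement the distinction is a tombstone list: retired URLs get 410, everything else unknown falls through to 404. The path sets here are hypothetical examples:

```python
# Hypothetical URL sets for illustration.
LIVE = {"/", "/products/widget"}
RETIRED = {"/products/discontinued-widget", "/blog/sunset-article"}

def status_for(path):
    """Decide 200 vs 410 vs 404 for a requested path."""
    if path in LIVE:
        return 200
    if path in RETIRED:
        return 410  # intentional, permanent removal -> faster de-indexing
    return 404      # missing, might return -> crawlers re-check on a slow cadence
```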

Throttling: 429 and 503

Both signal "come back later." Always pair with the Retry-After header.

  • 429 Too Many Requests — the requester (specific IP / user-agent) exceeded a rate limit. AI crawlers respect this and reduce concurrency.
  • 503 Service Unavailable — the service itself is down or overloaded. Use for short, planned maintenance windows.

A minimal maintenance response looks like:

HTTP/1.1 503 Service Unavailable
Retry-After: 600
Content-Type: text/html

If 503 persists for many hours, crawlers eventually drop the URL from the index. Communicate clearly via Retry-After and aim to recover within minutes, not hours.
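The 429/503 decision can be sketched as a small sliding-window check. The window size, request budget, and `throttle_decision` helper are illustrative assumptions; real deployments would key on verified bot identity, not just the user-agent string:

```python
import time

WINDOW_S = 60
MAX_REQUESTS = 120  # assumed per-client budget; tune per bot

_hits = {}  # user_agent -> list of request timestamps within the window

def throttle_decision(user_agent, now=None, maintenance=False):
    """Return (status, headers): 503 during maintenance, 429 over budget, else 200."""
    now = time.time() if now is None else now
    if maintenance:
        # Short, planned window; Retry-After tells crawlers when to come back.
        return 503, {"Retry-After": "600"}
    hits = [t for t in _hits.get(user_agent, []) if now - t < WINDOW_S]
    hits.append(now)
    _hits[user_agent] = hits
    if len(hits) > MAX_REQUESTS:
        return 429, {"Retry-After": str(WINDOW_S)}
    return 200, {}
```

Note that both error paths carry Retry-After, so a well-behaved crawler backs off instead of hammering the origin or dropping the URL.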

Soft errors to avoid

  • 200 OK with empty body for missing content. Crawlers may eventually classify it as a soft 404; in the meantime the signals are unreliable.
  • 200 OK with "page not found" text — same problem. Return real 404 or 410.
  • 301 to homepage for missing pages — dilutes the homepage signal and confuses indexing.
  • 302 used for permanent moves — leaves the old URL indexed indefinitely.
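A small audit helper can flag the first two soft-error patterns above in your own responses before crawlers do. The marker strings are illustrative assumptions; tailor them to your templates:

```python
# Hypothetical error-page phrases; adjust to match your site's templates.
NOT_FOUND_MARKERS = ("page not found", "no results", "nothing here")

def audit_response(status, body):
    """Flag soft-404s: a 200 whose body is empty or reads like an error page."""
    if status != 200:
        return "ok"
    text = body.strip().lower()
    if not text:
        return "soft-404: empty 200 body; return 404 or 410 instead"
    if any(marker in text for marker in NOT_FOUND_MARKERS):
        return "soft-404: error text served with 200; return 404 or 410 instead"
    return "ok"
```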

AI-crawler-specific notes

  • GPTBot, ClaudeBot, and PerplexityBot do not publish detailed retry policies; they appear to honor Retry-After and standard HTTP semantics in practice.
  • Aggressive 429 from generic edge rules can starve AI crawlers. Tune rate limits per-bot rather than blanket-throttle.
  • 401/403 to AI bots while pages are public is a frequent self-inflicted issue; verify in your WAF or bot-management rules.
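Per-bot tuning can be as simple as a lookup keyed on the user-agent. The budgets below are illustrative assumptions, not published crawler limits, and substring matching on user-agents is spoofable; pair it with IP or reverse-DNS verification in production:

```python
# Assumed per-bot budgets (requests/minute); values are illustrative only.
PER_BOT_LIMITS = {
    "Googlebot": 300,
    "GPTBot": 60,
    "ClaudeBot": 60,
    "PerplexityBot": 60,
}
DEFAULT_LIMIT = 30  # unknown or generic clients

def limit_for(user_agent):
    """Pick a per-bot request budget instead of one blanket edge rule."""
    for bot, limit in PER_BOT_LIMITS.items():
        if bot.lower() in user_agent.lower():
            return limit
    return DEFAULT_LIMIT
```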

Common mistakes

  • Returning 200 for soft errors.
  • Using 302 for permanent moves.
  • Long redirect chains.
  • Returning 5xx without Retry-After.
  • Blocking AI bots with 403 while debugging and forgetting to remove the rule.

FAQ

Q: Should I prefer 301 or 308 for permanent redirects?

Functionally equivalent for SEO. Use 308 if you must preserve the request method (e.g., POST). 301 is more widely understood and is the safer default for browsers and CDN tooling.

Q: Does 410 hurt rankings?

No. It signals deliberate removal. The URL leaves the index faster, but you do not lose ranking on remaining URLs.

Q: How long can I serve 503 before crawlers de-index?

There is no public threshold. Practitioner observations suggest minutes are safe, hours risk de-indexing for the affected URLs. Always include Retry-After.

Q: Do AI bots like GPTBot honor Retry-After?

They appear to in practice. Without official policies, treat their behavior like Googlebot's: standard HTTP semantics with conservative back-off.

Q: Can I use 451 for AI-rights blocking?

451 is for legal blocks. For AI training/usage opt-out, prefer robots.txt directives and TDM signaling. 451 is an emergency tool.

Q: What about 200 with noindex?

200 + noindex keeps content fetchable but instructs crawlers not to index. AI crawlers vary in their support; for hard removal, 404 or 410 is more reliable.

