Geodocs.dev

Accept-Language and AI Language Detection

AI crawlers detect language from the page's <html lang> attribute and from the URL/locale path. Most do not honor Accept-Language-based redirects, and many send Accept-Language: en-US by default, which means locale-adaptive sites can accidentally hide non-English pages from AI search.

TL;DR

Declare <html lang> with a BCP 47 tag on every page, serve one stable URL per locale, and avoid 302 redirects driven by Accept-Language. AI crawlers commonly default to en-US and will be redirected away from your non-English content if your origin keys redirects on the header.

Scope

This specification applies to:

  • Multilingual or locale-adaptive sites whose content varies by language.
  • HTML responses delivered to AI crawler user agents (GPTBot, OAI-SearchBot, ClaudeBot, PerplexityBot, Googlebot, Google-Extended, Applebot-Extended, CCBot).
  • Pages indexed for AI Overviews, ChatGPT search, Perplexity, Claude, and Gemini.

Background signals

AI crawlers and rendering pipelines combine three signals to decide a page's language:

  1. <html lang> attribute: the W3C-recommended primary signal (e.g., <html lang="en">). MDN documents this as a BCP 47 tag, inherited by descendant elements.
  2. URL pattern: path prefix (/de/), subdomain (de.example.com), or country-code TLD (example.de).
  3. hreflang link relations: declared in <head> or in sitemaps, signaling alternate-language URLs.

Accept-Language from the bot is a request signal, not a content signal. Treat it as input to potential negotiation, not as a definitive locale identifier.
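The three content signals can be read from a fetched page with the standard library alone. The sketch below is illustrative, not a description of how any particular crawler works. The function name `language_signals` and the assumption that the first path segment is a two-letter locale (optionally with a region) are hypothetical.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumption: locale path segments look like "de" or "de-DE".
LOCALE_SEGMENT = re.compile(r"^[a-z]{2}(-[A-Za-z]{2})?$")


class LangSignalParser(HTMLParser):
    """Collects the <html lang> value and hreflang alternates from a page."""

    def __init__(self):
        super().__init__()
        self.html_lang = None
        self.alternates = {}

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "html" and self.html_lang is None:
            self.html_lang = attr.get("lang")
        elif tag == "link":
            rel = (attr.get("rel") or "").lower().split()
            if "alternate" in rel and attr.get("hreflang"):
                self.alternates[attr["hreflang"]] = attr.get("href")


def language_signals(url, html):
    """Return the three content-side language signals for one page."""
    parser = LangSignalParser()
    parser.feed(html)
    segments = [s for s in urlparse(url).path.split("/") if s]
    path_locale = segments[0] if segments and LOCALE_SEGMENT.match(segments[0]) else None
    return {
        "html_lang": parser.html_lang,
        "path_locale": path_locale,
        "hreflang": parser.alternates,
    }
```

Note that the request's Accept-Language header is not an input here: every signal comes from the URL and the response body.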

Normative requirements

  1. The page MUST declare <html lang> with a BCP 47 tag.
  2. The site MUST serve a stable URL per locale (path, subdomain, or ccTLD scheme).
  3. The origin MUST NOT issue a 301/302 redirect keyed solely on Accept-Language.
  4. The origin SHOULD include hreflang link relations in the <head> of every locale variant.
  5. The origin SHOULD emit a Content-Language header carrying the page's BCP 47 tag on responses where the language is unambiguous.
  6. The origin SHOULD NOT emit Vary: Accept-Language unless the response body genuinely varies by header.
  7. When language varies inline within a page, the page SHOULD wrap foreign-language fragments in an element with its own lang attribute, such as <span lang="fr">.

Crawler behavior matrix

| User agent | Sends Accept-Language | Default value (typical) | Honors locale redirects? |
|---|---|---|---|
| Googlebot | Rarely (per Google Search docs) | n/a | No (Google does not use it for crawl) |
| GPTBot | Yes | en-US (observed) | Follows; can be misrouted |
| OAI-SearchBot | Yes | en-US (observed) | Follows; can be misrouted |
| ChatGPT-User | Yes (Chromium-derived) | Browser-derived | Follows |
| ClaudeBot / anthropic-ai | Sometimes | en-US when sent | Follows when sent |
| PerplexityBot | Sometimes | en-US when sent | Follows when sent |
| Browser agents (Operator, Browser Use) | Yes | Configurable per session | Follows |

This matrix reflects practitioner observation as of 2026, including the MERJ R&D report (Mar 2026) on locale-adaptive redirect risk; vendors do not publish formal commitments.

Why locale redirects break AI crawls

A common pattern on multilingual sites is:

GET / HTTP/1.1
Accept-Language: en-US
  -> 302 Location: /en/

GET / HTTP/1.1
Accept-Language: de-DE
  -> 302 Location: /de/

AI crawlers that send a default en-US will always be redirected to /en/, regardless of which page they fetched. The /de/ URL appears unreachable to those bots, and German-language citations of your site become impossible.

The fix:

  • Keep /, /en/, /de/, etc. as distinct, directly addressable URLs.
  • Use hreflang to cross-link them.
  • Optionally, on /, render a small language picker rather than auto-redirecting.
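A minimal routing sketch of that fix, assuming a path-prefix scheme and a hypothetical locale set of en, de, and fr. The `route` function and its return shape are invented for illustration. It accepts the header value but never reads it, so each URL resolves the same way for every client.

```python
LOCALES = ("en", "de", "fr")  # assumed locale set, for illustration only


def route(path, accept_language=None):
    """Resolve a request path to (status, headers, view).

    `accept_language` is accepted but deliberately never read, so the same
    URL always yields the same response regardless of the header.
    """
    if path == "/":
        # Root serves a language picker instead of redirecting.
        return 200, {"Content-Type": "text/html; charset=utf-8"}, "language-picker"
    first_segment = path.strip("/").split("/")[0]
    if first_segment in LOCALES:
        headers = {
            "Content-Type": "text/html; charset=utf-8",
            "Content-Language": first_segment,
        }
        return 200, headers, f"page:{first_segment}"
    return 404, {}, "not-found"
```

A crawler sending en-US and a browser sending de-DE both receive the picker at / and the German page at /de/page.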

Vary handling

Apply Vary: Accept-Language only when the same URL legitimately serves different bodies for different languages — typically a single-page app or API endpoint. For static locale URLs, omit Vary.

Incorrect use of Vary: Accept-Language causes CDN cache fragmentation: every distinct header value (and there are dozens) becomes its own cache key, eroding hit rates without behavioral benefit.
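The fragmentation effect can be illustrated with a simplified cache-key model. This is a sketch under the assumption that the cache keys on the URL plus the raw value of each header named in Vary. Real CDNs differ, and some normalize Accept-Language before keying. The `cache_key` function is hypothetical.

```python
def cache_key(url, request_headers, vary=()):
    """Simplified cache key: the URL plus the raw values of headers named by Vary."""
    lowered = {name.lower(): value for name, value in request_headers.items()}
    varied = tuple((name.lower(), lowered.get(name.lower(), "")) for name in vary)
    return (url, varied)


# Four plausible Accept-Language values hitting the same static locale URL.
header_values = ["en-US", "en-US,en;q=0.9", "de-DE,de;q=0.8,en;q=0.5", "fr"]

with_vary = {
    cache_key("/de/page", {"Accept-Language": value}, vary=("Accept-Language",))
    for value in header_values
}
without_vary = {
    cache_key("/de/page", {"Accept-Language": value})
    for value in header_values
}

print(len(with_vary), len(without_vary))  # 4 1
```

In this model the same German page is stored four times with Vary and once without it.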

Inline language fragments

For pages that quote or reference content in another language, mark up fragments explicitly:

<p>The French motto is <span lang="fr">liberté, égalité, fraternité</span>.</p>

This helps assistive technology such as screen readers, as well as AI parsers that look for inline lang switches. AI engines that summarize multilingual content benefit from accurate inline tagging when generating attribution.

hreflang reference pattern

<link rel="alternate" hreflang="en" href="https://example.com/en/page">
<link rel="alternate" hreflang="de" href="https://example.com/de/page">
<link rel="alternate" hreflang="fr" href="https://example.com/fr/page">
<link rel="alternate" hreflang="x-default" href="https://example.com/page">

All variants must mutually link to one another, including a self-reference. The x-default value points to the URL served when no locale match is available.
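One way to keep the set reciprocal is to generate the same block of link elements from a single mapping and emit it on every variant. The sketch below assumes a hypothetical `hreflang_links` helper and a dict of BCP 47 tag to URL.

```python
from html import escape


def hreflang_links(variants, x_default):
    """Build the full hreflang block from a {tag: url} mapping.

    Emitting this identical block on every variant gives each page a
    self-reference plus links to all of its alternates.
    """
    lines = [
        f'<link rel="alternate" hreflang="{escape(tag)}" href="{escape(url)}">'
        for tag, url in variants.items()
    ]
    lines.append(f'<link rel="alternate" hreflang="x-default" href="{escape(x_default)}">')
    return "\n".join(lines)


block = hreflang_links(
    {
        "en": "https://example.com/en/page",
        "de": "https://example.com/de/page",
    },
    x_default="https://example.com/page",
)
print(block)
```

Because the block is derived from one mapping, adding a locale updates every variant at once and the links cannot drift out of sync.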

Common mistakes

  • Auto-redirecting from / to /en/ on Accept-Language: en-US. AI crawlers default to en-US; non-English locales become unreachable.
  • Setting lang="" (empty string). This explicitly marks the language as unknown and degrades AI parsing confidence.
  • Mixing locale signals: for example, <html lang="en"> on a page whose URL is /de/ and whose body is German.
  • Forgetting Content-Language on JSON or feed endpoints where there is no <html> element to carry the lang attribute.
  • Adding Vary: Accept-Language to static localized URLs — wastes CDN cache space.
  • Truncating BCP 47 tags: prefer de-DE over de when regional dialect matters; use the shortest tag that disambiguates.
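Empty and malformed tags can be caught before they reach a template. The sketch below checks only a simplified subset of BCP 47 (language, optional script, optional region) and applies the conventional casing. It does not cover variants, extensions, private-use subtags, or registry lookups, and the `normalize_tag` name is hypothetical.

```python
import re

# Simplified subset of BCP 47: language[-Script][-REGION]. This is an
# assumption for illustration; the full grammar allows more subtags.
_TAG = re.compile(r"^([A-Za-z]{2,3})(?:-([A-Za-z]{4}))?(?:-([A-Za-z]{2}|[0-9]{3}))?$")


def normalize_tag(tag):
    """Return the tag in conventional casing, or None if it is outside the subset."""
    match = _TAG.match(tag.strip())
    if match is None:
        return None
    language, script, region = match.groups()
    parts = [language.lower()]
    if script:
        parts.append(script.title())
    if region:
        parts.append(region.upper())
    return "-".join(parts)


print(normalize_tag("de-de"))       # de-DE
print(normalize_tag("zh-hant-tw"))  # zh-Hant-TW
print(normalize_tag(""))            # None
```

This checks syntax only: a well-formed tag can still be wrong for the page it is applied to.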

Validation checklist

  • [ ] Every page declares <html lang> with a BCP 47 tag.
  • [ ] Each locale lives at a stable URL (path, subdomain, or ccTLD).
  • [ ] No 301/302 redirects keyed on Accept-Language alone.
  • [ ] hreflang link relations cover all locale variants and include x-default.
  • [ ] Content-Language is set on responses where appropriate.
  • [ ] Vary: Accept-Language is absent on static locale URLs.
  • [ ] Inline foreign-language fragments use a lang attribute (for example, <span lang="fr">).
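Parts of this checklist can be automated by requesting the same URL with two different Accept-Language values and comparing the outcomes. The sketch below assumes a hypothetical `fetch(url, accept_language)` callable that returns (status, headers, body) without following redirects. Any HTTP client can be wrapped to fit. It covers only the redirect, <html lang>, and Vary items, and the Vary check assumes the URL under test is a static locale URL.

```python
import re

# Matches the lang attribute on the opening <html> tag (not xml:lang or data-lang).
HTML_LANG = re.compile(r'<html\b[^>]*?\slang\s*=\s*["\']([^"\']*)["\']', re.IGNORECASE)


def audit_url(url, fetch, probes=("en-US", "de-DE")):
    """Return a list of checklist problems found for one URL.

    `fetch(url, accept_language)` is a hypothetical callable returning
    (status, headers, body) without following redirects.
    """
    problems = []
    responses = []
    for language in probes:
        status, headers, body = fetch(url, language)
        responses.append((status, {k.lower(): v for k, v in headers.items()}, body))

    # If the status or redirect target changes with the header, the origin keys on it.
    outcomes = {(status, headers.get("location")) for status, headers, _ in responses}
    if len(outcomes) > 1:
        problems.append("response depends on Accept-Language")

    status, headers, body = responses[0]
    if status == 200:
        match = HTML_LANG.search(body)
        if match is None or not match.group(1).strip():
            problems.append("missing or empty <html lang>")
        if "accept-language" in headers.get("vary", "").lower():
            problems.append("Vary: Accept-Language present")
    return problems
```

A redirect that is identical for both probes, such as trailing-slash normalization, is not flagged, because it does not depend on the header.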

FAQ

Q: Do AI crawlers use Accept-Language for ranking?

No direct ranking effect has been documented. The header affects which URL the crawler ends up fetching when an origin redirects on it; that, in turn, affects which content is indexed. The fix is structural (one URL per locale), not header-based.

Q: Should I set Accept-Language to * to allow any locale?

The site does not set Accept-Language — the bot does. The origin's job is to honor the structural URL the bot requested without redirecting based on the header.

Q: What about Googlebot's locale-aware crawling?

Google published locale-adaptive crawl support in 2015, but Google's modern guidance is to serve distinct URLs per locale with hreflang. AI crawlers other than Googlebot largely lack a locale-aware crawl mode.

Q: Is Content-Language redundant with <html lang>?

Not quite. <html lang> is a per-document declaration. Content-Language is the HTTP-level signal and applies to non-HTML responses (JSON, plaintext, feeds) where there is no <html> element. Set both for HTML responses.
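For a non-HTML response, the header is the only place the language can be declared. A minimal, framework-agnostic sketch follows. The `json_response` helper and its (status, headers, body) return shape are hypothetical.

```python
import json


def json_response(payload, language):
    """Build (status, headers, body) for a JSON payload in a known language."""
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    headers = {
        "Content-Type": "application/json; charset=utf-8",
        # No <html> element exists here, so the header carries the language.
        "Content-Language": language,
        "Content-Length": str(len(body)),
    }
    return 200, headers, body


status, headers, body = json_response({"title": "Über uns"}, "de")
print(headers["Content-Language"])  # de
```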

Q: Do I need hreflang if my URLs are already locale-prefixed?

Yes. hreflang provides explicit cross-references between locale variants, which AI crawlers and search engines use to confirm the alternate-locale graph. Without it, an AI engine may treat /en/page and /de/page as duplicates rather than locale alternates.
