Hreflang for Multi-Language AI Citations
Hreflang is an annotation, delivered in HTML, an XML sitemap, or an HTTP header, that tells search engines and AI crawlers which language and region each variant of a page targets. Correct AI citation behavior across languages depends on reciprocal return-tags, an explicit x-default fallback, and valid BCP-47 codes.
TL;DR
Use hreflang to declare every language and region variant of a page. Pair language-only codes with regional codes when needed, set an x-default for unmatched audiences, and ensure every variant returns a tag pointing back to every other variant. Choose one delivery method (HTML, sitemap, or HTTP header) and apply it consistently. AI engines use these annotations to decide which language variant to cite for a user's query.
Definition
Hreflang is a link relation declared with rel="alternate" and an hreflang attribute carrying a BCP-47 language code, optionally followed by a region subtag. It is documented by Google Search Central. Hreflang does not change the content served; it tells crawlers and downstream consumers which variant matches a given language or region.
How AI Engines Use Hreflang
AI engines consume hreflang in two ways. First, when ingesting content, they associate each URL with a language and optional region, which lets the answer engine pick the variant whose language matches the user's query. Second, when generating citations, they prefer the variant the user can read; citing an English page in a French AI Overview reduces user trust and click-through.
Google AI Overviews follow the same hreflang rules as Google Search. Perplexity and ChatGPT browsing read hreflang from HTML and sitemaps when fetching pages and use language signals plus query language detection to choose a variant. Claude with web access also respects HTML language attributes. Missing or inconsistent hreflang causes the wrong-language variant to be cited, which is one of the most common quality issues for international sites in AI surfaces.
Delivery Methods
Hreflang can be delivered through three channels. Pick one and use it consistently.
HTML Elements
Place link tags in the <head> of every variant.
<link rel="alternate" hreflang="en" href="https://example.com/en/page" />
<link rel="alternate" hreflang="fr" href="https://example.com/fr/page" />
<link rel="alternate" hreflang="x-default" href="https://example.com/en/page" />
XML Sitemap
Declare xhtml:link entries inside each <url> element.
<url>
<loc>https://example.com/en/page</loc>
<xhtml:link rel="alternate" hreflang="en" href="https://example.com/en/page" />
<xhtml:link rel="alternate" hreflang="fr" href="https://example.com/fr/page" />
<xhtml:link rel="alternate" hreflang="x-default" href="https://example.com/en/page" />
</url>
HTTP Link Header
For non-HTML resources (PDFs, JSON APIs that serve documentation), use the HTTP Link header.
Link:
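As a sketch of the header's general shape, each variant is written as a URL in angle brackets followed by rel="alternate" and an hreflang parameter, with entries separated by commas. The helper name build_link_header and the guide.pdf URLs below are assumptions made for illustration, not part of any library.

```python
def build_link_header(variants: dict[str, str]) -> str:
    """Return a Link header value that declares every hreflang variant.

    `variants` maps an hreflang code to the absolute URL of that variant.
    """
    parts = [
        f'<{url}>; rel="alternate"; hreflang="{code}"'
        for code, url in variants.items()
    ]
    return ", ".join(parts)


# Hypothetical PDF variants, used only for illustration.
variants = {
    "en": "https://example.com/en/guide.pdf",
    "fr": "https://example.com/fr/guide.pdf",
    "x-default": "https://example.com/en/guide.pdf",
}

header_value = build_link_header(variants)
print("Link: " + header_value)
```

The same header value would be sent with every variant of the resource, so each response lists itself as well as its siblings.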
Required Properties
| Property | Required | Notes |
|---|---|---|
| hreflang value | Yes | BCP-47 code, e.g. en, en-US, pt-BR |
| href | Yes | Absolute URL of the variant |
| x-default | Recommended | Fallback for unmatched languages |
| Return-tag reciprocity | Yes | Every variant must declare every other variant |
| Self-reference | Yes | Each page must include an hreflang to itself |
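One way to satisfy these properties together is to render a single block of tags from the full variant set and emit that identical block on every variant; self-reference and reciprocity then follow automatically. The sketch below assumes a hypothetical render_hreflang_tags helper and the example URLs used earlier.

```python
from html import escape
from typing import Optional
from urllib.parse import urlparse


def is_absolute(url: str) -> bool:
    """True when the URL has an http(s) scheme and a host."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def render_hreflang_tags(cluster: dict[str, str], x_default: Optional[str] = None) -> str:
    """Render one <link> per variant, plus an optional x-default entry.

    `cluster` maps hreflang code -> absolute URL for every variant of a page.
    """
    entries = sorted(cluster.items())
    if x_default is not None:
        entries.append(("x-default", x_default))
    lines = []
    for code, url in entries:
        if not is_absolute(url):
            raise ValueError(f"href for {code} must be an absolute URL: {url}")
        lines.append(
            f'<link rel="alternate" hreflang="{escape(code)}" href="{escape(url)}" />'
        )
    return "\n".join(lines)


tags = render_hreflang_tags(
    {"en": "https://example.com/en/page", "fr": "https://example.com/fr/page"},
    x_default="https://example.com/en/page",
)
print(tags)
```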
Language vs Region Targeting
Use a language-only code (en, fr, de) when the variant serves all regions speaking that language. Use a language-region code (en-GB, pt-BR, es-MX) when the variant has region-specific content (currency, regulations, examples). The region subtag is an ISO 3166-1 alpha-2 code, conventionally written in uppercase.
Do not use a region without a language. hreflang="US" is invalid and will be ignored. Do not invent codes; AI engines and Google ignore annotations they cannot parse.
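These rules can be checked mechanically before publishing. The sketch below is a minimal validator under loud assumptions: KNOWN_LANGUAGES and KNOWN_REGIONS are small illustrative subsets (a real check would load the full ISO 639-1 and ISO 3166-1 lists), script subtags such as zh-Hant are not handled, and the function name is hypothetical.

```python
# Illustrative subsets only; a real check would load the complete ISO lists.
KNOWN_LANGUAGES = {"en", "fr", "de", "es", "pt"}
KNOWN_REGIONS = {"US", "GB", "BR", "PT", "MX", "FR", "DE", "ES"}


def is_valid_hreflang(value: str) -> bool:
    """Accept x-default, a known language, or a known language-region pair."""
    if value.lower() == "x-default":
        return True
    parts = value.split("-")
    if len(parts) == 1:
        return parts[0].lower() in KNOWN_LANGUAGES
    if len(parts) == 2:
        language, region = parts
        return language.lower() in KNOWN_LANGUAGES and region.upper() in KNOWN_REGIONS
    # Anything longer (for example script subtags) is outside this sketch.
    return False


for code in ["en", "en-GB", "pt-BR", "US", "en-UK", "x-default"]:
    print(code, is_valid_hreflang(code))
```

A shape-only check (two letters, hyphen, two letters) would accept both US and en-UK, which is why the sketch compares against code lists instead.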
The x-default Fallback
x-default declares the variant to serve when no language match is possible. It matters for AI citation quality on international sites because it gives the answer engine a deterministic fallback when the user's query language is not in the hreflang set. Common patterns: an English-language landing page that auto-detects, or a language-selection page.
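To make the fallback concrete, the following sketch models how a consumer could resolve a query language against a declared hreflang set: exact match first, then language-only match, then x-default. This is an illustrative model, not the documented behavior of any specific engine; pick_variant and the pt-br URL are assumptions.

```python
from typing import Optional


def pick_variant(query_lang: str, variants: dict[str, str]) -> Optional[str]:
    """Return the URL to cite for `query_lang`, or None if nothing applies."""
    by_code = {code.lower(): url for code, url in variants.items()}
    tag = query_lang.lower()
    # 1. Exact language-region match, e.g. pt-BR.
    if tag in by_code:
        return by_code[tag]
    # 2. Language-only match, e.g. fr-CA falls back to fr.
    language = tag.split("-")[0]
    if language in by_code:
        return by_code[language]
    # 3. Deterministic fallback, if one was declared.
    return by_code.get("x-default")


variants = {
    "en": "https://example.com/en/page",
    "fr": "https://example.com/fr/page",
    "pt-BR": "https://example.com/pt-br/page",  # hypothetical URL
    "x-default": "https://example.com/en/page",
}

print(pick_variant("fr-CA", variants))
print(pick_variant("ja", variants))
```

Without the x-default entry, the last call would return None, which is the non-deterministic case the fallback is meant to remove.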
Validation
Validate hreflang through:
- Google Search Console's International Targeting report, where it is still available (Google has deprecated this report).
- Site crawlers (for example, Screaming Frog) that flag missing return-tags and orphan languages.
- Custom CI lint that asserts every variant in the sitemap returns a hreflang to every other variant.
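A custom lint of this kind can be quite small. The sketch below assumes the crawl or sitemap has already been reduced to a mapping from each page URL to the hreflang entries declared on it; the function name and data shape are hypothetical.

```python
def find_hreflang_errors(pages: dict[str, dict[str, str]]) -> list[str]:
    """Report missing self-references and missing return-tags.

    `pages` maps page URL -> {hreflang code: href} as declared on that page.
    """
    errors = []
    for page_url, alternates in pages.items():
        if page_url not in alternates.values():
            errors.append(f"{page_url}: missing self-reference")
        for code, href in alternates.items():
            if href == page_url:
                continue  # self-reference, nothing to reciprocate
            target = pages.get(href)
            if target is None:
                errors.append(f"{page_url}: {code} points to unknown page {href}")
            elif page_url not in target.values():
                errors.append(f"{href}: missing return-tag to {page_url}")
    return errors


EN = "https://example.com/en/page"
FR = "https://example.com/fr/page"

# The French page declares itself but never points back to the English page.
pages = {
    EN: {"en": EN, "fr": FR, "x-default": EN},
    FR: {"fr": FR},
}

errors = find_hreflang_errors(pages)
for line in errors:
    print(line)
```

In CI, a non-empty result would typically fail the build, for example by exiting with a non-zero status.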
Common Mistakes
- Orphan languages. A variant declared by others but missing return-tags. AI crawlers drop orphan variants from consideration.
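- Wrong codes. Using en-UK instead of en-GB (en-UK is not a valid code), or pt-PT for Brazilian content (a valid code, but one that targets Portugal rather than Brazil). AI engines silently ignore invalid codes, and a valid but mismatched code labels the page for the wrong audience.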
- Mixed canonical and hreflang. Each variant must canonicalize to itself; canonicalizing all variants to the English page collapses them into one URL and erases the language signal.
- Missing self-reference. A page that declares siblings but not itself fails reciprocity checks.
- Inconsistent delivery. Mixing HTML and sitemap hreflang with conflicting values causes unpredictable behavior.
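Two of these mistakes, a non-self canonical and a missing self-reference, can be detected from a single page's markup. The sketch below uses Python's standard html.parser; the class and function names are assumptions, and the French page markup is a deliberately broken example.

```python
from html.parser import HTMLParser


class HeadLinkParser(HTMLParser):
    """Collect the canonical URL and hreflang alternates from <link> tags."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.alternates = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if "canonical" in rel:
            self.canonical = attr.get("href")
        elif "alternate" in rel and attr.get("hreflang"):
            self.alternates[attr["hreflang"]] = attr.get("href")


def audit_page(page_url: str, html: str) -> list[str]:
    """Return the problems found on one variant's markup."""
    parser = HeadLinkParser()
    parser.feed(html)
    problems = []
    if parser.canonical is not None and parser.canonical != page_url:
        problems.append(f"canonical points to {parser.canonical}, not to the page itself")
    if page_url not in parser.alternates.values():
        problems.append("no self-referencing hreflang")
    return problems


# A French page that canonicalizes to English and omits its own hreflang.
fr_html = """
<head>
<link rel="canonical" href="https://example.com/en/page" />
<link rel="alternate" hreflang="en" href="https://example.com/en/page" />
</head>
"""

problems = audit_page("https://example.com/fr/page", fr_html)
for problem in problems:
    print(problem)
```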
Integration with Translation Pipelines
Declare hreflang at the same point in the build that emits translated URLs. The mapping should be authoritative: a single source of truth (typically the translation management system) outputs both the rendered page and the hreflang manifest. Treat divergence between the manifest and the rendered tags as a build failure.
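The build-failure rule can be sketched as a comparison between the manifest and the tags actually present in the rendered page. The manifest shape (hreflang code to URL), the HreflangDivergence exception, and the function names below are assumptions; a real pipeline would read the manifest from its translation management system.

```python
from html.parser import HTMLParser


class AlternateCollector(HTMLParser):
    """Collect hreflang -> href from rendered <link rel="alternate"> tags."""

    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "link" and attr.get("rel") == "alternate" and attr.get("hreflang"):
            self.found[attr["hreflang"]] = attr.get("href") or ""


class HreflangDivergence(Exception):
    """Raised when rendered tags disagree with the manifest."""


def check_against_manifest(manifest: dict[str, str], rendered_html: str) -> None:
    collector = AlternateCollector()
    collector.feed(rendered_html)
    if collector.found != manifest:
        # Entries present on one side only, or present with different URLs.
        diff = sorted(set(manifest.items()) ^ set(collector.found.items()))
        raise HreflangDivergence(f"hreflang mismatch: {diff}")


manifest = {
    "en": "https://example.com/en/page",
    "fr": "https://example.com/fr/page",
    "x-default": "https://example.com/en/page",
}

rendered = "\n".join(
    f'<link rel="alternate" hreflang="{code}" href="{url}" />'
    for code, url in manifest.items()
)

check_against_manifest(manifest, rendered)  # passes silently when they agree
```

Letting the exception propagate is enough to fail most build runners; the message lists the entries that differ.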
FAQ
Q: Do AI engines respect hreflang the same way Google does?
In practice, yes. Google AI Overviews follow Google Search rules. Perplexity, ChatGPT, and Claude read HTML and sitemap hreflang during ingestion and prefer language-matched variants when citing.
Q: Should every variant point to itself in hreflang?
Yes. Self-reference is required by the Google specification and by AI parsers that derive the variant set from any single page.
Q: Is x-default required?
Not required by the specification, but strongly recommended. Without x-default, AI engines have no deterministic fallback when the user's query language is outside the declared set.
Q: Can I use hreflang with canonical tags?
Yes, and you must. Each language variant must self-canonicalize. Canonicalizing all variants to one page erases the language signal and breaks AI citation language selection.