Geodocs.dev

AI Search Referrer Attribution Reference Specification



This specification defines how analytics systems should detect, label, and attribute referral traffic from AI search engines. It enumerates the HTTP Referer hosts each major LLM surface emits, the UTM conventions analytics teams should standardize on, and a deterministic decision tree for reconciling sessions that arrive without a referrer header.

TL;DR. AI search clients pass referrer data inconsistently: Perplexity reliably sends perplexity.ai, ChatGPT sometimes sends chatgpt.com or chat.openai.com, Gemini sometimes sends gemini.google.com, and Google AI Overviews pass google.com indistinguishably from organic. Reliable attribution requires (1) a host allow-list, (2) a standardized UTM scheme on every link you control, and (3) a GA4 custom channel group that catches the long tail. Treat the result as a conservative lower bound — copy-paste citations are intrinsically unattributable.

This document is a normative reference. It defines field shapes, host patterns, and decision rules. For step-by-step setup, see the linked tutorials in Related Articles.

1. Scope and definitions

AI search referrer attribution is the process of identifying inbound web sessions whose origin was an AI-mediated answer surface (chat assistants, answer engines, AI browsers, or AI Overviews) and assigning them to a canonical channel and source.

Term | Definition
---- | ----------
AI search surface | Any user-facing product whose primary output is a synthesized answer with optional source citations (ChatGPT, Perplexity, Gemini, Copilot, Claude, You.com, Brave Leo).
AI browser | A browser whose default new-tab or address-bar surface is an LLM (ChatGPT Atlas, Perplexity Comet, Arc Search).
Click-through | A user-initiated navigation from a citation chip, link, or button inside an AI surface to a destination URL.
Citation impression | An AI surface displaying a source link without the user clicking it. Not directly attributable from referrer headers alone.
Dark funnel | Sessions influenced by an AI surface but recorded as Direct, Organic, or (not set) due to missing referrer or UTM data.

Out of scope: training-data attribution, crawler/bot identification (covered in the crawler user-agent reference), and conversion modeling.

2. Why AI traffic is hard to attribute

Three wire-level facts drive the entire spec:

  1. HTTP Referer is optional and often stripped. AI clients that open links in an in-app webview, native app, or sandbox frequently omit the header or rewrite it. When this happens, GA4 records the session as Direct / (not set).
  2. AI Overviews pass google.com. Clicks from a Google AI Overview citation present an HTTP Referer identical to a normal organic SERP click. There is currently no public referrer field that distinguishes them.
  3. Copy-paste is invisible. When a user reads a Perplexity or ChatGPT answer, copies the destination URL, and pastes it into a new tab, no referrer is sent. This is structurally unattributable.

The result: referrer-only attribution is necessary but not sufficient. A complete implementation layers referrer detection, UTM conventions, and channel grouping.

3. Normative referrer host registry

Attribution systems MUST treat the following hosts as AI-search referrers when they appear in document.referrer or the GA4 session_source dimension. Hosts are listed as registrable domains; subdomains MUST match.

3.1 First-party AI assistants

Surface | Referrer hosts | Reliability | Notes
------- | -------------- | ----------- | -----
ChatGPT (web) | chatgpt.com, chat.openai.com | Medium | Sent for clickable citations on chatgpt.com; suppressed when opened via in-app link handlers.
ChatGPT Atlas browser | (not set) or chatgpt.com | Low | Internal webviews strip the Referer; treat as Direct fallback.
Perplexity (web) | perplexity.ai, www.perplexity.ai | High | Most reliable AI source; consistently passes the referrer on citation clicks.
Perplexity Comet | perplexity.ai or (not set) | Medium | Behavior depends on whether the user clicks a citation or types a URL.
Google Gemini | gemini.google.com | Medium | Sent for explicit citation clicks; the native Android Assistant surface often omits it.
Microsoft Copilot | copilot.microsoft.com, bing.com | Medium | Blended Bing-Copilot sessions sometimes attribute to bing.com.
Claude (Anthropic) | claude.ai | Low | Most external links open in a new tab without a referrer.
You.com | you.com | Medium | Reliable when the user clicks the citation card.
Brave Leo | (not set) | Very low | In-browser sidebar; no referrer in current builds.
Meta AI | meta.ai | Low | Inconsistent across surfaces.

3.2 AI Overviews and SGE

Clicks from Google AI Overviews carry google.com (or country variants) as the referrer, identical to organic SERP clicks. Implementations MUST NOT classify google.com referrals as AI-attributed. AI Overview attribution requires a separate measurement model (branded-search lift, view-through, or first-party citation telemetry) and is out of scope for referrer-based detection.

3.3 Aggregators that proxy AI surfaces

Aggregator hosts (poe.com, phind.com, kagi.com, huggingface.co/chat, t3.chat) MUST be treated as AI-search referrers when present.
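The registry above can be sketched as a registrable-domain allow-list check. This is a minimal illustration, not a normative implementation: the constant and function names are invented for the example, and the list mirrors §3.1 and §3.3 (google.com is deliberately excluded per §3.2).

```javascript
// Hosts from the §3.1 and §3.3 registries, as registrable domains.
const AI_REFERRER_DOMAINS = [
  "chatgpt.com", "chat.openai.com", "perplexity.ai", "gemini.google.com",
  "copilot.microsoft.com", "claude.ai", "you.com", "meta.ai",
  "poe.com", "phind.com", "kagi.com", "t3.chat",
];

// True when the host is a listed domain or any subdomain of one,
// per the "subdomains MUST match" rule in §3.
function isAiReferrerHost(host) {
  const h = String(host).toLowerCase();
  return AI_REFERRER_DOMAINS.some((d) => h === d || h.endsWith("." + d));
}

// Note: google.com is absent on purpose. AI Overview clicks are
// indistinguishable from organic SERP clicks (§3.2).
```

For example, `isAiReferrerHost("www.perplexity.ai")` matches via the subdomain rule, while `isAiReferrerHost("google.com")` does not.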

4. UTM conventions

For every link an organization controls (citations seeded into prompts, plugin output, structured data, RAG sources), apply a deterministic UTM scheme so attribution survives referrer loss.

utm_source = <surface> // chatgpt | perplexity | gemini | copilot | claude
utm_medium = ai_search // fixed literal
utm_campaign = <campaign-slug> // e.g. geo-citation-readiness
utm_content = <content-slug> // e.g. ai-search-referrer-attribution-spec
utm_term = <prompt-hash> // optional, opaque hash of seeded prompt

Rules:

  • utm_medium MUST be the literal string ai_search. This single token is the join key for channel grouping.
  • utm_source values MUST be lowercase, hyphenated, and drawn from the controlled vocabulary in §3.1.
  • Implementations MUST preserve UTMs through canonical redirects and CDN edge rules.
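A tagging helper following these rules might look like the sketch below. The function name and input values are illustrative; only the `utm_*` parameter names and the `ai_search` literal come from the spec.

```javascript
// Sketch: apply the §4 UTM scheme to a link the organization controls.
function tagAiLink(url, source, campaign, content, term) {
  const u = new URL(url);
  u.searchParams.set("utm_source", source);      // controlled vocabulary, §3.1
  u.searchParams.set("utm_medium", "ai_search"); // fixed literal join key
  u.searchParams.set("utm_campaign", campaign);
  u.searchParams.set("utm_content", content);
  if (term) u.searchParams.set("utm_term", term); // optional prompt hash
  return u.toString();
}
```

Using `URLSearchParams` rather than string concatenation keeps the values percent-encoded, which matters when UTMs must survive redirects and CDN edge rules.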

5. GA4 channel grouping ruleset

The RECOMMENDED GA4 custom channel group, ordered by precedence, is:

  1. AI Search — Paid surface — session_medium matches ai_search AND session_campaign contains paid.
  2. AI Search — Cited — session_source matches the regex below OR session_medium equals ai_search.
  3. AI Search — Suspected (dark funnel) — session_source is (direct) AND landing page is in the AI-cited URL set AND time-of-day or geographic anomaly score exceeds threshold (heuristic; mark with inferred=true).
  4. Organic Search — fall-through.

Reference regex for rule 2:

^(.+\.)?(chatgpt\.com|chat\.openai\.com|perplexity\.ai|gemini\.google\.com|copilot\.microsoft\.com|claude\.ai|you\.com|poe\.com|phind\.com|kagi\.com|meta\.ai|t3\.chat)$

The optional (.+\.)? prefix implements the subdomain-matching rule from §3 and subsumes hosts such as www.perplexity.ai.

Implementations MUST place AI Search rules above Organic Search and Referral so that bing.com Copilot sessions are not absorbed into Organic.

6. Detection decision tree

For each inbound session, evaluate in order and stop at the first match:

  1. If utm_medium = ai_search → assign AI Search / <utm_source>.
  2. Else if document.referrer host matches the regex in §5 → assign AI Search / <referrer host>.
  3. Else if document.referrer host is google.com AND landing page is flagged as AI-Overview-cited in your monitoring tool → assign AI Overviews (inferred).
  4. Else if referrer is empty AND client_id is new AND landing path is in the AI-cited URL set within a 7-day citation freshness window → assign AI Search (inferred) with inferred=true.
  5. Else fall through to standard channel grouping.

Rules 3 and 4 are inferred and MUST be flagged so downstream models can apply confidence weighting.
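The decision tree above can be sketched as a single first-match function. The session object shape (utmMedium, referrerHost, overviewCited, darkFunnelCandidate, newClient) is illustrative and stands in for your GA4 dimensions and monitoring data; only the rule ordering and outputs follow §6.

```javascript
// Hosts from the §3 registry, matched including subdomains.
const AI_HOSTS = [
  "chatgpt.com", "chat.openai.com", "perplexity.ai", "gemini.google.com",
  "copilot.microsoft.com", "claude.ai", "you.com", "poe.com",
  "phind.com", "kagi.com", "meta.ai", "t3.chat",
];
const isAiHost = (h) =>
  !!h && AI_HOSTS.some((d) => h === d || h.endsWith("." + d));

// First-match evaluation of the §6 decision tree.
function classifySession(s) {
  // Rule 1: deterministic UTM tag wins.
  if (s.utmMedium === "ai_search")
    return { channel: "AI Search", source: s.utmSource, inferred: false };
  // Rule 2: referrer host in the AI registry.
  if (isAiHost(s.referrerHost))
    return { channel: "AI Search", source: s.referrerHost, inferred: false };
  // Rule 3: google.com referrer plus AI-Overview citation evidence.
  if (s.referrerHost === "google.com" && s.overviewCited)
    return { channel: "AI Overviews", source: "google", inferred: true };
  // Rule 4: dark-funnel heuristic on new, referrerless sessions.
  if (!s.referrerHost && s.newClient && s.darkFunnelCandidate)
    return { channel: "AI Search", source: "(inferred)", inferred: true };
  // Rule 5: fall through to standard channel grouping.
  return null;
}
```

Note that a bare google.com referrer with no Overview evidence falls all the way through, which is exactly the MUST NOT from §3.2.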

7. Field reference

Field | Type | Required | Description
----- | ---- | -------- | -----------
ai_surface | enum (§3.1) | yes | The AI product attributed to the session.
ai_attribution_method | enum: utm, referrer, inferred_overview, inferred_dark | yes | How the attribution was derived.
ai_confidence | float 0.0-1.0 | yes | 1.0 for utm, 0.8 for referrer, 0.5 for inferred_overview, 0.3 for inferred_dark.
ai_citation_url | string | optional | Destination URL recorded in the AI surface, when known.
ai_prompt_intent_hash | string | optional | Opaque hash from utm_term, for cohorting.
ai_first_seen_at | ISO-8601 | yes | First session timestamp on this client_id from any AI surface.

Downstream attribution models SHOULD multiply pipeline credit by ai_confidence to avoid over-counting inferred sessions.
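The weighting rule can be sketched as below. The session shape ({ credit, ai_attribution_method }) is illustrative; the confidence values come from the ai_confidence row in the field table.

```javascript
// ai_confidence defaults per attribution method (§7).
const AI_CONFIDENCE = {
  utm: 1.0,
  referrer: 0.8,
  inferred_overview: 0.5,
  inferred_dark: 0.3,
};

// Sum pipeline credit, discounting inferred sessions by confidence.
// Unknown methods contribute nothing rather than full credit.
function weightedAiCredit(sessions) {
  return sessions.reduce(
    (sum, s) => sum + s.credit * (AI_CONFIDENCE[s.ai_attribution_method] ?? 0),
    0,
  );
}
```

For example, one UTM-attributed session and one dark-funnel session worth 100 each yield 130 weighted credit, not 200.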

8. Conformance levels

  • Level 1 — Detect. Implements §3 host registry and §5 channel grouping. Sufficient for executive dashboards.
  • Level 2 — Tag. Adds §4 UTM conventions on all controlled surfaces. Sufficient for content-level ROI reporting.
  • Level 3 — Reconcile. Adds §6 inferred-attribution rules with confidence flags and a citation freshness index. Required for pipeline-level attribution.

Compliance claims MUST cite the level achieved.

9. Misconceptions

  • "GA4 has built-in AI channels." It does not. The default channel group routes most AI surfaces into Referral or Direct. A custom channel group is required.
  • "bing.com is always Copilot." It is not. Bing organic and Copilot share the host. Use UTMs or page-path heuristics to disambiguate.
  • "Perplexity referrers are 100% reliable." They are the most reliable, not perfect. Mobile app and Comet sessions can still strip the referrer.
  • "AI Overviews can be isolated from referrer alone." They cannot. A separate measurement model is required.

10. FAQ

Q: Does Perplexity send a referrer header?

Yes. Perplexity is currently the most reliable AI source for referrer-based attribution; clicks from citation chips on perplexity.ai consistently include the Referer header. Native app and AI-browser sessions are less reliable.

Q: Why does ChatGPT traffic show up as Direct in GA4?

Because many ChatGPT click paths (mobile app, Atlas browser, in-app webviews) suppress the Referer header. To recover those sessions, tag any links you control with utm_medium=ai_search and add a GA4 custom channel group as in §5.

Q: Can I attribute clicks from Google AI Overviews?

Not from referrer headers — AI Overview clicks pass google.com exactly like organic. Use a separate measurement model: branded-search lift, citation monitoring tools, or first-party telemetry that detects AI-Overview-driven landing pages.

Q: What utm_medium value should I standardize on?

Use the literal string ai_search. A single, consistent token makes channel grouping, BigQuery joins, and cross-platform reporting deterministic.

Q: How should I weight inferred AI traffic?

Multiply pipeline credit by the ai_confidence value defined in §7. Inferred dark-funnel sessions (0.3) should not be combined with deterministic UTM sessions (1.0) without weighting, or you will overstate AI impact.

Related Articles

reference

AI Answer Length Patterns: Word and Token Targets per Engine in 2026

Reference for AI answer lengths in 2026 — word and token targets for ChatGPT, Perplexity, and Google AI Overviews so writers format extractable answers.

framework

AI Citation Confidence Scoring Framework: Predicting Source Inclusion Likelihood

AI citation confidence scoring framework: a predictive model that scores how likely generative engines are to cite a source based on retrieval, grounding, and trust signals.

specification

AI Citation Format Specification by Engine: How ChatGPT, Perplexity, Gemini, and Claude Render Sources in 2026

Reference specification of how ChatGPT, Perplexity, Gemini, and Claude render source citations in 2026, with format patterns, anchor text, and rendering rules.
