Geodocs.dev

GEO Citation Attribution Models



GEO citation attribution applies classical marketing attribution models — first-touch, last-touch, U-shape, W-shape, and time-decay — to AI assistant citations. Combining server logs, AI surface scrapes, and survey signals connects citation share to revenue, so the marketing team can defend GEO investment with the same rigor as paid and organic channels.

TL;DR

Attribution for GEO is harder than for paid or classical organic, because AI engines often answer the question without sending a click. Combine three data sources: server access logs (with referrer and AI-bot user-agents), AI surface scrapes (Perplexity, ChatGPT, AI Overviews citation panels), and a self-reported "how did you find us?" question. Apply first-touch, last-touch, U-shape, W-shape, and time-decay models per channel and report a blended citation-influenced revenue number.

Why GEO Attribution Is Hard

Classical attribution depends on a click that lands on your site with a recognizable referrer. AI search breaks that chain in three ways:

  • Zero-click answers. Many AI assistant answers do not require a click. The user reads the answer, decides, and acts. Server logs do not see the touchpoint at all.
  • Citation without click. A user may see your domain cited in a Perplexity sources list but click a different source. The citation still influenced the decision.
  • Mixed referrers. When a click does happen, the referrer can be chatgpt.com, perplexity.ai, google.com/aio, claude.ai, or even empty (mobile in-app browsers). Each engine reports differently.

No attribution model recovers full causality, but the discipline of running multiple models in parallel reduces the bias of any single approach.

Data Sources

A GEO attribution stack pulls from three layers:

1. Server-side signals

  • Access logs: capture every request with referrer, user-agent, IP. Filter for AI-engine referrers (chatgpt.com, perplexity.ai, claude.ai, gemini.google.com, AI Overview referrers) and AI-bot user-agents (GPTBot, ClaudeBot, PerplexityBot, OAI-SearchBot, ChatGPT-User, Perplexity-User).
  • CDN logs / edge logs: same data with better completeness, since some bot traffic skips the origin.
  • Analytics: classify AI-referred sessions as a first-class channel. GA4 supports custom channel grouping; Mixpanel and Amplitude support session-level referrer rules.
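As a sketch, a single access-log row can be classified with a rule like the following. The referrer hosts and bot tokens mirror the lists above; the `ai_bot:`/`ai_click:` label scheme is illustrative, not a standard:

```python
import re

# Illustrative classifier for one access-log row.
AI_REFERRER_HOSTS = {
    "chatgpt.com": "chatgpt",
    "perplexity.ai": "perplexity",
    "claude.ai": "claude",
    "gemini.google.com": "gemini",
}
AI_BOT_TOKENS = (
    "GPTBot", "ClaudeBot", "PerplexityBot",
    "OAI-SearchBot", "ChatGPT-User", "Perplexity-User",
)

def classify_hit(referrer: str, user_agent: str) -> str:
    """Label a row as an AI bot fetch, an AI-referred click, or other."""
    for token in AI_BOT_TOKENS:
        if token in user_agent:
            return f"ai_bot:{token}"
    if "google.com/aio" in referrer:
        return "ai_click:ai_overviews"
    host = re.sub(r"^https?://(www\.)?", "", referrer).split("/")[0]
    if host in AI_REFERRER_HOSTS:
        return f"ai_click:{AI_REFERRER_HOSTS[host]}"
    return "other"  # includes empty referrers from in-app browsers
```

Empty referrers from mobile in-app browsers fall through to `other`; the scrape and survey layers below are what recover those.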

2. AI surface scrapes

  • Run a fixed query panel weekly across ChatGPT, Perplexity, Claude, Gemini, AI Overviews.
  • Capture the citation list per query.
  • Tag each citation as your domain or third-party.
  • Persist citation events with timestamp, engine, query, source URL, and position.

This gives you the exposure signal even when no click occurs.
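One way to persist those citation events is a JSON-lines store, sketched below. The schema fields follow the list above; `example.com` stands in for your own domain:

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import json

@dataclass
class CitationEvent:
    timestamp: str    # ISO-8601 scrape time
    engine: str       # "chatgpt", "perplexity", "claude", "gemini", "ai_overviews"
    query: str        # query from the fixed weekly panel
    source_url: str   # cited URL
    position: int     # 1-based rank in the citation list
    own_domain: bool  # True when the cited URL is yours

def tag_citation(engine, query, url, position, own_domains=frozenset({"example.com"})):
    # Tag the citation as your domain or third-party by its host
    host = url.split("//")[-1].split("/")[0].removeprefix("www.")
    return CitationEvent(
        timestamp=datetime.now(timezone.utc).isoformat(),
        engine=engine, query=query, source_url=url,
        position=position, own_domain=host in own_domains,
    )

event = tag_citation("perplexity", "best crm for startups", "https://example.com/blog/crm", 2)
line = json.dumps(asdict(event))  # append one line per event per scrape run
```

Appending one line per citation event keeps zero-click exposure queryable even when analytics never sees a session.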

3. Self-reported attribution

  • Add a "how did you hear about us?" question to onboarding, lead forms, or post-purchase surveys.
  • Include AI-assistant options ("ChatGPT", "Perplexity", "Claude", "AI search summary on Google").
  • Treat self-report as ground truth where logs and scrapes are silent.

No single source is sufficient. Triangulating all three is what makes GEO attribution defensible.

Attribution Models

Apply the same attribution models used in classical marketing:

First-touch

Credit goes to the first identifiable touchpoint in the customer journey. For GEO, the first touch is often an AI citation exposure or a click from an AI assistant. First-touch overweights discovery channels and is appropriate when measuring net-new demand creation.

Last-touch

Credit goes to the last touchpoint before conversion. AI assistants often appear earlier in the journey than the conversion click, so last-touch tends to undervalue GEO. Useful as a counterweight to first-touch.

U-shape (position-based)

Credits 40% to first touch, 40% to last touch, 20% spread across middle touches. A reasonable default for GEO when AI is often a first-touch channel and paid or direct closes the loop.

W-shape

Adds an opportunity-creation milestone (lead, signup, qualified) and credits 30% each to first touch, opportunity creation, and last touch, with 10% across the middle. Common in B2B SaaS where GEO drives top-of-funnel awareness and a structured journey carries leads to opportunity.

Time-decay

More recent touches get more credit; touches further in the past get exponentially less. A practical default for fast-moving categories where the set of cited sources shifts within weeks.
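The rule-based models above can be sketched over a single journey, represented as an ordered list of `(channel, days_before_conversion)` touches. W-shape follows the same pattern with a third 30% anchor at the opportunity touch; the journey below is invented for illustration:

```python
def first_touch(touches):
    return {touches[0][0]: 1.0}

def last_touch(touches):
    return {touches[-1][0]: 1.0}

def u_shape(touches):
    # 40% first, 40% last, 20% split across the middle;
    # with only two touches, split 50/50
    n = len(touches)
    credits = {}
    for i, (ch, _) in enumerate(touches):
        if n == 1:
            w = 1.0
        elif n == 2:
            w = 0.5
        elif i in (0, n - 1):
            w = 0.4
        else:
            w = 0.2 / (n - 2)
        credits[ch] = credits.get(ch, 0.0) + w
    return credits

def time_decay(touches, half_life_days=7.0):
    # Each touch's weight halves every half_life_days; normalize to 1.0
    weights = [(ch, 0.5 ** (days / half_life_days)) for ch, days in touches]
    total = sum(w for _, w in weights)
    credits = {}
    for ch, w in weights:
        credits[ch] = credits.get(ch, 0.0) + w / total
    return credits

journey = [("ai_citation", 30), ("organic", 10), ("paid", 0)]
```

Running all four over the same journey makes the models' biases visible: first-touch gives everything to the AI citation, last-touch gives everything to paid, and U-shape and time-decay land in between.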

Algorithmic / data-driven

Where the data volume permits (typically tens of thousands of conversions), Shapley-value or Markov-chain attribution can fit the credit weights from the data itself. Most GEO programs do not yet have the volume; revisit once you do.

Implementation Pattern

  1. Define the conversion event. Lead, signup, demo, purchase. Pick one and stick with it.
  2. Build the journey table. For each conversion, list the chronological touchpoints across paid, organic, AI citations, and direct.
  3. Apply each model in parallel. Compute attribution credits under first-touch, last-touch, U-shape, W-shape, and time-decay.
  4. Aggregate per channel. Roll up to channel-level revenue contribution.
  5. Publish all model views. Show stakeholders the range across models. The honest answer is a range, not a single number.
  6. Re-run weekly or monthly. Models drift as your channel mix and AI engines change.
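Steps 2 through 5 can be sketched end to end. The journeys, channels, and revenue figures below are invented for illustration, and only the two single-touch rules are wired in; U, W, and time-decay slot into `credit` the same way:

```python
# Step 2: journey table — chronological touches per conversion
journeys = [
    {"revenue": 12000, "touches": ["ai_citation", "organic", "paid"]},
    {"revenue": 8000,  "touches": ["paid", "ai_citation"]},
]

# Step 3: credit rules, applied in parallel per journey
def credit(touches, model):
    return {touches[0] if model == "first" else touches[-1]: 1.0}

# Step 4: roll credits up to channel-level revenue per model
report = {}
for model in ("first", "last"):
    totals = {}
    for j in journeys:
        for ch, w in credit(j["touches"], model).items():
            totals[ch] = totals.get(ch, 0.0) + w * j["revenue"]
    report[model] = totals

# Step 5: publish the range across models, not a single number
lo = min(t.get("ai_citation", 0) for t in report.values())
hi = max(t.get("ai_citation", 0) for t in report.values())
print(f"GEO-influenced revenue: {lo:.0f}-{hi:.0f}")
# → GEO-influenced revenue: 8000-12000
```

Even this toy example shows the spread: the AI citation channel earns 12,000 under first-touch and 8,000 under last-touch, which is exactly the range a stakeholder report should show.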

Reporting Examples

  • "GEO contributed 8-18% of new pipeline last quarter, depending on attribution model."
  • "Under U-shape, GEO is now our second-largest top-of-funnel channel after paid search."
  • "AI citation exposures (no click) precede 32% of self-reported AI-influenced conversions, suggesting we underweight GEO under last-touch."

Common Mistakes

  • Picking a single model. Each model has a known bias; running multiple in parallel is the only defensible approach.
  • Ignoring zero-click exposure. A citation without click still influences the buyer; capture it via scrapes and survey.
  • Treating AI referrer as a single channel. Split ChatGPT, Perplexity, Claude, AI Overviews, Gemini at minimum. They behave differently.
  • Overfitting to a 30-day window. Many GEO journeys span 60-120 days. Use a window appropriate to your sales cycle.
  • Confusing exposure with conversion. Exposure measures reach; survey + logs together measure influence.
  • Building for the C-suite, not the team. Models that team members do not understand will not be acted on. Prefer simpler, widely understood models (first/last/U) over complex models nobody uses.

FAQ

Q: Can GA4 attribute AI citations natively?

GA4 sees referrer-based AI traffic only. Citation exposure without a click is invisible. Pair GA4 with citation scrapes and a self-report field for full coverage.

Q: Should I block AI bots to avoid "polluting" my logs?

No. Bot visits in logs are useful signal: they tell you which pages are being read by AI engines. Filter, don't block.

Q: How does this differ from classical SEO attribution?

Classical SEO is mostly click-driven and lands on identifiable referrers. GEO has a large zero-click exposure layer that classical SEO attribution does not have, requiring scrape and survey data to estimate.

Q: How big does the data set need to be for data-driven attribution?

Shapley-value or Markov models typically need tens of thousands of conversions and clean journey data. Most teams should run rule-based models (first/last/U/W/time-decay) until they cross that threshold.

Q: Can I attribute revenue to specific cited URLs on my domain?

Yes. Join the server logs (which page received the AI-referred visit) with the citation scrape (which page was cited, and by which engine). Use this view to prioritize content investment on the highest-revenue cited pages.
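A minimal sketch of that join, with invented URLs and revenue figures; a real pipeline would join on normalized URL keys across the citation store and the journey table:

```python
# Citation events from the scrape: (engine, cited path on your domain)
citations = [
    ("perplexity", "/blog/crm-comparison"),
    ("chatgpt",    "/blog/crm-comparison"),
    ("gemini",     "/docs/pricing"),
]
# Revenue attributed per landing page from the journey table
page_revenue = {"/blog/crm-comparison": 9400, "/docs/pricing": 2100}

# Count exposures per cited URL, then rank pages by attributed revenue
exposures = {}
for _, path in citations:
    exposures[path] = exposures.get(path, 0) + 1

ranked = sorted(page_revenue.items(), key=lambda kv: kv[1], reverse=True)
for path, revenue in ranked:
    print(path, exposures.get(path, 0), revenue)
```

The resulting view (page, exposure count, attributed revenue) is the prioritization list for content investment.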

Q: What about offline conversions?

For B2B and high-consideration purchases, an offline conversion event uploaded to your CDP, with an AI-influence flag derived from the survey, integrates cleanly into the attribution model.

