Position-Zero Citation Tracking Specification for AI Search

This specification defines a defensible methodology for tracking position-zero citations across Google AI Overviews, Perplexity, ChatGPT Search, and Bing Copilot, including per-platform definitions, query sampling design, capture stack, storage schema, and alerting thresholds.

TL;DR

Position-zero citation tracking measures whether your URLs appear as cited sources inside AI-generated answers — not whether your brand is mentioned in text. A defensible spec needs a stratified question set (50-500 queries), API-first capture with headless fallback, normalized JSON storage, weekly cadence with daily spot-checks for high-value queries, and alerting on citation-share drops greater than 20 percent week-over-week.

1. Definitions per platform

Position-zero is platform-specific. A single normalization layer must reconcile these differences before reporting; a sketch of such a layer follows the list.

  • Google AI Overviews (AIO): the cited source list rendered above the blue links. Track URL, position in the source carousel, and whether the source is expanded by default. AIOs trigger on 9.5 percent of single-word queries and 46.4 percent of seven-plus-word queries (Ahrefs, 2024).
  • Perplexity: numbered inline citations [1], [2] in the answer plus the explicit "Sources" tray. Track citation index, domain, and whether the source is referenced in the visible answer text.
  • ChatGPT Search: inline link annotations and the right-rail source list. Track domain, link surface (inline vs. rail), and whether the source is invoked by a follow-up turn.
  • Bing Copilot: numbered superscript citations plus the "Learn more" footer. Track citation number, domain, and answer-side anchor sentence.
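
The sketch below shows what such a normalization layer can look like. The payload field names (sources, citations, links, references) are illustrative stand-ins for whatever your per-platform parsers emit, not any platform's actual API shape.

# Minimal normalization sketch; field names in `raw` are hypothetical.
from dataclasses import dataclass

@dataclass
class Citation:
    platform: str   # aio | perplexity | chatgpt | copilot
    position: int   # 1-based rank within the platform's source surface
    url: str
    surface: str    # carousel | inline | rail | footer

def normalize(platform: str, raw: dict) -> list[Citation]:
    """Map one platform-specific capture payload onto the shared Citation shape."""
    if platform == "aio":
        # AIO: ordered source carousel rendered above the organic results
        return [Citation("aio", i + 1, s["url"], "carousel")
                for i, s in enumerate(raw.get("sources", []))]
    if platform == "perplexity":
        # Perplexity: numbered [n] citations; the index doubles as position
        return [Citation("perplexity", c["index"], c["url"], "inline")
                for c in raw.get("citations", [])]
    if platform == "chatgpt":
        # ChatGPT Search: distinguish inline annotations from the right rail
        return [Citation("chatgpt", i + 1, c["url"],
                         "inline" if c.get("inline") else "rail")
                for i, c in enumerate(raw.get("links", []))]
    if platform == "copilot":
        # Copilot: superscript numbers plus the "Learn more" footer
        return [Citation("copilot", c["number"], c["url"], "footer")
                for c in raw.get("references", [])]
    raise ValueError(f"unknown platform: {platform}")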

2. Query sampling design

A tracker is only as defensible as its question set. Avoid keyword bags; track natural-language questions that represent buyer intent. A construction sketch follows the list.

  • Set size: start with 50 questions for a single product line; scale to 200-500 for multi-line catalogs. Below 50, weekly variance overwhelms signal.
  • Stratification: divide the set into intent strata (definition, comparison, troubleshooting, pricing, decision) at fixed proportions, e.g. 30/25/15/15/15. Re-stratify quarterly.
  • Geo and language: rotate through a fixed locale matrix (US-en, UK-en, DE-de, etc.) per query. AIOs vary by country and time, so global averages mask large local swings.
  • Refresh cadence: retire queries with zero citations across all tracked domains for eight consecutive weeks; replace from a backlog so total set size is constant.
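
A minimal construction sketch, assuming a per-stratum backlog of candidate questions; the proportions mirror the 30/25/15/15/15 split above, and the seeded RNG keeps the weekly set reproducible.

# Fixed-proportion stratified question set (backlog structure is assumed).
import random

STRATA = {"definition": 0.30, "comparison": 0.25, "troubleshooting": 0.15,
          "pricing": 0.15, "decision": 0.15}

def build_question_set(backlog: dict[str, list[str]], size: int = 200,
                       seed: int = 42) -> list[tuple[str, str]]:
    """Draw questions per stratum at fixed proportions; total set size stays constant."""
    rng = random.Random(seed)  # seeded so re-runs produce the same set
    selected = []
    for stratum, share in STRATA.items():
        k = round(size * share)  # e.g. 60 definition questions at size=200
        selected += [(stratum, q) for q in rng.sample(backlog[stratum], k)]
    return selected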

3. Capture methodology

Use API-first capture and reserve headless browsers for platforms without a stable extraction API. A capture-loop sketch follows the list.

  • AIO capture: prefer SERP APIs that expose the AI Overview block (e.g. SerpApi's Google AI Overview API, DataForSEO). Backstop with Ahrefs Brand Radar for citation-share roll-ups.
  • Perplexity capture: the Perplexity Search API returns ranked sources directly; pair with periodic UI scrapes to validate.
  • ChatGPT and Copilot capture: no official citation API as of 2026-05; use a managed headless stack (Playwright + residential proxy pool) with deterministic prompts and persistent cookies disabled.
  • Headless config: set viewport to 1366x768, avoid JavaScript timeouts shorter than five seconds, retry on detection challenges with exponential backoff, and record raw HTML alongside parsed JSON for replay.
  • Determinism: strip session continuity (no logged-in profiles, no memory). Run each query at least twice within a 30-minute window and store both responses; AI answers are non-deterministic and a single capture is not evidence.
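
A sketch tying the last two bullets together: exponential backoff on detection challenges, plus two replicates inside the 30-minute window. capture_once and DetectionChallenge are hypothetical stand-ins for your Playwright or API worker.

import random
import time

class DetectionChallenge(Exception):
    """Raised by the worker when the platform serves a bot challenge."""

def capture_with_backoff(capture_once, query: str, max_retries: int = 4):
    """Retry a single capture with exponential backoff and jitter."""
    delay = 2.0
    for _ in range(max_retries):
        try:
            return capture_once(query)  # returns (raw_html, parsed_json)
        except DetectionChallenge:
            time.sleep(delay + random.uniform(0, 1))  # jitter avoids lockstep retries
            delay *= 2
    raise RuntimeError(f"capture failed after {max_retries} attempts: {query}")

def capture_replicates(capture_once, query: str, n: int = 2, window_s: int = 1800):
    """Run n captures inside the 30-minute window; store every response."""
    start, results = time.time(), []
    for i in range(1, n + 1):
        raw_html, parsed = capture_with_backoff(capture_once, query)
        results.append({"replicate_id": i, "raw_html": raw_html, "parsed": parsed})
        if time.time() - start > window_s:
            break  # past the window: stop rather than mix time regimes
    return results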

4. Sample size for statistical significance

For citation-share comparisons across two periods, treat each query as a Bernoulli trial (cited / not cited). To detect a 5 percentage-point change at 80 percent power and α=0.05, you need roughly 300 paired observations. Two captures per query per week across 200 queries deliver 400 weekly observations — enough to flag meaningful weekly drift, not enough for sub-segment cuts. Plan for 500-query coverage if you need cuts by intent stratum.
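
One way to sanity-check the ~300 figure is a standard sample-size formula for paired proportions (Connor, 1987), matching the McNemar design implied by "paired observations". The discordance rate psi (the share of queries whose cited status flips between periods) is an assumption here; a value around 0.10 reproduces the figure.

# Paired-proportions sample size; psi = 0.10 is an assumed discordance rate.
from math import sqrt
from scipy.stats import norm

def paired_n(delta: float, psi: float, alpha: float = 0.05, power: float = 0.80) -> int:
    z_a = norm.ppf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    z_b = norm.ppf(power)          # 0.84 for 80 percent power
    n = (z_a * sqrt(psi) + z_b * sqrt(psi - delta ** 2)) ** 2 / delta ** 2
    return int(n) + 1

print(paired_n(delta=0.05, psi=0.10))  # ~312 paired observations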

5. Storage schema

Normalize to a single row per (query, platform, capture) tuple. Suggested columns:

{
  "capture_id": "uuid",
  "query": "string",
  "intent_stratum": "definition|comparison|troubleshooting|pricing|decision",
  "locale": "string (BCP-47)",
  "platform": "aio|perplexity|chatgpt|copilot",
  "captured_at": "timestamp utc",
  "answer_text": "string",
  "citations": [{"position": 1, "url": "string", "domain": "string", "anchor_text": "string"}],
  "raw_html_path": "string (s3 uri)",
  "capture_method": "api|headless",
  "replicate_id": "int (1 or 2)"
}

Derive citation-share, position-mean, and visibility metrics in a downstream dbt or SQL view rather than at write time so historical recomputation is cheap.
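
As an illustration of such a view, here is the citation-share aggregation in pandas; a dbt/SQL version is the same group-by. The captures frame is assumed to carry the columns from the schema above.

# Weekly citation share per platform for one tracked domain.
import pandas as pd

def citation_share(captures: pd.DataFrame, domain: str) -> pd.DataFrame:
    """Share of captures per platform/week in which `domain` appears as a citation."""
    df = captures.copy()
    df["week"] = pd.to_datetime(df["captured_at"]).dt.to_period("W")
    df["cited"] = df["citations"].apply(
        lambda cs: any(c["domain"] == domain for c in cs))
    return (df.groupby(["platform", "week"])["cited"]
              .mean()  # mean of a 0/1 flag = citation share
              .rename("citation_share")
              .reset_index())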

6. Reference architecture

flowchart LR
    A["Question set\n(intent strata)"] --> B["Scheduler\n(cron / Airflow)"]
    B --> C["API workers\n(SerpApi, Perplexity API)"]
    B --> D["Headless workers\n(Playwright + proxies)"]
    C --> E["Capture lake\n(raw JSON + HTML in S3)"]
    D --> E
    E --> F["Normalizer\n(per-platform parsers)"]
    F --> G["Warehouse\n(captures table)"]
    G --> H["Metrics views\n(citation share, position)"]
    H --> I["Alerts\n(Slack, email)"]
    H --> J["Reporting\n(dashboards)"]

7. Reporting cadence and alerting

  • Daily: spot-check the top 20 queries by business value; alert if any tracked domain exits the cited list two days in a row.
  • Weekly: compute citation share by platform, by intent stratum, and by locale. Research on content front-loading suggests 55 percent of AIO citations come from the first 30 percent of a page (CXL, 2025); flag pages whose citation drops correlate with on-page edits below the fold.
  • Monthly: competitor delta report and intent-stratum report; rotate the question set per the refresh cadence above.
  • Alerting thresholds: week-over-week citation-share drop greater than 20 percent (relative) on any platform, or absolute drop greater than 5 percentage points on a single high-value query, opens a P2 ticket.
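
A minimal sketch of the relative-drop check from the last bullet; the absolute per-query check follows the same pattern.

# `shares` maps platform -> (last_week, this_week) citation share in [0, 1].
def wow_alerts(shares: dict[str, tuple[float, float]],
               rel_drop: float = 0.20) -> list[str]:
    alerts = []
    for platform, (prev, curr) in shares.items():
        if prev > 0 and (prev - curr) / prev > rel_drop:
            alerts.append(f"P2: {platform} citation share fell "
                          f"{(prev - curr) / prev:.0%} WoW ({prev:.1%} -> {curr:.1%})")
    return alerts

print(wow_alerts({"aio": (0.30, 0.22), "perplexity": (0.25, 0.24)}))
# -> ['P2: aio citation share fell 27% WoW (30.0% -> 22.0%)']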

8. Common implementation mistakes

  • Tracking presence (brand mentioned in text) and citations (URL linked as source) as one metric. They diverge — Perplexity often names brands without citing them.
  • Single-capture-per-query reporting. AI answers are non-deterministic; one capture is anecdote, not measurement.
  • Ignoring locale. Global averages mask large in-market swings.
  • Mixing logged-in and logged-out captures in the same series. Personalization corrupts the time series.

FAQ

Q: How often should we refresh the question set?

Replace queries that have produced zero citations across all tracked domains for eight consecutive weeks. Keep total set size constant. Re-stratify by intent quarterly.

Q: API capture vs. headless — which should we prefer?

API-first whenever a stable extraction endpoint exists (AIO via SerpApi, Perplexity Search API). Reserve headless for platforms without a citation API (ChatGPT Search, Copilot) and treat headless captures as higher-variance.

Q: How many queries do we need for defensible weekly reporting?

A stratified set of 200 queries with two replicates each yields roughly 400 weekly observations per platform — enough to detect a 5-point citation-share change at 80 percent power. Scale to 500 if you need sub-segment cuts.

Q: Should we store raw HTML in addition to parsed JSON?

Yes. Parser logic changes; raw HTML lets you reprocess history without recapturing. Store in object storage with a path referenced from the captures table.
