Geodocs.dev

AI Visibility Report Schema Specification: Standardized Citation Data Export Format Across Vendors


This specification defines a vendor-neutral schema for AI visibility reports so agencies can normalize exports from Profound, Peec AI, Otterly, HubSpot AEO, and similar tools into a single dataset. It covers an 18-field required record, recommended JSON Lines and CSV serializations, and ingestion guidance for Looker Studio, BigQuery, Snowflake, Notion, and Airtable.

TL;DR

Every AI visibility platform exports the same five core dimensions — prompt, engine, citation, position, share of voice — but each vendor uses a different column layout. This spec freezes a canonical column set (18 required + 14 optional fields) and a JSON Lines payload so analysts can union exports, join to revenue, and build cross-vendor dashboards without rewriting glue code each time a vendor renames a column. Use it as the target schema in your ETL layer.

Why a vendor-neutral schema is needed

Agencies and in-house RevOps teams now subscribe to two or three AI visibility tools at once. Profound publishes raw exports plus an API, Peec AI ships CSVs and a Looker Studio community connector, Otterly emails CSV reports, and HubSpot AEO sits inside the Marketing Hub with its own data model. Practitioners on r/b2bmarketing call out the duct-tape problem directly: CSVs land in different shapes, columns drift between releases, and there is no standard interchange format an analyst can target.

Without a shared schema:

  • Cross-vendor share of voice cannot be reconciled — each tool defines "mention" differently.
  • Joining citations to GA4 referral or CRM revenue requires per-vendor SQL transforms.
  • BI dashboards (Looker Studio, Tableau, Power BI) break every time a vendor adds or renames a column.
  • Notion or Airtable rollups stay manual because import schemas have to be redefined per file.

This specification is the missing target schema. It is intentionally narrow: only fields that at least two of {Profound, Peec, Otterly, HubSpot AEO, Brandlight, Rankability} already export are required.

How it works

Each visibility tool runs a fixed prompt against an answer engine on a schedule, captures the rendered answer, and parses the citations and brand mentions out of it. The resulting row describes one prompt-engine-run combination. This spec models that row as a single JSON object (or CSV row) called a VisibilityObservation. A complete export is an array of VisibilityObservation records plus a small ReportMeta envelope.

Implementers transform their native export to this schema in the ETL layer, then load into a warehouse table or BI source.
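That ETL step is mostly a column-rename pass. The sketch below shows one way to express the mapping; the left-hand native column names are hypothetical (real vendor exports vary by release), only the right-hand canonical names come from this spec.

```python
# Hypothetical native column names -> canonical schema fields.
# The left-hand keys are illustrative; check your vendor's current export.
COLUMN_MAP = {
    "peec": {"query": "prompt_text", "platform": "engine",
             "checked_at": "run_at", "is_cited": "brand_cited"},
    "otterly": {"prompt": "prompt_text", "ai_engine": "engine",
                "date": "run_at", "brand_link_found": "brand_cited"},
}

def normalize_row(vendor: str, row: dict) -> dict:
    """Rename a native export row to canonical field names.

    Unmapped columns pass through unchanged so optional fields survive.
    """
    mapping = COLUMN_MAP.get(vendor, {})
    return {mapping.get(k, k): v for k, v in row.items()}
```

Keeping the mapping as data rather than per-vendor SQL means a vendor rename is a one-line change instead of a dashboard rebuild.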

Key concepts

VisibilityObservation

The atomic record. One observation per (prompt, engine, run_at) tuple. Citations and mentions inside the answer expand into nested arrays rather than separate rows so a single answer rendering remains traceable.

ReportMeta

A single-object envelope that wraps the observation array and records vendor, schema version, project identifier, and time window. Required so consumers can reject incompatible exports.
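A consumer's rejection gate can be as small as a required-field check plus a semver-major comparison on the envelope. This is a minimal sketch using only the required ReportMeta fields defined below; the function name is illustrative.

```python
import json

# Required ReportMeta fields per this spec's envelope table.
REQUIRED_META = {"schema_version", "vendor", "project_id", "brand",
                 "window_start", "window_end", "generated_at"}

def accept_report_meta(raw: str) -> dict:
    """Parse a ReportMeta envelope and reject incompatible exports."""
    meta = json.loads(raw)
    missing = REQUIRED_META - meta.keys()
    if missing:
        raise ValueError(f"ReportMeta missing required fields: {sorted(missing)}")
    # Same major version => compatible; "1.0" is the only conformant value today.
    if meta["schema_version"].split(".")[0] != "1":
        raise ValueError(f"unsupported schema_version {meta['schema_version']}")
    return meta
```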

Share of voice

Defined here at two levels. On a single observation, share_of_voice is the tracked brand's share of all brand mentions inside that one answer. At the project level, share of voice is the percentage of monitored prompts in which the brand appears as a citation or named mention, scoped to a single engine. Vendors compute the project-level figure at the project or competitor level; this spec standardizes the denominator (prompts_in_run) so aggregations agree.

Schema definition

ReportMeta (required envelope)

Field | Type | Required | Description
schema_version | string (semver) | Yes | Must be "1.0" for conformance with this spec.
vendor | string | Yes | Source platform (e.g. "profound", "peec", "otterly", "hubspot-aeo").
project_id | string | Yes | Stable project or workspace identifier from the vendor.
brand | string | Yes | Primary tracked brand name.
competitors | array of string | No | Tracked competitor brands.
window_start | ISO 8601 datetime | Yes | Inclusive start of the reporting window in UTC.
window_end | ISO 8601 datetime | Yes | Exclusive end of the reporting window in UTC.
generated_at | ISO 8601 datetime | Yes | When the export was produced.
currency | ISO 4217 code | No | Used only when monetary fields are populated.

VisibilityObservation (required fields)

Field | Type | Description
observation_id | string (UUID v4) | Globally unique row identifier.
run_id | string | Stable identifier for the batch run that produced this observation.
run_at | ISO 8601 datetime | When the prompt was executed against the engine, in UTC.
engine | enum | One of chatgpt, perplexity, gemini, google-ai-overviews, google-ai-mode, copilot, claude, meta-ai, grok, deepseek.
engine_model | string | Model variant when known (e.g. gpt-5, sonar-large, gemini-2.5-pro).
prompt_id | string | Stable identifier for the monitored prompt.
prompt_text | string | Verbatim prompt text.
prompt_locale | BCP 47 tag | Locale used when issuing the prompt (e.g. en-US, de-DE).
prompt_country | ISO 3166-1 alpha-2 | Country context for the run.
brand_mentioned | boolean | True if the tracked brand appears anywhere in the answer.
brand_cited | boolean | True if the brand's domain or owned URL appears in citations.
brand_position | integer or null | Rank order of the brand's first mention or citation; null if absent.
share_of_voice | float | The tracked brand's mentions divided by total mentions of all tracked brands (brand plus competitors) in this answer (0.0 to 1.0).
sentiment | enum | positive, neutral, negative, or unknown.
citations | array of Citation | Ordered list of citations rendered with the answer.
competitor_mentions | array of CompetitorMention | One entry per tracked competitor that appears.
answer_word_count | integer | Total words in the rendered answer.
prompts_in_run | integer | Count of prompts in the parent run; used as denominator for project-level share of voice.

Citation (nested)

Field | Type | Description
position | integer | 1-indexed citation order in the answer.
url | string (absolute URL) | Cited URL.
domain | string | Registrable domain (eTLD+1).
is_brand_owned | boolean | True if the domain matches a configured owned domain.
is_competitor_owned | boolean | True if the domain matches a tracked competitor.
source_authority | enum | high, medium, low, or unknown.
snippet | string | Quoted text the engine attributed to the citation, when available.

CompetitorMention (nested)

Field | Type | Description
competitor | string | Competitor brand name.
mentioned | boolean | True if mentioned in the answer text.
cited | boolean | True if cited via a competitor-owned URL.
position | integer or null | Rank order of first appearance.

Optional fields

These appear in some vendor exports and are recommended when present: evidence_url (link to a stored screenshot or HTML capture), answer_html, answer_markdown, cost_usd (per-run API cost), latency_ms, tags, intent (commercial, informational, transactional), prompt_cluster_id, region, device, is_followup, parent_observation_id, notes, evidence_hash.

Serialization formats

JSON Lines (preferred)

The canonical format is JSON Lines (.jsonl): one VisibilityObservation per line, with a separate report-meta.json file in the same archive. JSON Lines streams cleanly into BigQuery, Snowflake, and DuckDB without parsing the whole file.


{"observation_id":"7c1c...","run_id":"r-2026-04-29-001","run_at":"2026-04-29T03:00:00Z","engine":"perplexity","engine_model":"sonar-large","prompt_id":"p-042","prompt_text":"Best AI visibility tools for agencies","prompt_locale":"en-US","prompt_country":"US","brand_mentioned":true,"brand_cited":true,"brand_position":2,"share_of_voice":0.34,"sentiment":"positive","citations":[{"position":1,"url":"https://example.com/post","domain":"example.com","is_brand_owned":true,"is_competitor_owned":false,"source_authority":"medium","snippet":"..."}],"competitor_mentions":[{"competitor":"Acme","mentioned":true,"cited":false,"position":3}],"answer_word_count":312,"prompts_in_run":50}
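Because each line is an independent record, unioning exports from several vendors is just concatenating streams. A minimal stdlib reader, assuming every line is (or normalizes to) a VisibilityObservation:

```python
import json
from typing import Iterable, Iterator

def read_observations(lines: Iterable[str]) -> Iterator[dict]:
    """Stream VisibilityObservation records from JSON Lines input.

    Blank lines are skipped so concatenated vendor files union cleanly;
    each record is yielded as soon as its line is read (no whole-file parse).
    """
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)
```

To union multiple files, chain the readers (e.g. `itertools.chain(read_observations(f1), read_observations(f2))`) after each file's rows have been normalized to the canonical field names.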

CSV (compatibility)

For tools that cannot emit JSON Lines, a flattened CSV is allowed. Nested arrays are encoded as JSON strings inside a single column. UTF-8, RFC 4180 quoting, and a single header row are required. Date columns must be ISO 8601 with explicit Z offset.
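The encoding rule for nested arrays can be sketched with the stdlib csv module, whose default quoting satisfies RFC 4180; the function names here are illustrative, not part of the spec.

```python
import csv
import io
import json

def to_flat_row(obs: dict) -> dict:
    """Encode the nested arrays as JSON strings inside single columns."""
    flat = dict(obs)
    for col in ("citations", "competitor_mentions"):
        flat[col] = json.dumps(obs.get(col, []), ensure_ascii=False)
    return flat

def write_csv(observations, fieldnames) -> str:
    """Emit a single-header-row CSV; csv.QUOTE_MINIMAL matches RFC 4180."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    for obs in observations:
        writer.writerow(to_flat_row(obs))
    return buf.getvalue()
```

Consumers that can parse JSON inside a cell (BigQuery, DuckDB) recover the nested structure losslessly; low-code destinations can treat the column as opaque text.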

Naming convention

Archive files use aivr____.zip. Inside, report-meta.json and observations.jsonl (or observations.csv) are required.

How to apply

Implementer checklist

  • Map each native column to the canonical field name; document drops and additions.
  • Cast all timestamps to UTC ISO 8601 before write.
  • Resolve domains to eTLD+1 using the Public Suffix List.
  • Generate observation_id as UUID v4 to keep rows idempotent on re-export.
  • Validate the output against the JSON Schema artifact before publishing.
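A few of the checklist steps (UTC casting, UUID syntax, required-field presence) can be hand-rolled in the stdlib as a stopgap; this sketch is not a substitute for validating against the full JSON Schema artifact, and it checks UUID syntax only, not the version bits.

```python
import uuid
from datetime import datetime, timezone

# The 18 required VisibilityObservation fields from this spec.
REQUIRED = ["observation_id", "run_id", "run_at", "engine", "engine_model",
            "prompt_id", "prompt_text", "prompt_locale", "prompt_country",
            "brand_mentioned", "brand_cited", "brand_position",
            "share_of_voice", "sentiment", "citations", "competitor_mentions",
            "answer_word_count", "prompts_in_run"]

def to_utc_iso(ts: datetime) -> str:
    """Cast a timezone-aware timestamp to UTC ISO 8601 with explicit Z."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

def validate(obs: dict) -> list[str]:
    """Return human-readable problems; an empty list means the row passes."""
    problems = [f"missing {f}" for f in REQUIRED if f not in obs]
    if "observation_id" in obs:
        try:
            uuid.UUID(str(obs["observation_id"]))  # syntax check only
        except ValueError:
            problems.append("observation_id is not a valid UUID")
    return problems
```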

Consumer recipes

  • Looker Studio: load JSON Lines into BigQuery, then point a community connector at the table; the Peec AI connector already follows similar field names and only needs a thin view.
  • Tableau / Power BI: ingest the CSV variant; the share-of-voice and brand-cited fields are already pre-aggregated per observation.
  • Notion / Airtable: import observations.csv and build rollups by engine and prompt_id. Use share_of_voice averaged over the window for executive dashboards.
  • Slack alerts: trigger when brand_cited flips from true to false on prompts where it was true the prior run.

Related standards

  • schema.org Dataset describes a dataset at the catalog level; this spec describes the record shape inside that dataset.
  • DataCite Metadata Schema standardizes citation of research datasets, not row-level AI answer observations.
  • Vendor-native CSVs (Profound, Peec, Otterly) overlap heavily with the required fields but rename columns; this spec freezes the names so consumers do not have to.
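The Slack-alert recipe above (brand_cited flipping from true to false run-over-run) reduces to a diff of two runs keyed by prompt_id; the function name is illustrative, and posting to a webhook is left out.

```python
def cited_flips(prev_run: list[dict], curr_run: list[dict]) -> list[str]:
    """Return prompt_ids where brand_cited flipped true -> false between runs."""
    prev = {o["prompt_id"]: o["brand_cited"] for o in prev_run}
    return [o["prompt_id"] for o in curr_run
            if prev.get(o["prompt_id"]) is True and o["brand_cited"] is False]
```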

Misconceptions

  • "My BI tool can just auto-detect columns." It can detect types, but it cannot reconcile semantic differences (mention vs citation, project share of voice vs answer share of voice) without a contract.
  • "Sentiment is a simple positive or negative." Vendors disagree on neutral versus unknown; this spec separates them so missing data does not bias dashboards.
  • "Share of voice is comparable across vendors out of the box." Only when the denominator is fixed. The prompts_in_run field exists to make that denominator explicit.

Versioning and conformance

This specification follows semantic versioning. Minor versions add optional fields; major versions change field semantics. A producer is strict-conformant if every required and optional field is populated for every observation, and loose-conformant if optional fields are absent but no required field is missing.

FAQ

Q: What is the AI Visibility Report Schema?

It is a vendor-neutral specification for AI visibility data exports. It defines a ReportMeta envelope and a VisibilityObservation record with 18 required fields covering prompt, engine, citation, position, sentiment, and share of voice so multiple vendor exports can be unioned into one dataset.

Q: Which vendors does it support?

Any vendor whose export contains the required fields. Profound, Peec AI, Otterly, HubSpot AEO, Brandlight, and Rankability already export every required field under different names; the spec normalizes the names rather than the underlying data.

Q: Should I use JSON Lines or CSV?

Use JSON Lines when your warehouse supports it (BigQuery, Snowflake, DuckDB, Databricks). It preserves nested citations without ambiguity. Use CSV only for low-code destinations like Notion, Airtable, or Looker Studio data sources that cannot parse nested JSON.

Q: How is share of voice computed?

share_of_voice on a single observation is the brand's mentions divided by the total brand mentions inside that one answer. Project-level share of voice is computed by the consumer as count(brand_cited == true) / prompts_in_run over the window, which is why prompts_in_run is required.
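The consumer-side project-level computation is a one-liner per run; this sketch assumes all observations come from a single run (so prompts_in_run is constant across them), and runs within a window are aggregated separately before averaging.

```python
def project_share_of_voice(observations: list[dict]) -> float:
    """count(brand_cited == true) / prompts_in_run for one run, per the spec.

    prompts_in_run is the standardized denominator, so the result is
    comparable across vendors that populate these two fields correctly.
    """
    if not observations:
        return 0.0
    denom = observations[0]["prompts_in_run"]
    cited = sum(1 for o in observations if o["brand_cited"])
    return cited / denom
```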

Q: Is this an official standard?

No. It is a community specification published by Geodocs to fill the gap left by vendors and avoid bespoke ETL for every tool change. It is versioned, reviewed every 90 days, and intended to evolve with vendor exports rather than ahead of them.

Related Articles

comparison

GEO vs AEO: Complete Comparison of AI Search Strategies

Compare GEO vs AEO: their goals, levers, content emphasis, and how they complement each other in AI search optimization strategy.

tutorial

Ahrefs for GEO: Content Gap Analysis and AI Visibility

Step-by-step Ahrefs for GEO tutorial: use Content Gap, Keywords Explorer, Brand Radar, AI Content Helper, and Site Audit to find AI search opportunities and ship cluster content.

checklist

AI Bot Log Analytics Tool Buyer's Checklist

Buyer's checklist for evaluating AI bot log analytics platforms that track GPTBot, ClaudeBot, and PerplexityBot crawl behavior across server logs.

Stay Updated

GEO & AI Search Insights

New articles, framework updates, and industry analysis. No spam, unsubscribe anytime.