AI Visibility Report Schema Specification: Standardized Citation Data Export Format Across Vendors
This specification defines a vendor-neutral schema for AI visibility reports so agencies can normalize exports from Profound, Peec AI, Otterly, HubSpot AEO, and similar tools into a single dataset. It covers an 18-field required record, recommended JSON Lines and CSV serializations, and ingestion guidance for Looker Studio, BigQuery, Snowflake, Notion, and Airtable.
TL;DR
Every AI visibility platform exports the same five core dimensions — prompt, engine, citation, position, share of voice — but each vendor uses a different column layout. This spec freezes a canonical column set (18 required + 14 optional fields) and a JSON Lines payload so analysts can union exports, join to revenue, and build cross-vendor dashboards without rewriting glue code each time a vendor renames a column. Use it as the target schema in your ETL layer.
Why a vendor-neutral schema is needed
Agencies and in-house RevOps teams now subscribe to two or three AI visibility tools at once. Profound publishes raw exports plus an API, Peec AI ships CSVs and a Looker Studio community connector, Otterly emails CSV reports, and HubSpot AEO sits inside the Marketing Hub with its own data model. Practitioners on r/b2bmarketing call out the duct-tape problem directly: CSVs land in different shapes, columns drift between releases, and there is no standard interchange format an analyst can target.
Without a shared schema:
- Cross-vendor share of voice cannot be reconciled — each tool defines "mention" differently.
- Joining citations to GA4 referral or CRM revenue requires per-vendor SQL transforms.
- BI dashboards (Looker Studio, Tableau, Power BI) break every time a vendor adds or renames a column.
- Notion or Airtable rollups stay manual because import schemas have to be redefined per file.
This specification is the missing target schema. It is intentionally narrow: only fields that at least two of {Profound, Peec, Otterly, HubSpot AEO, Brandlight, Rankability} already export are required.
How it works
Each visibility tool runs a fixed prompt against an answer engine on a schedule, captures the rendered answer, and parses the citations and brand mentions out of it. The resulting row describes one prompt-engine-run combination. This spec models that row as a single JSON object (or CSV row) called a VisibilityObservation. A complete export is an array of VisibilityObservation records plus a small ReportMeta envelope.
Implementers transform their native export to this schema in the ETL layer, then load into a warehouse table or BI source.
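As an illustration, the mapping step might look like the following Python sketch. The vendor column names ("Platform", "Checked At", "Brand Mentioned") are hypothetical stand-ins, not any tool's real export layout, so treat the mapping as the part you adapt per vendor.

```python
import uuid
from datetime import datetime, timezone

# Hypothetical native-to-canonical engine name mapping; extend per vendor.
ENGINE_MAP = {"ChatGPT": "chatgpt", "Perplexity": "perplexity", "Gemini": "gemini"}

def to_observation(row: dict, run_id: str, prompts_in_run: int) -> dict:
    """Map one native export row onto canonical VisibilityObservation fields."""
    # Assumes the vendor timestamp is an offset-aware ISO 8601 string.
    run_at = datetime.fromisoformat(row["Checked At"]).astimezone(timezone.utc)
    return {
        "observation_id": str(uuid.uuid4()),  # persist this across re-exports
        "run_id": run_id,
        "run_at": run_at.isoformat().replace("+00:00", "Z"),
        "engine": ENGINE_MAP[row["Platform"]],
        "prompt_id": row["Prompt ID"],
        "prompt_text": row["Prompt"],
        "brand_mentioned": row["Brand Mentioned"].strip().lower() == "yes",
        "prompts_in_run": prompts_in_run,
        # ...the remaining required fields map the same way.
    }
```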
Key concepts
VisibilityObservation
The atomic record. One observation per (prompt, engine, run_at) tuple. Citations and mentions inside the answer expand into nested arrays rather than separate rows so a single answer rendering remains traceable.
ReportMeta
A single-object envelope that accompanies the observation array (inline in a single JSON document, or as a sidecar report-meta.json alongside a JSON Lines file) and records vendor, schema version, project identifier, and time window. Required so consumers can reject incompatible exports.
Share of voice
Used at two scopes. At the observation level, the share_of_voice field is the tracked brand's share of all brand mentions inside a single answer. At the project level, it is the percentage of monitored prompts in which the brand appears as a citation or named mention, scoped to a single engine. Vendors compute the project-level figure with differing denominators; this spec standardizes the denominator (prompts_in_run) so aggregations agree.
Schema definition
ReportMeta (required envelope)
| Field | Type | Required | Description |
|---|---|---|---|
| schema_version | string (semver) | Yes | Must be "1.0" for conformance with this spec. |
| vendor | string | Yes | Source platform (e.g. "profound", "peec", "otterly", "hubspot-aeo"). |
| project_id | string | Yes | Stable project or workspace identifier from the vendor. |
| brand | string | Yes | Primary tracked brand name. |
| competitors | array of string | No | Tracked competitor brands. |
| window_start | ISO 8601 datetime | Yes | Inclusive start of the reporting window in UTC. |
| window_end | ISO 8601 datetime | Yes | Exclusive end of the reporting window in UTC. |
| generated_at | ISO 8601 datetime | Yes | When the export was produced. |
| currency | ISO 4217 code | No | Used only when monetary fields are populated. |
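For concreteness, a minimal report-meta.json satisfying the required envelope fields could look like this (all values illustrative):

```json
{
  "schema_version": "1.0",
  "vendor": "peec",
  "project_id": "ws-1842",
  "brand": "ExampleCo",
  "competitors": ["Acme", "Globex"],
  "window_start": "2026-04-01T00:00:00Z",
  "window_end": "2026-05-01T00:00:00Z",
  "generated_at": "2026-05-01T06:15:00Z"
}
```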
VisibilityObservation (required fields)
| Field | Type | Description |
|---|---|---|
| observation_id | string (UUID v4) | Globally unique row identifier. |
| run_id | string | Stable identifier for the batch run that produced this observation. |
| run_at | ISO 8601 datetime | When the prompt was executed against the engine, in UTC. |
| engine | enum | One of chatgpt, perplexity, gemini, google-ai-overviews, google-ai-mode, copilot, claude, meta-ai, grok, deepseek. |
| engine_model | string | Model variant (e.g. gpt-5, sonar-large, gemini-2.5-pro); set to "unknown" when the vendor does not expose it. |
| prompt_id | string | Stable identifier for the monitored prompt. |
| prompt_text | string | Verbatim prompt text. |
| prompt_locale | BCP 47 tag | Locale used when issuing the prompt (e.g. en-US, de-DE). |
| prompt_country | ISO 3166-1 alpha-2 | Country context for the run. |
| brand_mentioned | boolean | True if the tracked brand appears anywhere in the answer. |
| brand_cited | boolean | True if the brand's domain or owned URL appears in citations. |
| brand_position | integer or null | Rank order of the brand's first mention or citation; null if absent. |
| share_of_voice | float | The tracked brand's mentions divided by all brand mentions (brand plus tracked competitors) in this answer (0.0 to 1.0). |
| sentiment | enum | positive, neutral, negative, or unknown. |
| citations | array of Citation | Ordered list of citations rendered with the answer. |
| competitor_mentions | array of CompetitorMention | One entry per tracked competitor that appears. |
| answer_word_count | integer | Total words in the rendered answer. |
| prompts_in_run | integer | Count of prompts in the parent run; used as denominator for project-level share of voice. |
Citation (nested)
| Field | Type | Description |
|---|---|---|
| position | integer | 1-indexed citation order in the answer. |
| url | string (absolute URL) | Cited URL. |
| domain | string | Registrable domain (eTLD+1). |
| is_brand_owned | boolean | True if the domain matches a configured owned domain. |
| is_competitor_owned | boolean | True if the domain matches a tracked competitor. |
| source_authority | enum | high, medium, low, or unknown. |
| snippet | string | Quoted text the engine attributed to the citation, when available. |
CompetitorMention (nested)
| Field | Type | Description |
|---|---|---|
| competitor | string | Competitor brand name. |
| mentioned | boolean | True if mentioned in the answer text. |
| cited | boolean | True if cited via a competitor-owned URL. |
| position | integer or null | Rank order of first appearance. |
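For implementers who want a typed binding, the tables above translate directly into type definitions. The following Python sketch is illustrative rather than normative; the JSON Schema artifact referenced in the checklist below remains the contract:

```python
from typing import List, Literal, Optional, TypedDict

class Citation(TypedDict):
    position: int
    url: str
    domain: str
    is_brand_owned: bool
    is_competitor_owned: bool
    source_authority: Literal["high", "medium", "low", "unknown"]
    snippet: str

class CompetitorMention(TypedDict):
    competitor: str
    mentioned: bool
    cited: bool
    position: Optional[int]

class VisibilityObservation(TypedDict):
    observation_id: str
    run_id: str
    run_at: str
    engine: str
    engine_model: str
    prompt_id: str
    prompt_text: str
    prompt_locale: str
    prompt_country: str
    brand_mentioned: bool
    brand_cited: bool
    brand_position: Optional[int]
    share_of_voice: float
    sentiment: Literal["positive", "neutral", "negative", "unknown"]
    citations: List[Citation]
    competitor_mentions: List[CompetitorMention]
    answer_word_count: int
    prompts_in_run: int
```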
Optional fields
These appear in some vendor exports and are recommended when present: evidence_url (link to a stored screenshot or HTML capture), answer_html, answer_markdown, cost_usd (per-run API cost), latency_ms, tags, intent (commercial, informational, transactional), prompt_cluster_id, region, device, is_followup, parent_observation_id, notes, evidence_hash.
Serialization formats
JSON Lines (preferred)
The canonical format is JSON Lines (.jsonl): one VisibilityObservation per line, with a separate report-meta.json file in the same archive. JSON Lines streams cleanly into BigQuery, Snowflake, and DuckDB without parsing the whole file.
```jsonl
{"observation_id":"7c1c...","run_id":"r-2026-04-29-001","run_at":"2026-04-29T03:00:00Z","engine":"perplexity","engine_model":"sonar-large","prompt_id":"p-042","prompt_text":"Best AI visibility tools for agencies","prompt_locale":"en-US","prompt_country":"US","brand_mentioned":true,"brand_cited":true,"brand_position":2,"share_of_voice":0.34,"sentiment":"positive","citations":[{"position":1,"url":"https://example.com/post","domain":"example.com","is_brand_owned":true,"is_competitor_owned":false,"source_authority":"medium","snippet":"..."}],"competitor_mentions":[{"competitor":"Acme","mentioned":true,"cited":false,"position":3}],"answer_word_count":312,"prompts_in_run":50}
```
CSV (compatibility)
For tools that cannot emit JSON Lines, a flattened CSV is allowed. Nested arrays are encoded as JSON strings inside a single column. UTF-8 encoding, RFC 4180 quoting, and a single header row are required. Date columns must be ISO 8601 in UTC with an explicit Z suffix.
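A sketch of the flattening step, using Python's standard csv module (whose default quoting is RFC 4180-compatible); the column order and file handling are illustrative:

```python
import csv
import json

# The 18 required fields, with nested arrays serialized into single columns.
FLAT_COLUMNS = [
    "observation_id", "run_id", "run_at", "engine", "engine_model",
    "prompt_id", "prompt_text", "prompt_locale", "prompt_country",
    "brand_mentioned", "brand_cited", "brand_position", "share_of_voice",
    "sentiment", "citations", "competitor_mentions", "answer_word_count",
    "prompts_in_run",
]

def write_csv(observations: list[dict], path: str) -> None:
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FLAT_COLUMNS, extrasaction="ignore")
        writer.writeheader()
        for obs in observations:
            row = dict(obs)
            # Nested arrays become JSON strings inside one CSV cell each.
            row["citations"] = json.dumps(obs["citations"], ensure_ascii=False)
            row["competitor_mentions"] = json.dumps(
                obs["competitor_mentions"], ensure_ascii=False
            )
            writer.writerow(row)
```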
Naming convention
Archive files use the aivr_ prefix; each archive bundles the observations file (observations.jsonl or observations.csv) with its report-meta.json sidecar.
How to apply
Implementer checklist
- Map each native column to the canonical field name; document drops and additions.
- Cast all timestamps to UTC ISO 8601 before write.
- Resolve domains to eTLD+1 using the Public Suffix List.
- Generate observation_id as a UUID v4 once per observation and persist it, so re-exports emit identical rows; minting a fresh UUID on every export breaks idempotency.
- Validate the output against the JSON Schema artifact before publishing; a validation sketch follows this checklist.
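A pre-publish validation sketch, assuming the third-party jsonschema and tldextract packages (the latter resolves registrable domains via the Public Suffix List); the schema filename is a placeholder:

```python
import json

import jsonschema
import tldextract

# Load the JSON Schema artifact (placeholder filename).
with open("aivr-1.0.schema.json", encoding="utf-8") as fh:
    schema = json.load(fh)

with open("observations.jsonl", encoding="utf-8") as fh:
    for line_no, line in enumerate(fh, start=1):
        obs = json.loads(line)
        # Raises jsonschema.ValidationError on any non-conformant record.
        jsonschema.validate(instance=obs, schema=schema)
        for citation in obs["citations"]:
            # Confirm the domain column really is the eTLD+1 of the URL.
            expected = tldextract.extract(citation["url"]).registered_domain
            if citation["domain"] != expected:
                raise ValueError(
                    f"line {line_no}: domain {citation['domain']!r} "
                    f"is not the eTLD+1 of {citation['url']}"
                )
```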
Consumer recipes
- Looker Studio: load JSON Lines into BigQuery, then point a community connector at the table; the Peec AI connector already follows similar field names and only needs a thin view.
- Tableau / Power BI: ingest the CSV variant; the share-of-voice and brand-cited fields are already pre-aggregated per observation.
- Notion / Airtable: import observations.csv and build rollups by engine and prompt_id. Use share_of_voice averaged over the window for executive dashboards.
- Slack alerts: trigger when brand_cited flips from true to false on prompts where it was true the prior run; see the sketch after this list.
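A sketch of that citation-loss check over two JSONL run files; the file names and the Slack webhook URL are placeholders:

```python
import json
import urllib.request

def load_run(path: str) -> dict:
    """Index one run's observations by (prompt_id, engine)."""
    index = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            obs = json.loads(line)
            index[(obs["prompt_id"], obs["engine"])] = obs
    return index

previous = load_run("run_prev.jsonl")  # placeholder file names
current = load_run("run_curr.jsonl")
lost = [
    key for key, obs in previous.items()
    if obs["brand_cited"] and key in current and not current[key]["brand_cited"]
]
if lost:
    payload = {"text": f"Brand citation lost on {len(lost)} prompt/engine pairs: {lost}"}
    req = urllib.request.Request(
        "https://hooks.slack.com/services/XXX",  # placeholder webhook URL
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
```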
vs related interchange formats
- schema.org Dataset describes a dataset at the catalog level; this spec describes the record shape inside that dataset.
- DataCite Metadata Schema standardizes citation of research datasets, not row-level AI answer observations.
- Vendor-native CSVs (Profound, Peec, Otterly) overlap heavily with the required fields but rename columns; this spec freezes the names so consumers do not have to.
Misconceptions
- "My BI tool can just auto-detect columns." It can detect types, but it cannot reconcile semantic differences (mention vs citation, project share of voice vs answer share of voice) without a contract.
- "Sentiment is a simple positive or negative." Vendors disagree on neutral versus unknown; this spec separates them so missing data does not bias dashboards.
- "Share of voice is comparable across vendors out of the box." Only when the denominator is fixed. The prompts_in_run field exists to make that denominator explicit.
Versioning and conformance
This specification follows semantic versioning. Minor versions add optional fields; major versions change field semantics. A producer is strict-conformant if it populates every required field for every observation and also carries the recommended optional fields its source data supports; it is loose-conformant if every required field is present but optional fields are omitted.
FAQ
Q: What is the AI Visibility Report Schema?
It is a vendor-neutral specification for AI visibility data exports. It defines a ReportMeta envelope and a VisibilityObservation record with 18 required fields covering prompt, engine, citation, position, sentiment, and share of voice so multiple vendor exports can be unioned into one dataset.
Q: Which vendors does it support?
Any vendor whose export contains the required fields. Profound, Peec AI, Otterly, HubSpot AEO, Brandlight, and Rankability already export every required field under different names; the spec normalizes the names rather than the underlying data.
Q: Should I use JSON Lines or CSV?
Use JSON Lines when your warehouse supports it (BigQuery, Snowflake, DuckDB, Databricks). It preserves nested citations without ambiguity. Use CSV only for low-code destinations like Notion, Airtable, or Looker Studio data sources that cannot parse nested JSON.
Q: How is share of voice computed?
share_of_voice on a single observation is the tracked brand's mentions divided by all brand mentions (brand plus competitors) inside that one answer. Project-level share of voice is computed by the consumer as count(brand_cited == true) / prompts_in_run over the window, which is why prompts_in_run is required.
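A sketch of that project-level computation over a JSON Lines export, scoped to one engine and one run:

```python
import json

def project_share_of_voice(path: str, engine: str, run_id: str) -> float:
    """count(brand_cited == true) / prompts_in_run for one engine and run."""
    cited = 0
    prompts_in_run = None
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            obs = json.loads(line)
            if obs["engine"] != engine or obs["run_id"] != run_id:
                continue
            prompts_in_run = obs["prompts_in_run"]  # constant within a run
            cited += obs["brand_cited"]
    if not prompts_in_run:
        raise ValueError("no matching observations found")
    return cited / prompts_in_run
```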
Q: Is this an official standard?
No. It is a community specification published by Geodocs to fill the gap left by vendors and avoid bespoke ETL for every tool change. It is versioned, reviewed every 90 days, and intended to evolve with vendor exports rather than ahead of them.