AI Visibility Report Schema Specification: Standardized Citation Data Export Format Across Vendors
This specification defines a vendor-neutral schema for AI visibility reports so agencies can normalize exports from Profound, Peec AI, Otterly, HubSpot AEO, and similar tools into a single dataset. It covers an 18-field required record, recommended JSON Lines and CSV serializations, and ingestion guidance for Looker Studio, BigQuery, Snowflake, Notion, and Airtable.
TL;DR
Every AI visibility platform exports the same five core dimensions — prompt, engine, citation, position, share of voice — but each vendor uses a different column layout. This spec freezes a canonical column set (18 required + 14 optional fields) and a JSON Lines payload so analysts can union exports, join to revenue, and build cross-vendor dashboards without rewriting glue code each time a vendor renames a column. Use it as the target schema in your ETL layer.
Why a vendor-neutral schema is needed
Agencies and in-house RevOps teams now subscribe to two or three AI visibility tools at once. Profound publishes raw exports plus an API, Peec AI ships CSVs and a Looker Studio community connector, Otterly emails CSV reports, and HubSpot AEO sits inside the Marketing Hub with its own data model. Practitioners on r/b2bmarketing call out the duct-tape problem directly: CSVs land in different shapes, columns drift between releases, and there is no standard interchange format an analyst can target.
Without a shared schema:
- Cross-vendor share of voice cannot be reconciled — each tool defines "mention" differently.
- Joining citations to GA4 referral or CRM revenue requires per-vendor SQL transforms.
- BI dashboards (Looker Studio, Tableau, Power BI) break every time a vendor adds or renames a column.
- Notion or Airtable rollups stay manual because import schemas have to be redefined per file.
This specification is the missing target schema. It is intentionally narrow: only fields that at least two of {Profound, Peec, Otterly, HubSpot AEO, Brandlight, Rankability} already export are required.
How it works
Each visibility tool runs a fixed prompt against an answer engine on a schedule, captures the rendered answer, and parses the citations and brand mentions out of it. The resulting row describes one prompt-engine-run combination. This spec models that row as a single JSON object (or CSV row) called a VisibilityObservation. A complete export is an array of VisibilityObservation records plus a small ReportMeta envelope.
Implementers transform their native export to this schema in the ETL layer, then load into a warehouse table or BI source.
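As an illustration, the mapping step might look like the following Python sketch. The vendor column names ("Platform", "Checked At", "Brand Mentioned") are hypothetical stand-ins, not any tool's real export layout, so treat the mapping as the part you adapt per vendor.

```python
import uuid
from datetime import datetime, timezone

# Hypothetical native-to-canonical engine name mapping; extend per vendor.
ENGINE_MAP = {"ChatGPT": "chatgpt", "Perplexity": "perplexity", "Gemini": "gemini"}

def to_observation(row: dict, run_id: str, prompts_in_run: int) -> dict:
    """Map one native export row onto canonical VisibilityObservation fields."""
    # Assumes the vendor timestamp is an offset-aware ISO 8601 string.
    run_at = datetime.fromisoformat(row["Checked At"]).astimezone(timezone.utc)
    return {
        "observation_id": str(uuid.uuid4()),  # persist this across re-exports
        "run_id": run_id,
        "run_at": run_at.isoformat().replace("+00:00", "Z"),
        "engine": ENGINE_MAP[row["Platform"]],
        "prompt_id": row["Prompt ID"],
        "prompt_text": row["Prompt"],
        "brand_mentioned": row["Brand Mentioned"].strip().lower() == "yes",
        "prompts_in_run": prompts_in_run,
        # ...the remaining required fields map the same way.
    }
```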
Key concepts
VisibilityObservation
The atomic record. One observation per (prompt, engine, run_at) tuple. Citations and mentions inside the answer expand into nested arrays rather than separate rows so a single answer rendering remains traceable.
ReportMeta
A single-object envelope that accompanies the observation array (inline in a single JSON document, or as a sidecar report-meta.json alongside a JSON Lines file) and records vendor, schema version, project identifier, and time window. Required so consumers can reject incompatible exports.
Share of voice
Used at two scopes. At the observation level, the share_of_voice field is the tracked brand's share of all brand mentions inside a single answer. At the project level, it is the percentage of monitored prompts in which the brand appears as a citation or named mention, scoped to a single engine. Vendors compute the project-level figure with differing denominators; this spec standardizes the denominator (prompts_in_run) so aggregations agree.
Schema definition
ReportMeta (required envelope)
| Field | Type | Required | Description |
|---|---|---|---|
| schema_version | string (semver) | Yes | Must be "1.0" for conformance with this spec. |
| vendor | string | Yes | Source platform (e.g. "profound", "peec", "otterly", "hubspot-aeo"). |
| project_id | string | Yes | Stable project or workspace identifier from the vendor. |
| brand | string | Yes | Primary tracked brand name. |
| competitors | array of string | No | Tracked competitor brands. |
| window_start | ISO 8601 datetime | Yes | Inclusive start of the reporting window in UTC. |
| window_end | ISO 8601 datetime | Yes | Exclusive end of the reporting window in UTC. |
| generated_at | ISO 8601 datetime | Yes | When the export was produced. |
| currency | ISO 4217 code | No | Used only when monetary fields are populated. |
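For concreteness, a minimal report-meta.json satisfying the required envelope fields could look like this (all values illustrative):

```json
{
  "schema_version": "1.0",
  "vendor": "peec",
  "project_id": "ws-1842",
  "brand": "ExampleCo",
  "competitors": ["Acme", "Globex"],
  "window_start": "2026-04-01T00:00:00Z",
  "window_end": "2026-05-01T00:00:00Z",
  "generated_at": "2026-05-01T06:15:00Z"
}
```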
VisibilityObservation (required fields)
| Field | Type | Description |
|---|---|---|
| observation_id | string (UUID v4) | Globally unique row identifier. |
| run_id | string | Stable identifier for the batch run that produced this observation. |
| run_at | ISO 8601 datetime | When the prompt was executed against the engine, in UTC. |
| engine | enum | One of chatgpt, perplexity, gemini, google-ai-overviews, google-ai-mode, copilot, claude, meta-ai, grok, deepseek. |
| engine_model | string | Model variant (e.g. gpt-5, sonar-large, gemini-2.5-pro); set to "unknown" when the vendor does not expose it. |
| prompt_id | string | Stable identifier for the monitored prompt. |
| prompt_text | string | Verbatim prompt text. |
| prompt_locale | BCP 47 tag | Locale used when issuing the prompt (e.g. en-US, de-DE). |
| prompt_country | ISO 3166-1 alpha-2 | Country context for the run. |
| brand_mentioned | boolean | True if the tracked brand appears anywhere in the answer. |
| brand_cited | boolean | True if the brand's domain or owned URL appears in citations. |
| brand_position | integer or null | Rank order of the brand's first mention or citation; null if absent. |
| share_of_voice | float | The tracked brand's mentions divided by all brand mentions (brand plus tracked competitors) in this answer (0.0 to 1.0). |
| sentiment | enum | positive, neutral, negative, or unknown. |
| citations | array of Citation | Ordered list of citations rendered with the answer. |
| competitor_mentions | array of CompetitorMention | One entry per tracked competitor that appears. |
| answer_word_count | integer | Total words in the rendered answer. |
| prompts_in_run | integer | Count of prompts in the parent run; used as denominator for project-level share of voice. |
Citation (nested)
| Field | Type | Description |
|---|---|---|
| position | integer | 1-indexed citation order in the answer. |
| url | string (absolute URL) | Cited URL. |
| domain | string | Registrable domain (eTLD+1). |
| is_brand_owned | boolean | True if the domain matches a configured owned domain. |
| is_competitor_owned | boolean | True if the domain matches a tracked competitor. |
| source_authority | enum | high, medium, low, or unknown. |
| snippet | string | Quoted text the engine attributed to the citation, when available. |
CompetitorMention (nested)
| Field | Type | Description |
|---|---|---|
| competitor | string | Competitor brand name. |
| mentioned | boolean | True if mentioned in the answer text. |
| cited | boolean | True if cited via a competitor-owned URL. |
| position | integer or null | Rank order of first appearance. |
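For implementers who want a typed binding, the tables above translate directly into type definitions. The following Python sketch is illustrative rather than normative; the JSON Schema artifact referenced in the checklist below remains the contract:

```python
from typing import List, Literal, Optional, TypedDict

class Citation(TypedDict):
    position: int
    url: str
    domain: str
    is_brand_owned: bool
    is_competitor_owned: bool
    source_authority: Literal["high", "medium", "low", "unknown"]
    snippet: str

class CompetitorMention(TypedDict):
    competitor: str
    mentioned: bool
    cited: bool
    position: Optional[int]

class VisibilityObservation(TypedDict):
    observation_id: str
    run_id: str
    run_at: str
    engine: str
    engine_model: str
    prompt_id: str
    prompt_text: str
    prompt_locale: str
    prompt_country: str
    brand_mentioned: bool
    brand_cited: bool
    brand_position: Optional[int]
    share_of_voice: float
    sentiment: Literal["positive", "neutral", "negative", "unknown"]
    citations: List[Citation]
    competitor_mentions: List[CompetitorMention]
    answer_word_count: int
    prompts_in_run: int
```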
Optional fields
These appear in some vendor exports and are recommended when present: evidence_url (link to a stored screenshot or HTML capture), answer_html, answer_markdown, cost_usd (per-run API cost), latency_ms, tags, intent (commercial, informational, transactional), prompt_cluster_id, region, device, is_followup, parent_observation_id, notes, evidence_hash.
Serialization formats
JSON Lines (preferred)
The canonical format is JSON Lines (.jsonl): one VisibilityObservation per line, with a separate report-meta.json file in the same archive. JSON Lines streams cleanly into BigQuery, Snowflake, and DuckDB without parsing the whole file.
```jsonl
{"observation_id":"7c1c...","run_id":"r-2026-04-29-001","run_at":"2026-04-29T03:00:00Z","engine":"perplexity","engine_model":"sonar-large","prompt_id":"p-042","prompt_text":"Best AI visibility tools for agencies","prompt_locale":"en-US","prompt_country":"US","brand_mentioned":true,"brand_cited":true,"brand_position":2,"share_of_voice":0.34,"sentiment":"positive","citations":[{"position":1,"url":"https://example.com/post","domain":"example.com","is_brand_owned":true,"is_competitor_owned":false,"source_authority":"medium","snippet":"..."}],"competitor_mentions":[{"competitor":"Acme","mentioned":true,"cited":false,"position":3}],"answer_word_count":312,"prompts_in_run":50}
```
CSV (compatibility)
For tools that cannot emit JSON Lines, a flattened CSV is allowed. Nested arrays are encoded as JSON strings inside a single column. UTF-8 encoding, RFC 4180 quoting, and a single header row are required. Date columns must be ISO 8601 in UTC with an explicit Z suffix.
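A sketch of the flattening step, using Python's standard csv module (whose default quoting is RFC 4180-compatible); the column order and file handling are illustrative:

```python
import csv
import json

# The 18 required fields, with nested arrays serialized into single columns.
FLAT_COLUMNS = [
    "observation_id", "run_id", "run_at", "engine", "engine_model",
    "prompt_id", "prompt_text", "prompt_locale", "prompt_country",
    "brand_mentioned", "brand_cited", "brand_position", "share_of_voice",
    "sentiment", "citations", "competitor_mentions", "answer_word_count",
    "prompts_in_run",
]

def write_csv(observations: list[dict], path: str) -> None:
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FLAT_COLUMNS, extrasaction="ignore")
        writer.writeheader()
        for obs in observations:
            row = dict(obs)
            # Nested arrays become JSON strings inside one CSV cell each.
            row["citations"] = json.dumps(obs["citations"], ensure_ascii=False)
            row["competitor_mentions"] = json.dumps(
                obs["competitor_mentions"], ensure_ascii=False
            )
            writer.writerow(row)
```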
Naming convention
Archive files use the aivr_ prefix; each archive bundles the observations file (observations.jsonl or observations.csv) with its report-meta.json sidecar.
How to apply
Implementer checklist
- Map each native column to the canonical field name; document drops and additions.
- Cast all timestamps to UTC ISO 8601 before write.
- Resolve domains to eTLD+1 using the Public Suffix List.
- Generate observation_id as a UUID v4 once per observation and persist it, so re-exports emit identical rows; minting a fresh UUID on every export breaks idempotency.
- Validate the output against the JSON Schema artifact before publishing; a validation sketch follows this checklist.
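A pre-publish validation sketch, assuming the third-party jsonschema and tldextract packages (the latter resolves registrable domains via the Public Suffix List); the schema filename is a placeholder:

```python
import json

import jsonschema
import tldextract

# Load the JSON Schema artifact (placeholder filename).
with open("aivr-1.0.schema.json", encoding="utf-8") as fh:
    schema = json.load(fh)

with open("observations.jsonl", encoding="utf-8") as fh:
    for line_no, line in enumerate(fh, start=1):
        obs = json.loads(line)
        # Raises jsonschema.ValidationError on any non-conformant record.
        jsonschema.validate(instance=obs, schema=schema)
        for citation in obs["citations"]:
            # Confirm the domain column really is the eTLD+1 of the URL.
            expected = tldextract.extract(citation["url"]).registered_domain
            if citation["domain"] != expected:
                raise ValueError(
                    f"line {line_no}: domain {citation['domain']!r} "
                    f"is not the eTLD+1 of {citation['url']}"
                )
```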
Consumer recipes
- Looker Studio: load JSON Lines into BigQuery, then point a community connector at the table; the Peec AI connector already follows similar field names and only needs a thin view.
- Tableau / Power BI: ingest the CSV variant; the share-of-voice and brand-cited fields are already pre-aggregated per observation.
- Notion / Airtable: import observations.csv and build rollups by engine and prompt_id. Use share_of_voice averaged over the window for executive dashboards.
- Slack alerts: trigger when brand_cited flips from true to false on prompts where it was true the prior run; see the sketch after this list.
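A sketch of that citation-loss check over two JSONL run files; the file names and the Slack webhook URL are placeholders:

```python
import json
import urllib.request

def load_run(path: str) -> dict:
    """Index one run's observations by (prompt_id, engine)."""
    index = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            obs = json.loads(line)
            index[(obs["prompt_id"], obs["engine"])] = obs
    return index

previous = load_run("run_prev.jsonl")  # placeholder file names
current = load_run("run_curr.jsonl")
lost = [
    key for key, obs in previous.items()
    if obs["brand_cited"] and key in current and not current[key]["brand_cited"]
]
if lost:
    payload = {"text": f"Brand citation lost on {len(lost)} prompt/engine pairs: {lost}"}
    req = urllib.request.Request(
        "https://hooks.slack.com/services/XXX",  # placeholder webhook URL
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
```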
vs related interchange formats
- schema.org Dataset describes a dataset at the catalog level; this spec describes the record shape inside that dataset.
- DataCite Metadata Schema standardizes citation of research datasets, not row-level AI answer observations.
- Vendor-native CSVs (Profound, Peec, Otterly) overlap heavily with the required fields but rename columns; this spec freezes the names so consumers do not have to.
Misconceptions
- "My BI tool can just auto-detect columns." It can detect types, but it cannot reconcile semantic differences (mention vs citation, project share of voice vs answer share of voice) without a contract.
- "Sentiment is a simple positive or negative." Vendors disagree on neutral versus unknown; this spec separates them so missing data does not bias dashboards.
- "Share of voice is comparable across vendors out of the box." Only when the denominator is fixed. The prompts_in_run field exists to make that denominator explicit.
Versioning and conformance
This specification follows semantic versioning. Minor versions add optional fields; major versions change field semantics. A producer is strict-conformant if it populates every required field for every observation and also carries the recommended optional fields its source data supports; it is loose-conformant if every required field is present but optional fields are omitted.
FAQ
Q: What is the AI Visibility Report Schema?
It is a vendor-neutral specification for AI visibility data exports. It defines a ReportMeta envelope and a VisibilityObservation record with 18 required fields covering prompt, engine, citation, position, sentiment, and share of voice so multiple vendor exports can be unioned into one dataset.
Q: Which vendors does it support?
Any vendor whose export contains the required fields. Profound, Peec AI, Otterly, HubSpot AEO, Brandlight, and Rankability already export every required field under different names; the spec normalizes the names rather than the underlying data.
Q: Should I use JSON Lines or CSV?
Use JSON Lines when your warehouse supports it (BigQuery, Snowflake, DuckDB, Databricks). It preserves nested citations without ambiguity. Use CSV only for low-code destinations like Notion, Airtable, or Looker Studio data sources that cannot parse nested JSON.
Q: How is share of voice computed?
share_of_voice on a single observation is the tracked brand's mentions divided by all brand mentions (brand plus competitors) inside that one answer. Project-level share of voice is computed by the consumer as count(brand_cited == true) / prompts_in_run over the window, which is why prompts_in_run is required.
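A sketch of that project-level computation over a JSON Lines export, scoped to one engine and one run:

```python
import json

def project_share_of_voice(path: str, engine: str, run_id: str) -> float:
    """count(brand_cited == true) / prompts_in_run for one engine and run."""
    cited = 0
    prompts_in_run = None
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            obs = json.loads(line)
            if obs["engine"] != engine or obs["run_id"] != run_id:
                continue
            prompts_in_run = obs["prompts_in_run"]  # constant within a run
            cited += obs["brand_cited"]
    if not prompts_in_run:
        raise ValueError("no matching observations found")
    return cited / prompts_in_run
```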
Q: Is this an official standard?
No. It is a community specification published by Geodocs to fill the gap left by vendors and avoid bespoke ETL for every tool change. It is versioned, reviewed every 90 days, and intended to evolve with vendor exports rather than ahead of them.