AI Citation Risk Register Framework
An AI citation risk register applies ISO 31000 risk management to generative engine optimization. It catalogs eight named failure modes — decay, hallucination, misattribution, refusal, drift, dilution, source substitution, opt-out leak — scores each on likelihood and impact, and assigns mitigation owners via a RACI matrix.
TL;DR
Most GEO programs treat AI citations as upside only. A Columbia Journalism Review study found generative search tools misattributed sources in a majority of tested cases, and Nature has documented tens of thousands of hallucinated references in scientific literature. Brands need risk-side governance, not just optimization. This framework adapts ISO 31000 to AI citations: enumerate failure modes, score likelihood × impact, document mitigations, and assign owners. Output: a living register reviewed every 90 days.
Why a risk register, not a checklist
Checklists assume one-off effort; risk registers assume drift. AI surfaces change ranking and citation behavior monthly: a model rev silently re-weights sources; a publisher rolls out an opt-out signal; a competitor seeds a Wikipedia entity; a benchmark refresh re-orders share of voice. A checklist passed last quarter is not evidence the program is safe today.
The risk register pattern, formalized in ISO 31000, gives GEO programs a governance instrument. Risk identification, analysis, evaluation, treatment, monitoring, and communication run as a continuous loop rather than a one-time audit. This is the same structure used in cybersecurity, financial compliance, and clinical risk programs — well-tested, well-tooled, and easy to staff.
The eight named failure modes
A usable register starts with a fixed taxonomy. The following eight modes cover the failure cases observed across AI search studies in 2024-2026.
1. Citation decay
A page that previously won citations stops being cited as the model is re-trained, the index refreshes, or competitors publish fresher content. Often invisible until a quarterly review. Anchor concept: LLM citation decay.
2. Hallucinated citation
The AI surface invents a URL, title, or author that does not exist. Documented at scale in Nature (Naddaf & Quill, 2026) and in legal-domain analyses by Stanford HAI. The brand is not the source; a fabricated source is.
3. Misattribution
The AI surface cites the wrong publisher for the brand's content. CJR's 2025 study found one tool misattributed sources in 115 out of 200 queries. The brand's content surfaces under a competitor's name.
4. Refusal
The AI surface refuses to answer in a topic where the brand publishes, often because the model treats the topic as policy-sensitive. The brand earns zero citations regardless of content quality.
5. Concept drift
The entity the AI surface associates with the brand drifts away from intended positioning. A B2B platform becomes "a marketing tool" or vice versa. Driven by aliases, low-quality third-party citations, or competitor entity seeding.
6. Citation dilution
The brand is cited, but in a list of 8-12 sources where attention is spread thin. Click-through and share-of-voice impact approach zero. Common when AI surfaces favor breadth.
7. Source substitution
The AI surface paraphrases the brand's claim but cites an aggregator, Wikipedia, or a syndicating publisher instead. Authorship is lost. Often caused by missing canonical metadata or weak entity disambiguation.
8. Opt-out leak
A team member updates robots.txt or the user-agent allowlist incorrectly, blocking AI crawlers and silently erasing citations. Conversely, an opt-out request from a partner may be missed. Both are operational risks, not content risks.
Scoring: likelihood × impact
Following ISO 31000, each risk is scored on a 5×5 grid:
- Likelihood (1-5): rare, unlikely, possible, likely, almost certain.
- Impact (1-5): negligible, minor, moderate, major, severe — expressed in citation-share points lost or revenue exposure.
The product (1-25) is the inherent risk score. Apply mitigations to derive the residual risk score. Any risk whose residual score is ≥ 12 escalates to executive review.
Use the same scale across the register. Inconsistent scales destroy comparability across content lines.
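The scoring model above is simple enough to encode directly, which keeps scales consistent across teams. A minimal sketch in Python; the function names and the example values are illustrative, not part of the framework:

```python
ESCALATION_THRESHOLD = 12  # residual scores at or above this go to executive review

def risk_score(likelihood: int, impact: int) -> int:
    """Likelihood x impact, each on a 1-5 scale, yielding a 1-25 score."""
    if not (1 <= likelihood <= 5 and 1 <= impact <= 5):
        raise ValueError("likelihood and impact must be integers from 1 to 5")
    return likelihood * impact

def needs_escalation(residual: int) -> bool:
    return residual >= ESCALATION_THRESHOLD

# Example: a 'likely' (4) misattribution risk with 'major' (4) impact.
inherent = risk_score(4, 4)   # 16 inherent
residual = risk_score(4, 2)   # mitigations reduce impact to 'minor' (2) -> 8
print(inherent, residual, needs_escalation(residual))  # 16 8 False
```

Computing scores in one shared function, rather than in per-team spreadsheets, is one way to enforce the single-scale rule below.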
Register column template
A minimal register row contains ten columns:
- Risk ID — stable identifier (e.g., RISK-CITE-003).
- Failure mode — one of the eight names above.
- Description — one sentence, specific to this asset or topic.
- Likelihood (1-5).
- Impact (1-5).
- Inherent score — likelihood × impact.
- Existing mitigations — controls already in place.
- Residual score — after existing mitigations.
- Owner — the accountable person from the RACI matrix.
- Review date — next scheduled re-evaluation (≤ 90 days).
Keep the register in a queryable tool (Notion, Linear, Jira, Smartsheet) so review cadence is visible.
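If the register lives in a queryable tool, it helps to pin the row schema down in code so imports and exports stay consistent. A sketch of the ten-column template as a Python dataclass; field names and the mode spellings are assumptions chosen for this example:

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

# The eight named failure modes; spellings here are illustrative identifiers.
FAILURE_MODES = {
    "decay", "hallucination", "misattribution", "refusal",
    "drift", "dilution", "source-substitution", "opt-out-leak",
}

@dataclass
class RegisterRow:
    risk_id: str                 # stable identifier, e.g. "RISK-CITE-003"
    failure_mode: str            # one of the eight modes
    description: str             # one sentence, asset-specific
    likelihood: int              # 1-5
    impact: int                  # 1-5
    existing_mitigations: list[str] = field(default_factory=list)
    residual_score: int = 0      # scored after mitigations
    owner: str = ""              # accountable person from the RACI matrix
    review_date: date = field(   # next re-evaluation, <= 90 days out
        default_factory=lambda: date.today() + timedelta(days=90))

    def __post_init__(self):
        if self.failure_mode not in FAILURE_MODES:
            raise ValueError(f"unknown failure mode: {self.failure_mode}")

    @property
    def inherent_score(self) -> int:
        return self.likelihood * self.impact

    @property
    def overdue(self) -> bool:
        return date.today() > self.review_date
```

Validating the failure mode at construction time keeps the fixed taxonomy fixed: free-text mode names are how registers quietly fragment.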
Mitigation playbooks
Each failure mode has a default mitigation pattern. Programs adapt them to context.
- Decay: schedule re-validation of canonical pages every 60-90 days; refresh last_reviewed_at; reissue primary citations.
- Hallucination: publish authoritative source pages with stable URLs and consistent author entity; submit corrections via publisher feedback channels where available.
- Misattribution: add explicit author, Organization, and sameAs schema; ensure the canonical URL is the brand's own; avoid syndication that strips attribution.
- Refusal: rewrite content to be neutral and easily citable; avoid policy-sensitive framing unless the brand has clear authority.
- Drift: maintain a canonical concept ID per page; monitor third-party citations; correct misleading aliases.
- Dilution: invest in primary research; produce citation-bait artefacts other sources must reference.
- Source substitution: reduce syndication; ensure original publication is canonical; add llms.txt entries.
- Opt-out leak: version-control robots.txt and llms.txt; review on every infrastructure change; audit user-agent allowlists quarterly.
Link each row of the register to the playbook it depends on so the response is reproducible.
RACI matrix
Clear ownership prevents register decay. A baseline RACI for AI citation risk:
| Activity | Content lead | SEO/GEO lead | Engineering | Legal/Brand |
|---|---|---|---|---|
| Identify and log new risks | R | A | C | I |
| Score likelihood and impact | C | R/A | I | C |
| Implement content mitigation | R/A | C | I | I |
| Implement technical mitigation | I | C | R/A | I |
| Approve opt-in / opt-out changes | C | C | R | A |
| Quarterly review | C | A | C | I |
| Executive escalation (residual ≥ 12) | C | R | C | A |
R = Responsible, A = Accountable, C = Consulted, I = Informed. Customize per org structure but resist diluting accountability across multiple owners.
Review cadence and escalation
- Weekly: scan citation tracker for regressions; add new risks if observed.
- Monthly: review residual score ≥ 8 risks; verify mitigations are operating.
- Quarterly: full register review; refresh likelihood and impact scores; retire closed risks.
- On change: re-score affected rows when an AI surface ships a major update or the brand publishes a new tier-1 page.
Residual score ≥ 12 must be escalated to the GEO program owner and (where revenue impact exists) to executive leadership. Document escalation paths in the framework so they cannot be skipped under time pressure.
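The monthly and escalation thresholds above can be swept mechanically from register exports. A sketch, assuming rows export as dicts with the column names from the template (both names and thresholds mirror this document, but the function itself is illustrative):

```python
def triage(rows: list[dict]) -> dict[str, list[str]]:
    """Bucket register rows: residual >= 12 escalates, residual >= 8 gets monthly review."""
    out = {"monthly_review": [], "executive_escalation": []}
    for row in rows:
        if row["residual_score"] >= 12:
            out["executive_escalation"].append(row["risk_id"])
        elif row["residual_score"] >= 8:
            out["monthly_review"].append(row["risk_id"])
    return out

register = [
    {"risk_id": "RISK-CITE-001", "residual_score": 6},
    {"risk_id": "RISK-CITE-002", "residual_score": 9},
    {"risk_id": "RISK-CITE-003", "residual_score": 15},
]
print(triage(register))
# {'monthly_review': ['RISK-CITE-002'], 'executive_escalation': ['RISK-CITE-003']}
```

Wiring this to the register tool's API and posting the output to the GEO owner's channel makes the escalation path hard to skip under time pressure.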
Common mistakes
- Treating AI citations as upside-only. Without risk-side instrumentation, regressions are invisible.
- One-time audit instead of register. Drift accumulates; a static checklist will not catch it.
- No owner per risk. A risk owned by everyone is owned by nobody.
- Scoring inconsistently. Different teams using different scales destroy register utility.
- Skipping opt-out leak. Operational misconfigurations are the easiest preventable cause of citation collapse.
FAQ
Q: How is this different from a typical content audit?
Audits are point-in-time and tactical. A risk register is continuous and strategic. The audit answers "is this content correct now?"; the register answers "how likely is each failure mode to harm us in the next 90 days, and who is responsible?"
Q: Do small teams need this?
Yes — but in lighter form. A 10-row spreadsheet covering the eight failure modes plus one or two brand-specific risks is enough for most early-stage teams. The point is the cadence, not the tooling.
Q: Can the register be automated?
Likelihood and impact still require human judgment, but data feeds (citation tracker output, llms.txt drift detector, schema CI failures) can pre-populate rows. Treat automation as decision support, not replacement.
Q: How does this interact with the GEO maturity model?
A risk register is a Stage 3+ artifact in the GEO maturity model. Programs at Stage 1-2 should focus on baseline content and measurement before adopting the full register; an informal failure-mode checklist is enough until then.