AI Citation Pre-Publish QA Checklist: 30 Verification Steps Before Releasing GEO Content

Use this 30-point checklist before publishing any GEO article to make sure structure, claims, schema, and freshness signals are in place. The list maps known AI citation failure modes — hallucinated quotes, ungrounded facts, broken schema, stale review dates — to concrete editor actions you can complete in under 30 minutes per page.

TL;DR. AI search engines misattribute or fabricate citations in roughly 60% of news queries tested by the Tow Center for Digital Journalism, so a pre-publish QA gate is now a load-bearing part of GEO operations. This checklist organizes 30 verifications into four blocks — Structure and extractability, Claim grounding, Schema and metadata, Freshness and lifecycle — and pairs each item with a single yes/no editor action.

How to use this checklist

Run all 30 items before flipping a page from Ready for Review to Approved. Treat any unchecked High item as a blocker; Medium items can be filed as follow-ups if release is time-sensitive. Capture evidence (URLs, validator screenshots, decisions) in the article's Research Notes and Agent Notes so the audit trail stays attached to the canonical concept.

Block A — Structure and extractability (1-8)

  • [ ] 1. Title length 50-70 characters and contains the focus keyword once.
  • [ ] 2. Meta description 120-160 characters, answer-first, includes the focus keyword.
  • [ ] 3. H1 matches frontmatter title exactly; no duplicate H1 elsewhere on the page.
  • [ ] 4. AI summary blockquote sits directly under H1 and answers the canonical question in ≤2 factual sentences.
  • [ ] 5. TL;DR paragraph of 2-3 snippet-ready sentences appears in the first viewport.
  • [ ] 6. Heading hierarchy is clean — H2 for sections, H3 for sub-claims, no skipped levels.
  • [ ] 7. Lists, tables, and definition pairs are used wherever the underlying content is enumerable; generative engines extract these patterns most reliably.
  • [ ] 8. FAQ block of 3-5 ### Q: questions appears at the bottom, each with a 2-4 sentence answer.
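
Most of Block A is mechanically checkable before the human pass. Below is a minimal Python sketch that spot-checks items 1-3; it assumes markdown source with simple key: value frontmatter, and the field names title and description are placeholders rather than the Geodocs schema itself.

```python
# Block A spot-checks: title/meta length and H1 consistency.
# A minimal sketch; the frontmatter layout and field names
# (title, description) are assumptions, not the Geodocs schema.
import re
import sys

def parse_frontmatter(text: str) -> dict:
    """Naive frontmatter parser: 'key: value' lines between --- fences."""
    match = re.match(r"^---\n(.*?)\n---\n", text, re.S)
    if not match:
        return {}
    fields = {}
    for line in match.group(1).splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip().strip('"')
    return fields

def check_block_a(text: str) -> list[str]:
    fm = parse_frontmatter(text)
    problems = []
    title = fm.get("title", "")
    desc = fm.get("description", "")
    if not 50 <= len(title) <= 70:
        problems.append(f"item 1: title is {len(title)} chars, want 50-70")
    if not 120 <= len(desc) <= 160:
        problems.append(f"item 2: meta description is {len(desc)} chars, want 120-160")
    h1s = re.findall(r"^# (.+)$", text, re.M)
    if len(h1s) != 1:
        problems.append(f"item 3: found {len(h1s)} H1 headings, want exactly 1")
    elif h1s[0].strip() != title:
        problems.append("item 3: H1 does not match frontmatter title")
    return problems

if __name__ == "__main__":
    issues = check_block_a(open(sys.argv[1], encoding="utf-8").read())
    print("\n".join(issues) or "Block A spot-checks passed")
```

Items 4-8 still need human judgment (whether the summary actually answers the canonical question, whether lists fit the content), so automation here narrows the editor's job rather than replacing it.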

Block B — Claim grounding and source quality (9-16)

  • [ ] 9. Every strong claim has a source — statistics, comparisons, and "X is the leading Y" assertions.
  • [ ] 10. Sources are primary or top-cited. Prefer official docs (Google, OpenAI, Anthropic, schema.org), peer-reviewed papers, and industry leaders like Search Engine Land, Ahrefs, and Moz.
  • [ ] 11. Each source is dated and verified within the last 12 months (or older if it is a stable specification such as a W3C or schema.org doc).
  • [ ] 12. No hallucinated citations. Resolve every outbound URL; fabricated or broken references are the most common AI citation failure mode flagged in published research.
  • [ ] 13. Quotes are reproduced verbatim and attributed to the correct author and outlet.
  • [ ] 14. Numbers, dates, and version strings match the source exactly.
  • [ ] 15. No unsupported superlatives ("best", "most", "always") unless backed by cited data.
  • [ ] 16. Research Notes column lists every source as URL — verified YYYY-MM-DD — supports claim X.
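
Item 12 is the step most worth automating. A small stdlib-only sketch that resolves each outbound URL and flags anything outside 2xx/3xx; note that some servers reject HEAD requests, so a production checker would add a GET fallback, retries, and per-host rate limits.

```python
# Item 12 spot-check: resolve every outbound URL before release.
# A minimal sketch using only the standard library.
import urllib.error
import urllib.request

def resolve(url: str, timeout: float = 10.0) -> tuple[str, str]:
    """Return (url, status) where status is an HTTP code or an error string."""
    req = urllib.request.Request(url, method="HEAD",
                                 headers={"User-Agent": "geodocs-qa/1.0"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return url, str(resp.status)
    except urllib.error.HTTPError as err:
        return url, str(err.code)
    except (urllib.error.URLError, TimeoutError) as err:
        return url, f"error: {err}"

if __name__ == "__main__":
    sources = [
        "https://schema.org/Article",
        "https://validator.schema.org/",
    ]
    for url, status in map(resolve, sources):
        flag = "OK" if status.startswith(("2", "3")) else "BLOCKER"
        print(f"{flag:8} {status:12} {url}")
```

A passing link check does not verify that the page still supports the claim it is cited for; item 16's verified YYYY-MM-DD note is what records that human verification.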

Block C — Schema, metadata, and structured data (17-24)

  • [ ] 17. Frontmatter contains all ~30 fields required by the Geodocs taxonomy (IDENTITY, CANONICAL KNOWLEDGE, TAXONOMY, SEO, AI READABILITY, LIFECYCLE, RELATIONS, I18N, AUTHORSHIP).
  • [ ] 18. canonical_concept_id is unique and not shared by any other Approved row.
  • [ ] 19. canonical_url is exact: https://geodocs.dev/{section}/{slug} with no trailing slash.
  • [ ] 20. content_type, difficulty, and section match the database row.
  • [ ] 21. entities[], aliases[], and related_concepts[] are populated; these power the canonical knowledge layer and dedupe.
  • [ ] 22. citation_readiness: reviewed is set, and llm_summary is ≤2 factual sentences.
  • [ ] 23. JSON-LD Article schema (or HowTo / FAQPage where applicable) is rendered server-side and validates in the schema.org validator.
  • [ ] 24. Author and reviewer fields map to real people or the team handle (Geodocs Research Team); no anonymous bylines.
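
For item 23, run a structural pre-check against the rendered HTML before handing the page to the schema.org validator. A minimal sketch, where the required-field lists are illustrative rather than a reproduction of Geodocs' own rules:

```python
# Item 23 spot-check: confirm server-rendered JSON-LD parses and
# carries the fields the article type needs. A minimal sketch.
import json
import re

# Illustrative required-field lists; actual rules may differ.
REQUIRED = {
    "Article": {"headline", "datePublished", "author"},
    "FAQPage": {"mainEntity"},
    "HowTo": {"name", "step"},
}

def check_jsonld(html: str) -> list[str]:
    pattern = r'<script[^>]*type="application/ld\+json"[^>]*>(.*?)</script>'
    problems, found = [], False
    for raw in re.findall(pattern, html, re.S | re.I):
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            problems.append("item 23: JSON-LD block is not valid JSON")
            continue
        for node in data if isinstance(data, list) else [data]:
            kind = node.get("@type", "") if isinstance(node, dict) else ""
            if kind in REQUIRED:
                found = True
                missing = REQUIRED[kind] - node.keys()
                if missing:
                    problems.append(f"item 23: {kind} missing {sorted(missing)}")
    if not found and not problems:
        problems.append("item 23: no Article/FAQPage/HowTo JSON-LD found")
    return problems
```

Run it against the rendered page, not the markdown source, since item 23 requires the schema server-side; the schema.org validator remains the authoritative check.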

Block D — Freshness, internal linking, and lifecycle (25-30)

  • [ ] 25. published_at, updated_at, last_reviewed_at are all set to today for new articles, and updated_at is bumped on any substantive edit.
  • [ ] 26. review_cycle_days is set (default 90) so the page re-enters the audit queue automatically.
  • [ ] 27. ≥3 internal links to sibling articles, plus one link to the section hub.
  • [ ] 28. related_articles[] lists up to five valid sibling slugs that resolve to Approved rows.
  • [ ] 29. Outbound links are healthy — no 404s, no redirected vendor URLs, no archived versions where the live source exists.
  • [ ] 30. Word count is inside the content-type range (definition 600-1400, guide 1200-3500, tutorial 1500-4000, comparison 800-2000, framework 1000-2500, checklist 500-1500).
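
Items 26 and 30 reduce to two small functions. A sketch whose names mirror the checklist fields, though how those fields are parsed out of frontmatter is an assumption about the source format:

```python
# Block D spot-checks: review-cycle due date and word-count range.
from datetime import date, timedelta

WORD_RANGES = {
    "definition": (600, 1400),
    "guide": (1200, 3500),
    "tutorial": (1500, 4000),
    "comparison": (800, 2000),
    "framework": (1000, 2500),
    "checklist": (500, 1500),
}

def review_due(last_reviewed_at: date, review_cycle_days: int = 90) -> date:
    """Item 26: the date the page re-enters the audit queue."""
    return last_reviewed_at + timedelta(days=review_cycle_days)

def check_word_count(content_type: str, body: str) -> str | None:
    """Item 30: word count must sit inside the content-type range."""
    lo, hi = WORD_RANGES[content_type]
    count = len(body.split())
    if not lo <= count <= hi:
        return f"item 30: {count} words, want {lo}-{hi} for {content_type}"
    return None

if __name__ == "__main__":
    print("next review:", review_due(date.today()))      # today + 90 days
    print(check_word_count("checklist", "word " * 300))  # below 500 -> flagged
```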

Severity mapping

Treat items 1, 4, 9, 10, 12, 17, 18, 22, 23, 25, 30 as High — failing any of them is a release blocker. The remaining items are Medium: log them as follow-up tasks, but do not gate release on them when timing is tight.
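
The mapping is simple enough to encode directly in the publishing pipeline. A minimal sketch, where results is filled in by an editor or by scripts like the ones above:

```python
# Release gate over the severity mapping above: results maps
# item number -> bool (passed); only High failures block release.
HIGH = {1, 4, 9, 10, 12, 17, 18, 22, 23, 25, 30}

def gate(results: dict[int, bool]) -> tuple[bool, list[int], list[int]]:
    """Return (release_ok, high_failures, medium_followups)."""
    failed = [item for item, passed in sorted(results.items()) if not passed]
    high = [i for i in failed if i in HIGH]
    medium = [i for i in failed if i not in HIGH]
    return (not high, high, medium)

if __name__ == "__main__":
    results = {i: True for i in range(1, 31)}
    results[12] = False  # broken outbound link: High, blocks release
    results[27] = False  # too few internal links: Medium, follow-up
    ok, high, medium = gate(results)
    print("release:", "approved" if ok else f"blocked by {high}")
    print("follow-ups:", medium)
```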

Why this gate matters

The Tow Center for Digital Journalism's 2025 study of eight AI search engines — ChatGPT Search, Perplexity, Perplexity Pro, Gemini, DeepSeek Search, Grok-2 Search, Grok-3 Search, and Copilot — found citation errors in over 60% of 1,600 test queries, with chatbots fabricating links and misattributing quotes even when the source article was crawlable. Peer-reviewed work in 2025 and 2026 has documented hallucinated citations slipping into NeurIPS- and ICLR-accepted papers, showing the same failure mode bleeding from chatbots into formal publishing.

A documented pre-publish QA gate is the cheapest counter-measure available: it forces every claim to be sourced and every schema field to be populated before content reaches the index, which is roughly an order of magnitude cheaper than recovering lost citations after the fact.

FAQ

Q: How long does this checklist take per article?

For a 1,500-word page authored against the Geodocs taxonomy, an experienced editor completes all 30 items in 20-30 minutes. The slowest steps are usually claim grounding (Block B) and JSON-LD schema validation (item 23).

Q: Which items are non-negotiable for AI citation?

Items 4 (AI summary block), 9-12 (claim grounding and no hallucinated citations), 17 and 22 (full frontmatter with citation_readiness: reviewed), and 23 (valid JSON-LD) carry the most weight. Generative engines rely on extractable answers, grounded facts, and machine-readable metadata in that order.

Q: How does this differ from a post-incident citation recovery process?

The AI Citation Recovery Playbook handles articles that lost AI citations after publication. This checklist is preventive — it stops thin or ungrounded pages from shipping in the first place, which is roughly 10× cheaper than recovery.

Q: How often should the checklist itself be reviewed?

Re-evaluate it every 90 days alongside the rest of the canonical taxonomy. AI engines update their extraction heuristics and schema preferences frequently, so individual items occasionally need to be rewritten, retired, or added.

Q: Where should QA evidence live?

Capture verifications in the article's Research Notes and Agent Notes fields in the Geodocs Articles database, and link any external validator reports (schema.org validator, Lighthouse, link-checker output) in the row's body. This keeps the audit trail attached to the canonical concept.

Related Articles

  • Generative Engine Optimization Guide (2026): The Complete Implementation Playbook (guide). Complete 2026 guide to Generative Engine Optimization: audit, structure, technical signals (llms.txt, schema), authority, and measurement, with verified citation-rate benchmarks.
  • AI Citation Recovery Playbook: Diagnose and Reverse Sudden Citation Drops (framework). Diagnose sudden drops across ChatGPT, Perplexity, Gemini, and AI Overviews, then rebuild share with a structured remediation framework.
  • Structured Data for AI Search (guide). How to implement structured data (JSON-LD / Schema.org) to improve AI search visibility, covering TechArticle, FAQPage, HowTo, and entity definitions.
