Geodocs.dev

Article Schema Markup Checklist for AI Search Engines

Article schema (Article, NewsArticle, BlogPosting, TechArticle) gives AI crawlers a machine-readable description of who wrote a page, when, about what, and how it relates to other entities. This 30-field checklist covers the required, recommended, and AI-specific fields that ChatGPT, Perplexity, Google AI Overviews, and Copilot consume when deciding whether to cite a page, plus the validation steps to ship them safely.

TL;DR

  • Implement schema as JSON-LD in the page's <head>, not as Microdata or RDFa.
  • Always include the 8 required core fields: @context, @type, headline, datePublished, dateModified, author, publisher, image.
  • Add the 8 strongly recommended fields (mainEntityOfPage, description, inLanguage, articleSection, keywords, wordCount, about, mentions) for AI entity grounding, plus structural fields such as isPartOf.
  • Add the 6 AI-specific fields (citation, sameAs on author, identifier, creativeWorkStatus, accessibilityFeature, isAccessibleForFree) to maximize citation candidacy.
  • Always validate with Google's Rich Results Test and keep schema content aligned with visible HTML.
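
As a minimal sketch of the first point above, the following Python helper serializes a schema dictionary into a JSON-LD script tag that a template could place in the page's head. The function name render_jsonld_script is hypothetical, and the escaping step is a defensive assumption rather than something the checklist prescribes.

```python
import json


def render_jsonld_script(data: dict) -> str:
    """Serialize a schema.org dict into a JSON-LD <script> tag for the page <head>."""
    payload = json.dumps(data, ensure_ascii=False, indent=2)
    # Escape "<" so a string value such as "</script>" cannot close the tag early.
    # "\u003c" is still valid JSON and decodes back to "<".
    payload = payload.replace("<", "\\u003c")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Article Schema Markup Checklist for AI Search Engines",
}
print(render_jsonld_script(article))
```

Generating the tag from a single dictionary also makes it easier to keep the schema aligned with the values the template renders as visible HTML.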

Choose the right Article subtype

  • Article: the default. Use for evergreen guides, references, comparisons, frameworks.
  • NewsArticle: timely news where the publication date drives the value of the page. Eligible for Google News rich results.
  • BlogPosting: blog content. Treated similarly to Article by Google but signals editorial blog context.
  • TechArticle: technical documentation with a step-by-step or reference focus. Useful for tutorials and specs.

Google treats Article, NewsArticle, and BlogPosting similarly for rich-result eligibility, so pick the subtype that most accurately describes the content and stay consistent across the site.

Required core fields (8)

  • [ ] @context = "https://schema.org". Always present.
  • [ ] @type: one of Article, NewsArticle, BlogPosting, TechArticle.
  • [ ] headline: page title. ≤110 characters per Google guidance; longer titles may be ignored for rich results. Should match the visible H1.
  • [ ] datePublished: ISO-8601 datetime of original publication. Immutable after publication.
  • [ ] dateModified: ISO-8601 datetime of the last substantive edit. Update every time the body actually changes.
  • [ ] author: nested Person (preferred) or Organization. Required for E-E-A-T credit and for Perplexity's authorship signal.
  • [ ] publisher: Organization with nested name and logo (ImageObject). Required for Google rich results and for AI brand entity grounding.
  • [ ] image: array of one or more high-resolution image URLs (≥1200px wide). Required for Discover and AI Overviews (AIO) previews.
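
The core-field checks above can be approximated with a small pre-publish script. This is a sketch under stated assumptions: the function names are hypothetical, @type is assumed to be a single string, and the 110-character limit and field list are taken directly from the checklist rather than from any validator's API.

```python
from datetime import datetime

CORE_FIELDS = ["@context", "@type", "headline", "datePublished",
               "dateModified", "author", "publisher", "image"]
ARTICLE_TYPES = {"Article", "NewsArticle", "BlogPosting", "TechArticle"}


def parse_iso(value):
    """Return a datetime for an ISO-8601 string, or None if it does not parse."""
    try:
        # Older Python versions reject a trailing "Z", so normalize it first.
        return datetime.fromisoformat(value.replace("Z", "+00:00"))
    except (AttributeError, TypeError, ValueError):
        return None


def check_core_fields(doc: dict) -> list:
    """List problems with the eight core fields; an empty list means none found."""
    problems = [f"missing {name}" for name in CORE_FIELDS if name not in doc]
    doc_type = doc.get("@type")
    # Assumption: @type is a single string, not a list of types.
    if isinstance(doc_type, str) and doc_type not in ARTICLE_TYPES:
        problems.append(f"unexpected @type: {doc_type}")
    if len(doc.get("headline", "")) > 110:
        problems.append("headline longer than 110 characters")
    dates = {}
    for key in ("datePublished", "dateModified"):
        if key in doc:
            dates[key] = parse_iso(doc[key])
            if dates[key] is None:
                problems.append(f"{key} is not ISO-8601")
    published, modified = dates.get("datePublished"), dates.get("dateModified")
    # Only compare when both parsed and both agree on carrying a timezone.
    if published and modified and (published.tzinfo is None) == (modified.tzinfo is None):
        if modified < published:
            problems.append("dateModified is earlier than datePublished")
    return problems


sample = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Article Schema Markup Checklist for AI Search Engines",
    "datePublished": "2026-04-29T08:00:00Z",
    "dateModified": "2026-04-29T08:00:00Z",
    "author": {"@type": "Organization", "name": "Geodocs Research Team"},
    "publisher": {"@type": "Organization", "name": "Geodocs"},
    "image": ["https://geodocs.dev/og/schema-article-checklist.png"],
}
print(check_core_fields(sample))  # prints []
```

A check like this complements, and does not replace, the external validators listed later in the checklist.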

Strongly recommended fields (8)

These fields are technically optional but materially improve AI citation rates because they tie the page into the broader knowledge graph.

  • [ ] mainEntityOfPage: the canonical URL of this page. Helps engines disambiguate when the same content is syndicated.
  • [ ] description: 120–160 character summary of meta-description quality. Reused as a snippet candidate.
  • [ ] inLanguage: BCP-47 language code (e.g. "en", "en-US"). Critical for multilingual GEO.
  • [ ] articleSection: hub or topic section (e.g. "AI Search Optimization").
  • [ ] keywords: short array of focus and secondary keywords. Use sparingly; do not stuff.
  • [ ] wordCount: numeric word count of articleBody. Helps AI engines decide chunk granularity.
  • [ ] about: array of Thing references (or sameAs URLs) for the primary entities the article is about.
  • [ ] mentions: array of Thing references for secondary entities the article references but is not principally about.
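
One way to assemble these fields is sketched below. The helper names (count_words, thing, recommended_fields) are hypothetical, the word count is a rough whitespace-based approximation, and the JSON-LD entity used for mentions is an illustrative choice that does not come from the article.

```python
import re


def count_words(body_text: str) -> int:
    """Count whitespace-separated tokens; a rough stand-in for wordCount."""
    return len(re.findall(r"\S+", body_text))


def thing(name: str, same_as: str) -> dict:
    """Build a minimal Thing reference for about or mentions."""
    return {"@type": "Thing", "name": name, "sameAs": same_as}


def recommended_fields(canonical_url, description, language, section,
                       keywords, body_text, about, mentions):
    """Return (fields, warnings) for the recommended block of the checklist."""
    warnings = []
    if not 120 <= len(description) <= 160:
        warnings.append(
            f"description is {len(description)} characters, outside 120-160")
    fields = {
        "mainEntityOfPage": canonical_url,
        "description": description,
        "inLanguage": language,
        "articleSection": section,
        "keywords": list(keywords),
        "wordCount": count_words(body_text),
        "about": list(about),
        "mentions": list(mentions),
    }
    return fields, warnings


fields, warnings = recommended_fields(
    canonical_url="https://geodocs.dev/technical/schema-article-markup-checklist-ai-search",
    description=("Article schema markup checklist for AI search: 30 fields LLM "
                 "crawlers consume to surface citations on ChatGPT, Perplexity, "
                 "and AI Overviews."),
    language="en",
    section="Technical GEO",
    keywords=["Article schema", "JSON-LD", "AI search"],
    body_text="Visible article text would be passed in here.",
    about=[thing("Schema.org Article", "https://schema.org/Article")],
    mentions=[thing("JSON-LD", "https://json-ld.org/")],  # illustrative entity
)
print(fields["wordCount"], warnings)
```

Deriving wordCount from the rendered body text, rather than typing it by hand, keeps the value from drifting as the article is edited.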

AI-specific fields worth adding (6)

  • [ ] citation: array of CreativeWork (or URLs) the article cites. Demonstrates source-grounded research.
  • [ ] sameAs (on author): array of authoritative profile URLs (LinkedIn, GitHub, ORCID, organization staff page). Anchors the author entity in the knowledge graph.
  • [ ] identifier: stable internal ID (PropertyValue with propertyID: "canonical_concept_id"). Helps AI engines de-duplicate across syndication.
  • [ ] creativeWorkStatus: one of "Draft", "Published", "Updated", "Archived". Lets AI crawlers respect lifecycle.
  • [ ] accessibilityFeature: array describing accessibility (e.g. "alternativeText", "longDescription"). Reinforces the quality signal.
  • [ ] isAccessibleForFree: boolean. AI engines deprioritize gated content where possible.
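
The six fields above might be assembled as in the following sketch. The function name ai_fields, the author "Jane Example", the example.com URLs, and the concept ID are all placeholders invented for illustration; only the property names and the four status values come from the checklist.

```python
ALLOWED_STATUS = {"Draft", "Published", "Updated", "Archived"}


def ai_fields(author_name, author_profiles, concept_id, cited_urls,
              status="Published", accessibility=("alternativeText",), free=True):
    """Build the AI-specific block: author.sameAs, identifier, citation, and flags."""
    if status not in ALLOWED_STATUS:
        raise ValueError(f"creativeWorkStatus must be one of {sorted(ALLOWED_STATUS)}")
    return {
        "author": {
            "@type": "Person",
            "name": author_name,
            "sameAs": list(author_profiles),
        },
        "identifier": {
            "@type": "PropertyValue",
            "propertyID": "canonical_concept_id",
            "value": concept_id,
        },
        "citation": [{"@type": "CreativeWork", "url": url} for url in cited_urls],
        "creativeWorkStatus": status,
        "accessibilityFeature": list(accessibility),
        "isAccessibleForFree": free,
    }


block = ai_fields(
    author_name="Jane Example",                       # hypothetical author
    author_profiles=["https://example.com/staff/jane"],  # placeholder profile URL
    concept_id="article-schema-checklist",            # placeholder internal ID
    cited_urls=["https://schema.org/Article"],
)
print(block["identifier"])
```

The resulting dictionary can be merged into the main Article object with dict.update before serialization.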

Strongly encouraged structural fields (4)

  • [ ] isPartOf: reference the parent Blog, WebSite, or Series to give engines structural context.
  • [ ] articleBody: only when the page is partly behind script-rendered content; otherwise rely on the actual HTML.
  • [ ] speakable (NewsArticle only): SpeakableSpecification with the CSS selectors of the speakable summary.
  • [ ] backstory (NewsArticle only): short text describing methodology and sources for journalistic content.
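
A short sketch of the isPartOf and speakable items follows. The function name structural_fields and the ".article-summary" selector are hypothetical; cssSelector is the schema.org property on SpeakableSpecification that carries the selectors.

```python
def structural_fields(site_name, site_url, speakable_selectors=None):
    """Build isPartOf, plus speakable when CSS selectors are supplied."""
    fields = {
        "isPartOf": {"@type": "WebSite", "name": site_name, "url": site_url},
    }
    if speakable_selectors:
        # Per the checklist, only add this on NewsArticle pages.
        fields["speakable"] = {
            "@type": "SpeakableSpecification",
            "cssSelector": list(speakable_selectors),
        }
    return fields


print(structural_fields("Geodocs", "https://geodocs.dev"))
print(structural_fields("Geodocs", "https://geodocs.dev", [".article-summary"]))
```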

Common AI-specific mistakes to avoid (4)

  • [ ] Schema-content mismatch. AI crawlers (per a widely cited DUCKYEA test) often read JSON-LD as page text. If the schema and visible body disagree, the AI may pick the wrong fact, or stop trusting the page entirely.
  • [ ] Stale dateModified. Updating only the schema date without changing the body produces no citation lift and is detectable.
  • [ ] Generic author. A bare Organization author with no Person or sameAs URLs underperforms a named, verifiable Person author.
  • [ ] Multiple Article blocks. Use exactly one Article-typed JSON-LD per page. Use a sibling WebPage or Organization block if you need additional types.
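
The "exactly one Article block" rule above can be checked against rendered HTML with the standard library alone. This is a sketch, not a full JSON-LD processor: the names JsonLdExtractor and find_article_blocks are hypothetical, and it only looks at top-level objects, top-level arrays, and @graph members.

```python
import json
from html.parser import HTMLParser

ARTICLE_TYPES = {"Article", "NewsArticle", "BlogPosting", "TechArticle"}


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of every application/ld+json script tag."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []
        self.invalid = 0  # number of JSON-LD tags that failed to parse

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                self.invalid += 1


def find_article_blocks(html: str) -> list:
    """Return every Article-typed object found in the page's JSON-LD."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    found = []
    for block in parser.blocks:
        if isinstance(block, list):
            items = block
        elif isinstance(block, dict):
            items = block.get("@graph", [block])
        else:
            items = []
        for item in items:
            if not isinstance(item, dict):
                continue
            types = item.get("@type")
            types = types if isinstance(types, list) else [types]
            if any(t in ARTICLE_TYPES for t in types):
                found.append(item)
    return found


page = ('<script type="application/ld+json">{"@type": "Article"}</script>'
        '<script type="application/ld+json">{"@type": "Organization"}</script>')
print(len(find_article_blocks(page)))  # prints 1
```

A build step could fail the page whenever the returned list does not contain exactly one item.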

Validation checklist before shipping

  • [ ] Validate with Google's Rich Results Test.
  • [ ] Validate with the Schema Markup Validator.
  • [ ] Confirm headline matches the visible H1 character-for-character.
  • [ ] Confirm datePublished and dateModified match the page's other freshness signals (HTTP Last-Modified, visible on-page date).
  • [ ] Confirm author.sameAs resolves to authoritative profiles.
  • [ ] Confirm mainEntityOfPage matches the canonical URL declared in the page's <link rel="canonical">.
  • [ ] Re-test the page in Perplexity within 2–4 weeks to confirm citation lift.
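
Two of the checks above (headline against the visible H1, and mainEntityOfPage against the canonical link) lend themselves to automation. The sketch below uses only the standard library; the names PageFacts and check_alignment are hypothetical, and it assumes the page has a single H1 and that the schema dictionary has already been parsed from the JSON-LD.

```python
from html.parser import HTMLParser


class PageFacts(HTMLParser):
    """Collect the H1 text and the rel=canonical href from a page."""

    def __init__(self):
        super().__init__()
        self._in_h1 = False
        self.h1_parts = []
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "h1":
            self._in_h1 = True
        elif tag == "link" and (attrs.get("rel") or "").lower() == "canonical":
            self.canonical = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        if self._in_h1:
            self.h1_parts.append(data)


def check_alignment(html: str, doc: dict) -> list:
    """Compare schema values with what the page visibly declares."""
    facts = PageFacts()
    facts.feed(html)
    facts.close()
    h1 = "".join(facts.h1_parts).strip()
    problems = []
    if doc.get("headline") != h1:
        problems.append(f"headline {doc.get('headline')!r} != H1 {h1!r}")
    main = doc.get("mainEntityOfPage")
    if isinstance(main, dict):  # also accept the {"@id": url} form
        main = main.get("@id")
    if main != facts.canonical:
        problems.append(f"mainEntityOfPage {main!r} != canonical {facts.canonical!r}")
    return problems


page = ('<html><head><link rel="canonical" href="https://example.com/a"></head>'
        '<body><h1>Hello World</h1></body></html>')
schema = {"headline": "Hello World", "mainEntityOfPage": "https://example.com/a"}
print(check_alignment(page, schema))  # prints []
```

The remaining items, such as the Rich Results Test and the Perplexity re-test, still need to be run by hand or through their own tooling.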

Minimal valid example

{
  "@context": "https://schema.org",
  "@type": "Article",
  "mainEntityOfPage": "https://geodocs.dev/technical/schema-article-markup-checklist-ai-search",
  "headline": "Article Schema Markup Checklist for AI Search Engines",
  "description": "Article schema markup checklist for AI search: 30 fields LLM crawlers consume to surface citations on ChatGPT, Perplexity, and AI Overviews.",
  "inLanguage": "en",
  "datePublished": "2026-04-29T08:00:00Z",
  "dateModified": "2026-04-29T08:00:00Z",
  "wordCount": 1100,
  "articleSection": "Technical GEO",
  "keywords": ["Article schema", "JSON-LD", "AI search"],
  "author": {
    "@type": "Organization",
    "name": "Geodocs Research Team",
    "sameAs": ["https://geodocs.dev/about"]
  },
  "publisher": {
    "@type": "Organization",
    "name": "Geodocs",
    "logo": {
      "@type": "ImageObject",
      "url": "https://geodocs.dev/logo.png"
    }
  },
  "image": ["https://geodocs.dev/og/schema-article-checklist.png"],
  "about": [
    { "@type": "Thing", "name": "Schema.org Article", "sameAs": "https://schema.org/Article" }
  ],
  "isAccessibleForFree": true
}

FAQ

Q: Does schema markup guarantee AI citations?

No. Schema is necessary but not sufficient. SearchEngineLand's analysis is explicit that schema does not triple citations on its own. It does, however, reduce ambiguity, link your content to the knowledge graph, and reinforce E-E-A-T signals that AI engines combine with freshness, authority, and structural extractability when choosing citations.

Q: Should I use Article, NewsArticle, or BlogPosting?

Google treats all three similarly for rich results. Pick the subtype that most honestly describes the content (NewsArticle for journalism, BlogPosting for blog editorial, Article for evergreen) and stay consistent. Switching subtypes after the fact resets some ranking signals.

Q: Do AI engines actually parse JSON-LD or just read it as text?

Both. Practitioner tests have shown engines like ChatGPT and Perplexity occasionally extract facts from JSON-LD as if it were prose, which means invalid or contradictory JSON-LD can hurt you. Always validate, and never put information in JSON-LD that disagrees with the visible page.

Q: How many image URLs should I include?

At least one image ≥1200px wide. Three images at different aspect ratios (1:1, 4:3, 16:9) maximize eligibility across rich result formats and AIO previews.

Q: Do I need both about and mentions?

Use about for the 1–3 primary entities the page is principally about. Use mentions for secondary entities you reference but do not deeply cover. Both fields anchor the page in the knowledge graph and improve entity-grounded citations.
