Geodocs.dev

JSON-LD Validation Pipeline Specification for AI Search

A JSON-LD validation pipeline runs every published page's structured data through schema.org's vocabulary validator and Google's Rich Results Test on every commit, classifies each finding as an error or a warning, fails the build on errors, and alerts on regressions. Without it, malformed JSON-LD silently degrades AI search eligibility — and AI crawlers, unlike Google Search Console, do not surface the failures back to you.

TL;DR

Production JSON-LD validation has four required stages: (1) build-time syntax and JSON-LD context validation, (2) vocabulary validation against schema.org using the Schema Markup Validator, (3) Google-eligibility validation with the Rich Results Test, and (4) post-deploy regression monitoring with automated alerts. Errors fail the build; warnings are triaged weekly; an explicit exemption process tracks intentionally non-Google-compliant markup (for example, new schema.org types Google has not yet promoted to a rich result).

Definition

A JSON-LD validation pipeline is an automated, repeatable workflow that validates structured data on every code change and every deploy. It treats JSON-LD as code: every block of application/ld+json is parsed, type-checked against schema.org, evaluated against Google's rich result requirements, and monitored in production for regressions.

The pipeline composes three official validators (Google Search Central recommends starting with the Rich Results Test for Google features, then the Schema Markup Validator for generic schema.org coverage), one or more open-source linters, and a regression monitor — wrapped in a CI runner such as GitHub Actions.
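The pipeline's entry point is extraction: every application/ld+json block on a rendered page must be found and parsed before any validator runs. The following is a minimal sketch of that step using only the Python standard library; the class and function names are illustrative, not from any particular tool.

```python
import json
from html.parser import HTMLParser

class JsonLdExtractor(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> block."""
    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "script" and ("type", "application/ld+json") in attrs:
            self._in_jsonld = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld and data.strip():
            self.blocks.append(data)

def extract_jsonld(html: str) -> list[dict]:
    """Parse a page and return each JSON-LD block as a Python dict.
    A json.JSONDecodeError here is exactly the Stage 1 syntax failure
    the pipeline is designed to catch."""
    parser = JsonLdExtractor()
    parser.feed(html)
    return [json.loads(block) for block in parser.blocks]
```

In a CI runner, this function would be applied to every built page, with a parse failure reported as a Stage 1 error.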

Why a validation pipeline matters

Malformed JSON-LD has three failure modes that hurt AI search:

  • Silent invalidation. A typo in @type (Articcle instead of Article) drops the entire block from extraction. AI crawlers do not warn you.
  • Partial extraction. A missing required property (for example, Recipe.recipeIngredient) keeps the block valid for schema.org but ineligible for Google's Recipe rich result and inconsistent across AI engines.
  • Drift. Schema.org evolves. A property valid in version 26.0 may be deprecated in 27.0; without a versioned validator, drift accumulates silently.
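The first failure mode above — silent invalidation via a bad @type — is cheap to catch at build time with a simple allow-list check. A minimal sketch follows; KNOWN_TYPES is a tiny illustrative subset, whereas a real pipeline would load the full type list from a pinned schema.org release to also guard against the drift case.

```python
# Illustrative subset; load the full list from a versioned schema.org release.
KNOWN_TYPES = {"Article", "Recipe", "Product", "FAQPage", "Organization"}

def check_types(block: dict) -> list[str]:
    """Return an error message for every missing or unknown @type."""
    declared = block.get("@type")
    if declared is None:
        return ["missing @type"]
    types = declared if isinstance(declared, list) else [declared]
    return [f"unknown @type: {t!r}" for t in types if t not in KNOWN_TYPES]
```

Run against the typo from the example, `check_types({"@type": "Articcle"})` reports the unknown type instead of letting the block drop silently from extraction.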

AI engines (ChatGPT, Perplexity, Gemini, Google AI Overviews, Claude) treat structured data as one signal in their grounding stack — schema markup alone is ignored, but schema combined with strong visible content measurably improves citation eligibility. Catching errors at CI time keeps the signal clean.

How the pipeline works

The pipeline runs in four sequential stages on every pull request and on every production deploy.

```mermaid
flowchart LR
  A["Commit / PR"] --> B["Stage 1: Syntax & Context"]
  B --> C["Stage 2: Vocabulary (Schema Markup Validator)"]
  C --> D["Stage 3: Google Eligibility (Rich Results Test)"]
  D --> E{"Errors?"}
  E -- yes --> F["Fail build"]
  E -- no --> G["Warnings?"]
  G -- yes --> H["Triage queue"]
  G -- no --> I["Deploy"]
  I --> J["Stage 4: Production regression monitor"]
  J --> K["Alert on delta"]
```
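The gate logic in the flowchart — errors fail the build, warnings go to the triage queue, a clean run deploys — can be sketched in a few lines. The Finding type and outcome strings are illustrative assumptions, not part of any validator's API.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    severity: str   # "error" or "warning", as classified by the validators
    message: str

def gate(findings: list[Finding]) -> str:
    """Decide the pipeline outcome from the combined validator findings."""
    if any(f.severity == "error" for f in findings):
        return "fail-build"            # errors always block the deploy
    if any(f.severity == "warning" for f in findings):
        return "triage-then-deploy"    # warnings deploy but enter weekly triage
    return "deploy"
```

Because the gate sees the merged findings of all three validation stages, a single error anywhere in the run is enough to fail the build, matching the spec's "errors fail the build; warnings are triaged weekly" rule.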

Stage 1 — Syntax and JSON-LD context

  • Parse every