Agent Output Validation Documentation Specification

Agent output validation enforces that every response an AI agent emits conforms to a declared JSON Schema before it leaves the agent boundary, using runtime validators (Ajv, Zod, Pydantic) wired into a deterministic output contract.

TL;DR

AI agents emit free-form text by default. To make their outputs safe for downstream tools to consume, you must (1) declare a JSON Schema for every output channel, (2) validate at runtime with a validator such as Ajv, Zod, or Pydantic, and (3) define explicit error and partial-output behaviour. This specification standardises those layers so that multiple agents and tools interoperate without ad-hoc parsers.

Why output validation matters

Agents that hand untyped strings to other systems are a source of silent failures. A single missing field, mistyped enum, or hallucinated key can break a downstream pipeline that assumed structured data. OpenAI's Structured Outputs feature exists precisely because raw JSON mode "does not guarantee that the model's response will conform to a particular schema", and the API now constrains generation against developer-supplied JSON Schemas. Anthropic's tool use likewise treats the input_schema as the contract: "Each example must be valid according to the tool's input_schema. Invalid examples return a 400 error" (Anthropic, "Define tools").

Validation is the second layer: even with model-side schema enforcement, agents still hit edge cases — refusals, truncated streams, retries, schema drift, mid-flight schema migrations. Runtime validation converts those into typed errors instead of malformed JSON reaching production.

Specification scope

This spec covers four mandatory pieces:

Schema declaration — how output schemas are authored and stored.
Runtime validation hooks — where in the agent loop validation runs.
Error format — the canonical shape of a validation failure.
Partial-output handling — behaviour for streamed and truncated outputs.

Non-goals: tool input validation (covered by the host tool spec), prompt-side schema authoring, and model fine-tuning for structured output.

1. Schema declaration

Every output channel of an agent MUST have a named JSON Schema (Draft 2020-12 or JSON Type Definition / RFC 8927). Schemas are stored as first-class artefacts:

One file per schema: /schemas/.schema.json.
A version field ($id with semver path) for backwards-compatible evolution.
A canonical title and description used by the agent prompt and any documentation generator.

Schemas SHOULD prefer:

additionalProperties: false for closed objects, so unknown keys fail fast.
Explicit required arrays — never implicit through default values.
enum over string for finite vocabularies.
anyOf or oneOf for variant types; prefer anyOf when targeting engines that do not honour oneOf (the Vercel AI SDK has had to convert oneOf to anyOf for both Anthropic and OpenAI providers in production code paths).
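
As an illustration of these preferences, a closed schema for an answer channel might look like the sketch below, written as a Python dict so it can be serialised to its schema file. The property names (answer, confidence, citations) and the helper to_schema_file_text are assumptions made for the example; the $id reuses the one shown in this spec's error example.

```python
import json

# Hypothetical output schema for an "answer" channel; property names are illustrative.
ANSWER_SCHEMA = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$id": "https://geodocs.dev/schemas/answer.v1.json",  # version carried in the $id path
    "title": "Answer",
    "description": "Final answer emitted by the answer agent.",
    "type": "object",
    "additionalProperties": False,  # closed object: unknown keys fail fast
    "required": ["answer", "confidence", "citations"],  # explicit, not implied by defaults
    "properties": {
        "answer": {"type": "string"},
        "confidence": {"enum": ["low", "medium", "high"]},  # finite vocabulary as enum
        "citations": {
            "type": "array",
            "items": {
                "type": "object",
                "additionalProperties": False,
                "required": ["url"],
                "properties": {"url": {"type": "string", "format": "uri"}},
            },
        },
    },
}


def to_schema_file_text(schema: dict) -> str:
    """Serialise the schema as it could be stored on disk, one file per schema."""
    return json.dumps(schema, indent=2, sort_keys=True)
```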

2. Runtime validation hooks

Validation runs at three points in the agent lifecycle. The pre-emit and post-merge hooks are mandatory; the streaming hook is optional but recommended:

Pre-emit hook — after the model returns, before any tool, network, or storage call consumes the output.
Streaming hook (optional, recommended) — incremental validation against a relaxed schema during streaming, with full validation at completion.
Post-merge hook — after partial outputs are merged (e.g. across retries), the final object is re-validated.

Reference implementations:

JavaScript / TypeScript: Ajv compiles schemas to standalone validation code with V8-friendly performance and supports JSON Schema drafts 04, 06, 07, 2019-09, and 2020-12 plus JTD (Ajv docs). Compile once at boot and reuse the validator function; never recompile per request.
Python: Pydantic v2 fills the same role, with model_validate_json() parsing and validating in a single step; for raw JSON Schema use jsonschema with a pre-built Draft202012Validator.
Rust / Go: jsonschema-rs and gojsonschema respectively.

Compiled validators MUST be cached by $id to avoid recompiling on every request.
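
A minimal sketch of the caching rule and the pre-emit hook follows. It uses only the Python standard library: compile_schema is a deliberately tiny stand-in for a real validator's compile step (it checks only required and additionalProperties: false on a top-level object), and the names get_validator, pre_emit, and COMPILE_CALLS are assumptions made for the example, not part of any library.

```python
from typing import Any, Callable, Dict, List

Validator = Callable[[Any], List[str]]

COMPILE_CALLS: List[str] = []  # instrumentation for this sketch only


def compile_schema(schema: dict) -> Validator:
    """Stand-in for an expensive compile step in a real validator library.

    Checks only `required` and `additionalProperties: false` on a top-level object.
    """
    COMPILE_CALLS.append(schema["$id"])
    required = list(schema.get("required", []))
    allowed = set(schema.get("properties", {}))
    closed = schema.get("additionalProperties") is False

    def validate(instance: Any) -> List[str]:
        if not isinstance(instance, dict):
            return ["$: must be object"]
        problems = [f"$.{key}: is required" for key in required if key not in instance]
        if closed:
            problems += [
                f"$.{key}: unknown property"
                for key in sorted(instance)
                if key not in allowed
            ]
        return problems

    return validate


_VALIDATORS: Dict[str, Validator] = {}


def get_validator(schema: dict) -> Validator:
    """Return the cached validator for this schema's $id, compiling on first use."""
    schema_id = schema["$id"]
    if schema_id not in _VALIDATORS:
        _VALIDATORS[schema_id] = compile_schema(schema)
    return _VALIDATORS[schema_id]


def pre_emit(schema: dict, output: Any) -> List[str]:
    """Pre-emit hook: validate before any tool, network, or storage call sees the output."""
    return get_validator(schema)(output)
```

In a real deployment the stand-in would be replaced by one of the reference validators listed above; the cache keyed on $id is the part this sketch is meant to show.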

3. Error format

A validation failure MUST be returned as a structured error, not a thrown exception that bubbles to the user. The canonical error shape:

{
  "error": "output_validation_failed",
  "schema_id": "https://geodocs.dev/schemas/answer.v1.json",
  "agent_id": "answer-agent",
  "violations": [
    {
      "path": "$.citations[0].url",
      "keyword": "format",
      "expected": "uri",
      "received": "not-a-url",
      "message": "must match format 'uri'"
    }
  ],
  "raw_output": "...truncated...",
  "retryable": true
}

All fields are required. retryable is true when the agent SHOULD self-correct via a follow-up turn, and false for unrecoverable schema mismatches that require human review.
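
One way to assemble this shape in code is sketched below. The Violation dataclass, the build_validation_error helper, and the max_raw_chars cut-off are assumptions for illustration; the decision about retryable is left to the caller because this spec does not prescribe how to compute it.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, List


@dataclass
class Violation:
    path: str
    keyword: str
    expected: Any
    received: Any
    message: str


def build_validation_error(
    schema_id: str,
    agent_id: str,
    violations: List[Violation],
    raw_output: str,
    retryable: bool,
    max_raw_chars: int = 200,  # hypothetical cut-off for the echoed raw output
) -> Dict[str, Any]:
    """Return the canonical error as a plain dict instead of raising."""
    if len(raw_output) > max_raw_chars:
        raw_output = raw_output[:max_raw_chars] + "...truncated..."
    return {
        "error": "output_validation_failed",
        "schema_id": schema_id,
        "agent_id": agent_id,
        "violations": [asdict(v) for v in violations],
        "raw_output": raw_output,
        "retryable": retryable,
    }
```

Returning a dict rather than raising lets retry logic branch on the retryable field without catching exceptions.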

4. Partial-output handling

Streamed outputs are the hardest case. The spec defines three states:

In-flight — schema validation is deferred; the chunk buffer is appended.
Soft-complete — model signalled stop; run full validation. On failure, retry once with the violation appended to the prompt.
Hard-complete — second attempt failed validation; emit the canonical error and surface to the caller. Do not silently truncate.

For refusals (model returned a refusal field instead of the requested schema), the agent MUST treat the refusal as a first-class output, not a validation failure.
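
The soft-complete and hard-complete transitions, together with the refusal rule, can be sketched as a small loop. This is an illustration under stated assumptions: call_model and validate are hypothetical callables supplied by the caller, the in-flight chunks are assumed to be already joined into one string, a refusal is assumed to appear as a top-level "refusal" key, and the returned dicts are a reduced shape rather than the full canonical error.

```python
import json
from typing import Any, Callable, Dict, List


def finalize_output(
    call_model: Callable[[str], str],
    validate: Callable[[Any], List[str]],
    prompt: str,
) -> Dict[str, Any]:
    """Validate at soft-complete, retry once, then surface a hard-complete error."""
    attempt_prompt = prompt
    problems: List[str] = []
    raw = ""
    for _attempt in range(2):  # first attempt plus exactly one retry
        raw = call_model(attempt_prompt)
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError as exc:
            problems = [f"$: invalid JSON ({exc.msg})"]
        else:
            # A refusal is a first-class output, not a validation failure.
            if isinstance(parsed, dict) and "refusal" in parsed:
                return {"status": "refusal", "refusal": parsed["refusal"]}
            problems = validate(parsed)
            if not problems:
                return {"status": "ok", "output": parsed}
        # Soft-complete failure: append the violations to the prompt for the retry.
        attempt_prompt = (
            prompt + "\n\nPrevious output failed validation:\n" + "\n".join(problems)
        )
    # Hard-complete: report the failure to the caller without truncating silently.
    return {
        "status": "error",
        "error": "output_validation_failed",
        "violations": problems,
        "raw_output": raw,
    }
```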

Examples per validator library

Library	Language	Schema flavour	Best for
Ajv	JS/TS	JSON Schema 2020-12, JTD	Edge runtimes, browser bundles
Zod	TS	TS-native (convertible to JSON Schema)	Type-safe agent SDKs
Pydantic v2	Python	Python type-annotated models (exportable to JSON Schema)	LangChain / LlamaIndex stacks
jsonschema	Python	JSON Schema 2020-12	Pure spec compliance
jsonschema-rs	Rust	JSON Schema 2020-12	High-throughput services

Common mistakes

Trusting model-side enforcement alone. Structured Outputs reduces but does not eliminate the need for runtime validation; refusals, JSON-mode fallbacks, and provider quirks still surface invalid payloads.
Compiling schemas per request. Validator compilation is expensive — cache by $id.
Throwing on validation failure. Always return a typed error so retry logic can branch.
Open schemas everywhere. Without additionalProperties: false, hallucinated keys leak into downstream systems undetected.
Validating only the individual partial outputs. Partial outputs that pass individually can fail when merged; add a post-merge hook.
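
The merge pitfall can be reproduced in a few lines. In this sketch the limit MAX_CITATIONS stands in for a maxItems constraint in a schema, and validate_answer, merge_partials, and post_merge are hypothetical helpers; the merge strategy (concatenate text, extend lists) is also an assumption.

```python
from typing import Any, Dict, List, Tuple

MAX_CITATIONS = 3  # hypothetical limit, standing in for a schema's maxItems


def validate_answer(obj: Dict[str, Any]) -> List[str]:
    """Tiny stand-in validator for an answer object."""
    problems = []
    if not isinstance(obj.get("answer"), str):
        problems.append("$.answer: must be string")
    if len(obj.get("citations", [])) > MAX_CITATIONS:
        problems.append(f"$.citations: must have at most {MAX_CITATIONS} items")
    return problems


def merge_partials(partials: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Concatenate answer text and citations from each partial output."""
    merged: Dict[str, Any] = {"answer": "", "citations": []}
    for part in partials:
        merged["answer"] += part.get("answer", "")
        merged["citations"].extend(part.get("citations", []))
    return merged


def post_merge(partials: List[Dict[str, Any]]) -> Tuple[Dict[str, Any], List[str]]:
    """Post-merge hook: re-validate the merged object, not just its parts."""
    merged = merge_partials(partials)
    return merged, validate_answer(merged)
```

Two partials with two citations each both pass on their own, yet the merged object carries four citations and fails the limit, which only the post-merge hook detects.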

FAQ

Q: Do I still need runtime validation if I use OpenAI Structured Outputs or Anthropic tool use?

Yes. Provider-side enforcement reduces invalid output rates dramatically but does not cover refusals, mid-stream errors, schema drift between agent and consumer, or providers that do not yet support strict mode. Runtime validation is the contract boundary.

Q: JSON Schema vs JSON Type Definition — which should I pick?

Use JSON Schema 2020-12 when you need rich validation (formats, conditional schemas, regex). Use JTD (RFC 8927) when you want a small, deterministic subset that compiles to faster validators and translates cleanly to typed code in multiple languages.

Q: How do I validate streamed outputs?

Run full validation at soft-complete, not on each chunk. For long outputs, define a relaxed schema for incremental sanity checks (e.g. "object has started"), then run the full schema once the model emits stop.
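
A rough sketch of that two-tier approach is shown below, assuming the stream arrives as an iterable of text chunks. The relaxed check here is the "object has started" test only, and relaxed_check, consume_stream, and the validate callable are hypothetical names rather than part of any library.

```python
import json
from typing import Any, Callable, Iterable, List, Optional, Tuple


def relaxed_check(buffer: str) -> bool:
    """Incremental sanity check: whatever has arrived so far must begin a JSON object."""
    stripped = buffer.lstrip()
    return stripped == "" or stripped.startswith("{")


def consume_stream(
    chunks: Iterable[str],
    validate: Callable[[Any], List[str]],
) -> Tuple[Optional[Any], List[str]]:
    """Buffer chunks with a cheap check, then run full validation once at stop."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        if not relaxed_check(buffer):
            return None, ["$: stream does not start a JSON object"]
    try:
        parsed = json.loads(buffer)
    except json.JSONDecodeError as exc:
        return None, [f"$: invalid JSON ({exc.msg})"]
    return parsed, validate(parsed)
```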

Q: What about tool inputs the agent generates?

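The same approach applies, even though tool input validation is formally a non-goal of this spec: the tool's input_schema is the contract. Validate the generated arguments on the agent side against a compiled, cached validator for that input_schema before dispatching the tool call.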