Geodocs.dev

Agent Error Handling Documentation Specification: Designing Errors Agents Can Self-Repair From


This specification defines the structure, fields, and severity levels documentation must provide so an LLM agent can read a tool error and immediately repair its next call. Each error must carry a stable code, the offending field, allowed_values or constraints, a one-line hint, and retryable plus severity flags so agents distinguish transient failures from terminal contract violations.

TL;DR

Agent-facing error docs are an interface, not a log message. To make an LLM agent self-repair, every documented error needs (1) a stable machine-readable code, (2) the input field that caused it, (3) the allowed_values or constraint that was violated, (4) a short hint written in the imperative for the agent, and (5) retryable and severity metadata. Errors that omit these fields force agents to retry blindly, hallucinate field names, or halt the workflow.

Why agent-facing errors are different

Classical error documentation targets human developers who can read a stack trace, open a dashboard, and decide what to do. Agents cannot do any of that. They observe a single tool response and must choose the next step from that string alone. If the error is generic ("400 Bad Request"), the agent's only recovery moves are to retry, hallucinate a fix, or escalate — all three are expensive and frequent failure modes in production agents.

Recent research on agentic workflow exceptions catalogues this gap. SHIELDA (2025) shows that LLM-driven agents fail along distinct exception classes — tool invocation, output shape, planning, environment — and that recovery quality depends almost entirely on whether the error response is structured enough to be parsed and acted upon. Industry practitioners report the same pattern: agents wrapping every tool call in validation only recover gracefully when the upstream error tells them which field to change and what value is acceptable.

This spec turns those findings into a documentation contract. It applies to:

  • REST and RPC APIs exposed to agents
  • MCP tools and OpenAI / Anthropic / Gemini function-call schemas
  • Internal tool wrappers in LangGraph, AI SDK, LlamaIndex, and similar runtimes

For background on how agents use tools, see the AI Agents hub and the agent tool-use documentation spec. For how errors interact with retry trajectories, see the agent trajectory documentation spec.

Definitions

Agent-readable error. An error response whose payload exposes enough structured fields that an LLM agent can deterministically choose the next action without re-asking the human user.

Self-repair. The agent's act of modifying the next tool call — changing a field value, picking a different tool, narrowing scope, waiting and retrying — based solely on the structured error.

Error contract. The documented union of all error codes a tool can emit, their fields, and their semantics. The contract is part of the tool's public interface, equal in weight to the input schema.

Required fields (the minimum viable contract)

Every agent-facing error response MUST include:

  • code (string, kebab-case or SCREAMING_SNAKE): Stable identifier the agent can branch on. Never reuse codes across versions.
  • message (string): Short, declarative human-readable sentence. No stack traces.
  • field (string | string[] | null): JSON pointer or dotted path of the offending input, if applicable.
  • allowed_values (array | object | null): Concrete enum, range, regex, or schema fragment the agent can copy from.
  • hint (string): Imperative one-liner: "Use ISO 8601 datetime", "Reduce limit to ≤100".
  • retryable (boolean): True only for transient failures (rate limit, timeout, 5xx).
  • severity (enum: info | warning | error | fatal): Drives the agent's escalation policy.
  • request_id (string): For human audit; agents pass it through to logs.
Every agent-facing error response SHOULD also include, when applicable:

  • retry_after_ms (integer): Required when retryable: true and the cause is rate limiting or backoff.
  • docs_url (string): Link to the canonical error doc; agents may follow it via a tool call.
  • related_codes (string[]): Sibling codes the agent should consider (e.g., upstream cause).
  • suggested_value (unknown): A concrete value the agent can use without further reasoning.
  • category (enum: validation | auth | rate_limit | state | dependency | internal): Helps the agent route to the correct repair strategy.
  • example_request (object): A minimal corrected request body.

Severity ladder

info — The call succeeded but produced a soft warning. The agent should record it and proceed.

warning — The call partially succeeded. The agent should examine partial results before retrying.

error — The call failed but is recoverable through self-repair. The agent must follow the hint or pick a different tool.

fatal — The call failed and no repair is possible from the agent's context (e.g., revoked credentials, deleted resource). The agent must escalate to a human or terminal node.

severity is independent of HTTP status. A 200 OK can carry severity: warning; a 429 is almost always severity: error with retryable: true.
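One way an agent runtime might turn the ladder into policy. The action names here are hypothetical placeholders for nodes in an agent graph; only the branching mirrors the ladder above:

```python
def next_action(severity: str, retryable: bool) -> str:
    """Map a documented severity to the agent's next move."""
    if severity == "info":
        return "record_and_proceed"        # soft warning; keep going
    if severity == "warning":
        return "inspect_partial_results"   # examine partial success first
    if severity == "error":
        # Recoverable: wait out a transient failure or self-repair.
        return "retry_with_backoff" if retryable else "self_repair"
    if severity == "fatal":
        return "escalate_to_human"         # stop signal; no repair possible
    raise ValueError(f"unknown severity: {severity!r}")
```

Keeping this dispatch outside the LLM call is the point: severity routing is deterministic, so no tokens are spent deciding whether to escalate.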

Self-repair hint style guide

Hints are read by the LLM and translated into action. Write them like commit messages, not log lines.

Do:

  • "Use ISO 8601 datetime in UTC, e.g. 2026-04-29T00:00:00Z."
  • "Set page_size between 1 and 100."
  • "Provide either email or phone, not both."
  • "Wait 1500 ms before retrying; this endpoint is rate-limited at 60 rpm."

Don't:

  • "Invalid input." (no field, no fix)
  • "An unexpected error occurred." (zero signal)
  • "See documentation." (forces a tool call the agent may not have)
  • "Please try again later." (no retry_after_ms)

The hint must be self-sufficient. Assume the agent will not load docs_url. If repair requires multi-step reasoning, encode the steps as enumerated bullets inside hint or split into multiple errors with sequential codes.

Canonical error envelope

{
  "error": {
    "code": "INVALID_DATE_FORMAT",
    "message": "Field start_date must be ISO 8601.",
    "field": "start_date",
    "allowed_values": {
      "format": "date-time",
      "pattern": "^\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}Z$"
    },
    "hint": "Use ISO 8601 in UTC, e.g. 2026-04-29T00:00:00Z.",
    "retryable": false,
    "severity": "error",
    "category": "validation",
    "request_id": "req_01J...",
    "docs_url": "https://example.com/docs/errors/INVALID_DATE_FORMAT"
  }
}

The single error object is preferred over arrays for backwards compatibility with most LLM tool-call middleware. When multiple errors apply, return the most actionable one and list siblings under related_codes.

Example: validation error with suggested_value

{
  "error": {
    "code": "OUT_OF_RANGE",
    "message": "Field limit must be between 1 and 100.",
    "field": "limit",
    "allowed_values": { "minimum": 1, "maximum": 100 },
    "suggested_value": 100,
    "hint": "Reduce limit to 100 or less.",
    "retryable": false,
    "severity": "error",
    "category": "validation"
  }
}

Adding suggested_value collapses the agent's decision into a single deterministic edit, which measurably reduces retry loops in tool-calling benchmarks.
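A minimal sketch of that deterministic edit on the agent side; it is simplified to top-level field names and does not resolve JSON pointers or dotted paths:

```python
def apply_suggested_value(request: dict, error: dict) -> dict:
    """Patch the offending field with the server's suggested_value.

    If the error carries no usable field/suggested_value pair, return
    the request unchanged and let the agent fall back to the hint.
    """
    field = error.get("field")
    if not isinstance(field, str) or "suggested_value" not in error:
        return request  # nothing deterministic to apply
    patched = dict(request)
    patched[field] = error["suggested_value"]
    return patched
```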

Example: transient error with retry_after_ms

{
  "error": {
    "code": "RATE_LIMITED",
    "message": "Too many requests.",
    "field": null,
    "allowed_values": null,
    "hint": "Wait 1500 ms before retrying.",
    "retryable": true,
    "retry_after_ms": 1500,
    "severity": "error",
    "category": "rate_limit"
  }
}

Industry retry guides are converging on the rule that retryable: true without retry_after_ms is malformed — the agent will either retry too fast and worsen the rate-limit window or back off arbitrarily.
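A retry wrapper enforcing that rule might look like the following sketch. Here `call` is any zero-argument function returning the envelope, and raising on a missing retry_after_ms is a design choice, not part of the spec:

```python
import time

def call_with_retry(call, max_attempts=3):
    """Retry only when the envelope says retryable: true, sleeping
    exactly retry_after_ms between attempts."""
    result = {}
    for _ in range(max_attempts):
        result = call()
        error = result.get("error")
        if error is None:
            return result              # success
        if not error.get("retryable"):
            return result              # hand the structured error to the repair step
        wait_ms = error.get("retry_after_ms")
        if wait_ms is None:
            # Malformed contract: surface it instead of guessing a backoff.
            raise ValueError(f"{error.get('code')}: retryable without retry_after_ms")
        time.sleep(wait_ms / 1000)     # honor the server's budget
    return result                      # budget exhausted; let the caller escalate
```

Note the loop has an explicit budget (max_attempts), which is exactly what the anti-pattern below warns is missing from bare "try again later" errors.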

Example: fatal error agents must escalate

{
  "error": {
    "code": "RESOURCE_DELETED",
    "message": "Resource user_42 no longer exists.",
    "field": "user_id",
    "allowed_values": null,
    "hint": "Do not retry. Inform the user the resource is gone.",
    "retryable": false,
    "severity": "fatal",
    "category": "state"
  }
}

severity: fatal is the agent's stop signal. Tools that misuse it (e.g., labeling rate limits fatal) cause premature human escalation; tools that under-use it (e.g., labeling deleted-resource errors error) cause infinite repair loops.

Documentation requirements per error code

Each error code MUST be documented with:

  1. Code identifier — exact string, in code formatting.
  2. Severity and category — from the enums above.
  3. Cause — one paragraph explaining when the tool emits this error.
  4. Repair recipe — numbered steps the agent should take. Mirror the hint field.
  5. Example payload — full canonical envelope.
  6. Related codes — links to upstream and downstream errors.
  7. Stability — stable, beta, or deprecated.

Errors marked deprecated MUST list the replacement code and a removal date.

Mapping to OpenAPI and MCP

For OpenAPI, expose error codes via components.responses and embed the envelope in the application/json schema. Use x-agent-error-codes: [...] extensions to surface the full enum at the operation level so generators can hand it to LLMs.
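A sketch of that OpenAPI wiring, assuming an ErrorEnvelope schema defined elsewhere under components.schemas:

```yaml
paths:
  /users:
    get:
      x-agent-error-codes: [OUT_OF_RANGE, RATE_LIMITED]
      responses:
        "400":
          $ref: "#/components/responses/AgentError"
components:
  responses:
    AgentError:
      description: Structured agent-readable error envelope
      content:
        application/json:
          schema:
            $ref: "#/components/schemas/ErrorEnvelope"
```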

For MCP tools, include the error catalogue in the tool's description field as a fenced JSON block under the heading ## Errors. MCP clients pass tool descriptions verbatim to the model, which is the only way the agent will see the contract at call time.

For OpenAI, Anthropic, and Gemini function-calling, embed a compact errors array in the tool description; modern providers all forward unmodified description text to the model.

Anti-patterns

  • Generic 400/500. Status code without payload structure forces blind retries.
  • Free-text errors only. The LLM may parse them, but unreliably; agents trained against structured outputs lose accuracy on freeform fallbacks.
  • Mixing field and message. Putting the field name only inside the prose denies branch logic at runtime.
  • Reusing codes for multiple causes. Breaks downstream agent policies that whitelist or blacklist specific codes.
  • retryable: true without budget. Tells the agent to loop without telling it when to stop.
  • Embedding HTML or stack traces in hint. Wastes tokens and confuses the model.

Common misconceptions

"Agents can read any error if the LLM is smart enough." Frontier models do parse natural-language errors, but accuracy collapses on long, multi-cause messages. Structured fields are deterministic.

"Validation belongs only on the client." Tools called by agents are the client. The server still must return errors the agent can act on.

"Logging is enough." Logs help the human operator, never the agent. Agent-facing errors and operator logs are two different artifacts.

Implementation checklist

  • [ ] Every error code listed in a single registry file under version control.
  • [ ] Registry compiled into OpenAPI / MCP / function-call schemas at build time.
  • [ ] CI test that asserts every emitted error matches the documented contract.
  • [ ] Sample agent harness that exercises each code and verifies self-repair.
  • [ ] Severity audit run quarterly to detect drift.
  • [ ] deprecated codes pruned every 6 months.
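The CI contract test can start as simply as this sketch; the registry shape and required-key set here are assumptions, not normative:

```python
import json

# Hypothetical single-source registry, per the checklist's first item.
REGISTRY = {
    "INVALID_DATE_FORMAT": {"severity": "error", "retryable": False},
    "RATE_LIMITED": {"severity": "error", "retryable": True},
}

REQUIRED_KEYS = {"code", "message", "field", "allowed_values",
                 "hint", "retryable", "severity", "request_id"}

def check_contract(payload: str) -> list[str]:
    """Return contract violations for one emitted error payload."""
    error = json.loads(payload)["error"]
    problems = [f"missing: {key}" for key in sorted(REQUIRED_KEYS - error.keys())]
    if error.get("code") not in REGISTRY:
        problems.append(f"unregistered code: {error.get('code')}")
    if error.get("retryable") and "retry_after_ms" not in error:
        problems.append("retryable without retry_after_ms")
    return problems
```

Run it in CI against a fixture of every payload the service can emit; an empty list per fixture means the emitted errors still match the documented contract.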

FAQ

Q: Do agents really need structured errors, or can a frontier LLM parse anything?

Frontier models can parse most prose, but real production traces show error parsing is the single largest source of recovery failures. Structured fields turn parsing into a deterministic branch and remove an entire failure class. Treat structured errors as a reliability investment, not a UX nicety.

Q: How is this different from RFC 7807 / Problem Details?

RFC 7807 is the right starting point. This spec extends it with field, allowed_values, hint, retryable, severity, and suggested_value — the fields agents need to self-repair. A compliant tool can emit Problem Details and this spec by adding the extra members.

Q: Should errors be localized?

message and hint should be in the language the agent operates in (typically English for tool-calling). Localization belongs to the user-facing layer, not the tool contract. Keep code, field, and allowed_values language-neutral.

Q: What about partial successes in batch endpoints?

Use severity: warning on the top-level response and an items[].error array per failed item. Each item error follows the same envelope. Agents then decide whether to retry only failed items or roll back.
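A sketch of such a batch response; the items[].status field is illustrative, only the nested error envelope is normative:

```json
{
  "severity": "warning",
  "items": [
    { "id": "a1", "status": "ok" },
    {
      "id": "a2",
      "status": "failed",
      "error": {
        "code": "OUT_OF_RANGE",
        "message": "Field limit must be between 1 and 100.",
        "field": "limit",
        "allowed_values": { "minimum": 1, "maximum": 100 },
        "hint": "Reduce limit to 100 or less.",
        "retryable": false,
        "severity": "error",
        "category": "validation"
      }
    }
  ]
}
```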

Q: How do I version error codes safely?

Codes are part of the public contract. Add new codes freely; never repurpose existing ones. When meaning changes, introduce a new code (INVALID_DATE_FORMAT_V2), mark the old one deprecated, and run both for at least one major version.

References

  • Reddit, r/AI_Agents, "The most underrated skill for building AI agents isn't prompting. It's error handling." Production retrospective on tool failure patterns. https://www.reddit.com/r/AI_Agents/comments/1q749k4/the_most_underrated_skill_for_building_ai_agents/
  • ApX Machine Learning, "Error Handling Strategies in Tool Execution." Catalogue of error sources and recovery strategies for LLM agent tools. https://apxml.com/courses/building-advanced-llm-agent-tools/chapter-1-llm-agent-tooling-foundations/tool-error-handling
  • SHIELDA: Structured Handling of Exceptions in LLM-Driven Agentic Workflows, arXiv:2508.07935 (2025). Taxonomy of agentic exceptions and recovery patterns. https://arxiv.org/html/2508.07935v1
  • Substack, "From Exceptions to Explanations: Error Handling in the Age of LLMs" (Dec 2025). Layered model: typed errors for runtimes, structured explanations for agents. https://substack.com/home/post/p-181005987
  • Towards AI, "Building Retries in Agents" (Apr 2026). Practitioner analysis of retry loops and structured recovery. https://pub.towardsai.net/building-retries-in-agents-how-to-build-ai-agents-that-survive-failures-32eedd2623f0
  • Medium, "Error Handling & Retries: Making LLM Calls Reliable" (Feb 2026). Retry envelope and backoff guidance. https://medium.com/@sonitanishk2003/error-handling-retries-making-llm-calls-reliable-ee7722fc2ea9
  • Agenta, "The guide to structured outputs and function calling with LLMs." Vendor comparison of OpenAI, Anthropic, and Gemini schema enforcement. https://agenta.ai/blog/the-guide-to-structured-outputs-and-function-calling-with-llms
  • Substack, Jae Li, "Tool Calling Is Not a Solved Problem." On structured-output reliability vs. freeform fallback. https://substack.com/home/post/p-167344254

