Agent Idempotency Documentation Specification
Agent idempotency documentation must declare whether each tool is read-only, naturally idempotent, key-based idempotent, or non-idempotent. For key-based tools, docs must specify the Idempotency-Key header field, its TTL, and the replay response shape (201 only on first write, 200 with the cached body on replay, 409 on key reuse with a different payload). For non-idempotent operations, docs must specify compensation paths.
TL;DR
Autonomous agents retry. Without explicit idempotency contracts, those retries duplicate orders, double-charge cards, and create ghost records. This spec defines what every agent-facing tool MUST document: a four-class idempotency taxonomy, an Idempotency-Key field with TTL and scope, the exact response status and body shape for replays, anti-patterns to flag, and machine-checkable assertions a manifest validator can run before publish. Pair with the Tool Use Documentation Spec and the Error Handling Spec. For the broader hub see /ai-agents/.
1. Definition
Agent idempotency documentation is a structured contract attached to a tool, function, or MCP server method that tells an autonomous agent — and any planner, retry middleware, or orchestrator above it — whether re-executing the same call is safe, and if not, what mechanism (key, conditional header, compensation) the agent must use to make it safe.
The contract has five required parts:
- Idempotency class drawn from a fixed taxonomy.
- Key field schema when the class requires a client-supplied key.
- Key TTL and scope so the agent knows how long replays are honored and where the key uniqueness applies.
- Replay response shape specifying the exact status code and body for first-write versus replay versus key-conflict.
- Compensation path when the operation cannot be made idempotent.
A tool whose docs omit any part that applies to its class is undocumented for autonomous use. Reviewers MUST reject the manifest.
2. Why it matters
Autonomous agents differ from human-driven clients in three ways that make idempotency documentation non-optional:
- Retries are automatic. Orchestrators built on frontier model tool-calling APIs (Claude tool use, OpenAI function calling, Gemini function calling) commonly retry on network errors, timeouts, and partial JSON. If the underlying tool is not retry-safe and the docs do not say so, the agent will re-execute it.
- Plans are persisted. Agent runtimes, including those built on the Anthropic Agent SDK or around Model Context Protocol servers, can resume plans across processes. A resumed plan that reissues a create-order call has no way to know the original succeeded unless the tool's response carries idempotent semantics.
- Failures are partial. Network or process failure between request and response is the common case at scale. The agent receives no answer; the server may have committed. Without an idempotency contract, the agent cannot recover safely.
Industry references converge on the same answer: declare safety in machine-readable form, support a client-supplied key for unsafe writes, and expose the replay response so clients can detect duplicates. Stripe's idempotency model is the most-copied prior art and is explicitly designed for retry-driven clients. RFC 9110 §9.2.2 codifies the HTTP-level distinction between idempotent and non-idempotent methods that all agent tooling inherits.
3. How it works
The contract operates at three layers.
3.1 The four-class idempotency taxonomy
Every tool MUST declare exactly one class:
| Class | Definition | Retry guidance for the agent |
|---|---|---|
| read_only | No server state change. Safe to retry indefinitely. | Retry freely. |
| naturally_idempotent | State change is, by construction, the same on every call (e.g. PUT /resources/{id} with full body, or cancel order if not cancelled). | Retry freely; compare response. |
| key_idempotent | State change is non-idempotent unless a client-supplied key is provided. Server deduplicates by key within TTL. | Retry only with the same key. |
| non_idempotent | State change cannot be made idempotent. A retry creates a duplicate. | Do not retry; use the documented compensation path. |
The class is the single most important field. An agent's planner uses it to decide whether to retry on timeout at all.
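A planner can turn the declared class into a retry decision with a small lookup. The following is a minimal sketch under stated assumptions, not a prescribed implementation: the `IdempotencyClass` enum and the `may_retry` function are hypothetical names, and only the four class strings come from the table above.

```python
from enum import Enum
from typing import Optional


class IdempotencyClass(Enum):
    READ_ONLY = "read_only"
    NATURALLY_IDEMPOTENT = "naturally_idempotent"
    KEY_IDEMPOTENT = "key_idempotent"
    NON_IDEMPOTENT = "non_idempotent"


def may_retry(declared_class: str,
              original_key: Optional[str],
              retry_key: Optional[str]) -> bool:
    """Return True if re-sending the call after a timeout is safe under the declared class."""
    cls = IdempotencyClass(declared_class)  # raises ValueError for an undeclared class
    if cls in (IdempotencyClass.READ_ONLY, IdempotencyClass.NATURALLY_IDEMPOTENT):
        return True
    if cls is IdempotencyClass.KEY_IDEMPOTENT:
        # Safe only when the retry reuses the key from the first attempt.
        return original_key is not None and original_key == retry_key
    # non_idempotent: never retry; the caller must follow the compensation path.
    return False
```

An unknown class string raises rather than defaulting to "retry", which keeps an undocumented tool from being retried by accident.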
3.2 The idempotency-key field schema
For key_idempotent tools, docs MUST include a field block:
```yaml
idempotency:
  class: key_idempotent
  key:
    name: "Idempotency-Key"
    location: "header"        # header | query | body
    type: "string"
    format: "uuid-v4 | opaque"
    max_length: 64
    required: true
  ttl_seconds: 86400          # 24h is the conventional default
  scope: "account"            # account | user | tenant | global
  conflict_on_payload_change: 409
```

scope is the most-overlooked field. Per-account scope means two different accounts may legally reuse the same key string; global scope means the key namespace is shared. Stripe's idempotency docs use account-scoped keys with a 24-hour TTL; that pair is the safest default unless the operation is privileged.
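On the client side, the field block translates into generating one key per logical operation and checking it against the documented limits before sending. This is a rough sketch: the function names are hypothetical, and restricting "opaque" keys to printable ASCII is an assumption the schema above does not state.

```python
import uuid

MAX_LENGTH = 64  # mirrors max_length in the field block


def new_idempotency_key() -> str:
    """Generate a fresh key once per logical operation, then reuse it on every retry."""
    return str(uuid.uuid4())


def is_valid_key(key: str, max_length: int = MAX_LENGTH) -> bool:
    """Accept a non-empty, printable-ASCII string of at most max_length characters.

    The printable-ASCII restriction is an assumption for this sketch.
    """
    return 0 < len(key) <= max_length and key.isascii() and key.isprintable()


def build_headers(key: str) -> dict:
    """Return the request headers carrying the key, or raise if the key is malformed."""
    if not is_valid_key(key):
        raise ValueError("idempotency key violates the documented schema")
    return {"Idempotency-Key": key}
```

The key is generated before the first attempt and stored with the plan step, so a resumed process sends the same value rather than minting a new one.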
3.3 Replay response shape
Docs MUST specify the exact response shape for four cases. The contract is tightest here because agents key their retry decisions on status:
| Case | HTTP status | Body | Replay-detection header |
|---|---|---|---|
| First write succeeds | 201 Created (or domain-appropriate 2xx) | Created resource | Idempotency-Replay: false |
| Replay with same key + same payload | 200 OK | Cached response from first write, byte-identical | Idempotency-Replay: true |
| Replay with same key + different payload | 409 Conflict | Error body explaining payload mismatch | Idempotency-Conflict: payload-mismatch |
| Replay before first write completes | 409 Conflict (or 425 Too Early) | Error body indicating in-flight | Idempotency-Conflict: in-flight |
The Idempotency-Replay header lets an agent's downstream logic distinguish "I just made this happen" from "this had already happened." Without it, the agent must compare bodies, which is brittle.
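An agent can map the four cases in the table to outcomes without inspecting the body. The sketch below assumes the header names and values shown in the table; `classify_response` and its return labels are hypothetical.

```python
def classify_response(status: int, headers: dict) -> str:
    """Map a response to first_write, replay, payload_mismatch, in_flight, or unknown."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    h = {name.lower(): value.strip().lower() for name, value in headers.items()}
    replay = h.get("idempotency-replay")
    conflict = h.get("idempotency-conflict")

    if status in (409, 425):
        if conflict == "payload-mismatch":
            return "payload_mismatch"  # caller bug: same key, different payload
        if conflict == "in-flight":
            return "in_flight"         # back off, then retry with the same key
        return "unknown"
    if 200 <= status < 300:
        if replay == "true":
            return "replay"
        if replay == "false":
            return "first_write"
    return "unknown"
```

A 2xx response with no replay marker is reported as `unknown` rather than guessed, which matches the point that status alone is not enough to tell a first write from a replay.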
4. Key concepts
4.1 TTL guidance
- 24 hours — safe default for transactional writes (orders, payments, support tickets).
- 7 days — long-running agent workflows that may resume after restart.
- 5 minutes — high-throughput, low-latency operations where the deduplication window is intrinsically narrow (rate-limited counters, ephemeral session creation).
Document the TTL precisely. Never use "indefinite"; storage cost forces eventual eviction, and an agent that assumes infinite TTL will eventually duplicate.
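A resumed agent can use the documented TTL to decide whether a stored key is still worth replaying. This is a small sketch with hypothetical names; the 60-second safety margin is an arbitrary assumption to absorb clock skew and request latency.

```python
import time
from typing import Optional


def key_still_honored(first_sent_at: float,
                      ttl_seconds: int,
                      now: Optional[float] = None,
                      safety_margin: float = 60.0) -> bool:
    """Return True if a retry with the stored key should still land inside the replay window."""
    if now is None:
        now = time.time()
    elapsed = now - first_sent_at
    # Treat the window as closed slightly early (safety_margin is an assumption).
    return elapsed < ttl_seconds - safety_margin
```

When this returns False, a blind retry risks a duplicate, so the safer move is to check whether the original call succeeded before sending anything.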
4.2 Scope
The scope answers the question "within what tenant boundary is this key unique?" The default for B2B APIs is account. The default for multi-user consumer APIs is user. The default for global infrastructure operations is tenant. Document the choice and the reason; agents that orchestrate across tenants need the answer to deduplicate correctly.
4.3 Compensation path for non_idempotent tools
Some operations cannot be retried safely: a send_email that has already left an outbound queue, a dispatch_drone that is in the air, a transfer_funds that has settled. For these, docs MUST provide a compensation triplet:
- A reversal endpoint (e.g. cancel_send, recall_drone, reverse_transfer).
- A documented detection method for "did the original succeed?", such as a status query, a webhook, or an idempotent get.
- A maximum reversal window.
Without all three, the tool is unsafe for autonomous use and SHOULD be flagged in the manifest as agent_safe: false.
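The triplet check is simple enough to automate. In this sketch the `Compensation` dataclass, its field names, and the example tool name `get_transfer_status` are all hypothetical; only the three-part rule and the `agent_safe` flag come from the text above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Compensation:
    reversal_endpoint: Optional[str] = None            # e.g. "reverse_transfer"
    detection_method: Optional[str] = None             # e.g. a status query or webhook
    max_reversal_window_seconds: Optional[int] = None  # how long a reversal is possible


def agent_safe(comp: Optional[Compensation]) -> bool:
    """A non_idempotent tool is agent-safe only when all three parts are documented."""
    if comp is None:
        return False
    return (
        bool(comp.reversal_endpoint)
        and bool(comp.detection_method)
        and comp.max_reversal_window_seconds is not None
        and comp.max_reversal_window_seconds > 0
    )
```

For example, `agent_safe(Compensation("reverse_transfer", "get_transfer_status", 3600))` is true, while dropping any one of the three arguments makes it false.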
5. OpenAPI extension example
Tools published as OpenAPI specs MUST embed idempotency under a vendor extension. The recommended shape:
```yaml
paths:
  /v1/orders:
    post:
      operationId: createOrder
      x-agent-idempotency:
        class: key_idempotent
        key_field: "Idempotency-Key"
        key_location: header
        ttl_seconds: 86400
        scope: account
        replay_header: "Idempotency-Replay"
        conflict_status: 409
      parameters:
        - in: header
          name: Idempotency-Key
          schema: { type: string, maxLength: 64 }
          required: true
      responses:
        "201": { description: "First write" }
        "200": { description: "Replay; body identical to original 201" }
        "409": { description: "Key reused with different payload" }
```

The x-agent-idempotency block is the machine-readable contract. Agent toolchains (LangGraph, the Anthropic Agent SDK, custom MCP servers) can read this block to decide retry behavior automatically.
6. Worked example: agent retry loop with partial failure
```mermaid
sequenceDiagram
    participant Agent
    participant Tool as Order API
    Agent->>Tool: POST /v1/orders (Idempotency-Key K1, payload P)
    Note right of Tool: Server commits order O1
    Tool--xAgent: Network failure (no response)
    Note left of Agent: Timeout, retry with same K1, same P
    Agent->>Tool: POST /v1/orders (Idempotency-Key K1, payload P)
    Tool->>Agent: 200 OK + Idempotency-Replay true + body of O1
    Note left of Agent: Plan continues with O1, no duplicate created
```

The loop only works because every party honored the contract: the agent kept the same key on retry, the server cached the response, and the response declared itself a replay.
If the agent had retried with a new key, the server would have created O2 — a duplicate order — because deduplication is keyed, not content-based.
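The same scenario can be reproduced with an in-memory stand-in for the server. This is a toy sketch: the `OrderAPI` class, the payload fields, and the order IDs are hypothetical, and TTL eviction and in-flight handling are left out for brevity.

```python
import json
from typing import Dict, List, Tuple


class OrderAPI:
    """In-memory stand-in for the Order API; deduplication is keyed on the idempotency key."""

    def __init__(self) -> None:
        self.orders: List[dict] = []
        # key -> (canonical payload, response body as first returned)
        self._cache: Dict[str, Tuple[str, bytes]] = {}

    def post_order(self, key: str, payload: dict) -> Tuple[int, Dict[str, str], bytes]:
        canonical = json.dumps(payload, sort_keys=True)
        if key in self._cache:
            cached_payload, cached_body = self._cache[key]
            if cached_payload != canonical:
                return (409, {"Idempotency-Conflict": "payload-mismatch"},
                        b'{"error":"payload mismatch"}')
            return 200, {"Idempotency-Replay": "true"}, cached_body
        order = {"id": f"O{len(self.orders) + 1}", **payload}
        self.orders.append(order)
        body = json.dumps(order).encode()
        self._cache[key] = (canonical, body)
        return 201, {"Idempotency-Replay": "false"}, body


api = OrderAPI()
payload = {"sku": "ABC", "qty": 1}

# First attempt commits O1; pretend the response was lost on the network.
first_status, _, first_body = api.post_order("K1", payload)

# Retry with the same key and payload: the server replays the cached body of O1.
status, headers, body = api.post_order("K1", payload)

# Retry with a fresh key instead: the server creates a duplicate, O2.
dup_status, _, dup_body = api.post_order("K2", payload)
```

After these three calls the store holds two orders: the replay with K1 added nothing, while the call with K2 created the duplicate described above.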
7. Anti-patterns to flag in review
| Anti-pattern | Why it breaks autonomy |
|---|---|
| "Retries safe within a few seconds" | Vague TTL; agents that resume across processes will exceed it silently. |
| Idempotency by request-body hash | Different payload formatting (whitespace, key order) yields different hashes; agents using JSON serializers cannot rely on it. |
| 200 returned for both first write and replay with no replay marker | Agents cannot distinguish; downstream logic re-fires side effects. |
| Key required only on writes flagged "important" | Agents cannot tell which writes are important; either key all or document naturally_idempotent. |
| Implicit per-IP scope | Agent fleets share egress IPs; collisions become inevitable. |
| Documented TTL longer than the server's storage window | Server evicts the key early and returns 201 Created on a stale key, silently duplicating. |
| Non-idempotent tool with no compensation endpoint | Tool is fundamentally unsafe for autonomous use. |
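The body-hash row in the table above is easy to demonstrate. The sketch below serializes one logical payload two ways and hashes the bytes; the payload fields are hypothetical.

```python
import hashlib
import json

payload = {"sku": "ABC", "qty": 1}

# Two serializations of the same logical payload, as different JSON encoders might emit them.
compact = json.dumps(payload, separators=(",", ":"))  # no whitespace
reordered = json.dumps({"qty": 1, "sku": "ABC"})      # different key order, default spacing

hash_compact = hashlib.sha256(compact.encode()).hexdigest()
hash_reordered = hashlib.sha256(reordered.encode()).hexdigest()

same_payload = json.loads(compact) == json.loads(reordered)  # logically equal
same_hash = hash_compact == hash_reordered                   # byte-level hashes differ
```

Here `same_payload` is true and `same_hash` is false, so a server that deduplicates on a raw body hash would treat the retry as a new request.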
A manifest validator MUST refuse to publish a tool whose declared class is key_idempotent but whose docs omit any of: key_field, ttl_seconds, scope, replay status, conflict status.
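A validator for this rule can be a few lines. In the sketch below the field names follow the OpenAPI extension example where one exists; `replay_status` is a hypothetical name for the documented replay status code, and the function names are illustrative.

```python
from typing import Any, Dict, List

# "replay_status" is an assumed field name; the others appear in the extension example.
REQUIRED_FIELDS = ("key_field", "ttl_seconds", "scope", "replay_status", "conflict_status")


def missing_fields(block: Dict[str, Any]) -> List[str]:
    """List required fields absent from a key_idempotent block; other classes return []."""
    if block.get("class") != "key_idempotent":
        return []
    return [name for name in REQUIRED_FIELDS if block.get(name) is None]


def can_publish(block: Dict[str, Any]) -> bool:
    """Publishing is allowed only when no required field is missing."""
    return not missing_fields(block)
```

Returning the list of missing names, rather than a bare boolean, lets the reviewer see exactly what the manifest author still has to fill in.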
8. Idempotency vs adjacent concepts
| Concept | What it covers | What it does not |
|---|---|---|
| Idempotency | Same call → same observable outcome (this spec) | Concurrent calls; ordering |
| Atomicity | Operation either fully commits or not at all | Replay safety on retry |
| Exactly-once delivery | Message bus property between producer and consumer | Application-level write semantics |
| Optimistic concurrency | Conflict detection on concurrent edits via version tokens | Deduplication of identical retries |
| Rate limiting | Throttling request volume | Whether retried writes duplicate |
Agents need all of these documented separately. The Rate Limit Documentation Checklist covers throttling; this spec covers idempotency only.
9. Common misconceptions
- "GET is always safe so I do not need to document it." True for state, but agents still need to know the response is cacheable, that there is no rate-limit penalty for retries, and whether the response is deterministic.
- "My DB has unique constraints; that's idempotency." Unique constraints surface conflicts as errors. Idempotency requires returning the original successful response on replay, not an error.
- "The HTTP method tells the agent enough." RFC 9110 defines PUT and DELETE as idempotent at the protocol layer, but application-layer side effects (emails, webhooks, billing) often are not. The class declaration is the source of truth.
- "Clients should generate keys however they want." Clients should, but the docs MUST specify accepted formats, max length, and whether the server will reject malformed keys.
10. How to apply this spec
- Inventory every tool the agent can call. Tag each with one class from the taxonomy.
- For every key_idempotent tool, fill the field block (key name, location, TTL, scope, replay markers).
- For every non_idempotent tool, write the compensation triplet (reversal endpoint, detection method, reversal window). If you cannot, mark agent_safe: false.
- Embed the x-agent-idempotency extension in OpenAPI specs and the equivalent block in MCP server manifests and Anthropic Agent Skills.
- Run a machine-checkable assertion suite before publish:
```yaml
assertions:
  - every_write_endpoint_has_class: true
  - key_idempotent_endpoints_specify_ttl: true
  - key_idempotent_endpoints_specify_scope: true
  - replay_response_status_documented: true
  - replay_response_body_byte_identical_to_first_write: true
  - non_idempotent_endpoints_have_compensation: true
  - manifest_validates_against_x_agent_idempotency_schema: true
```

- Surface the contract in the human-readable docs page and in the manifest. Agents read both; reviewers read mostly the page; both must agree.
- Test the replay path in CI. A test harness that issues the same call twice and asserts the replay header is the cheapest insurance against silent duplicates.
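Part of the assertion step can be sketched as a walk over an OpenAPI document that has already been parsed into dictionaries. This covers only three of the listed assertions; the `run_assertions` function, the example spec, and the choice of POST, PUT, PATCH, and DELETE as "write" methods are assumptions for illustration.

```python
from typing import Any, Dict

WRITE_METHODS = {"post", "put", "patch", "delete"}  # assumed definition of a write endpoint
EXT = "x-agent-idempotency"


def run_assertions(spec: Dict[str, Any]) -> Dict[str, bool]:
    """Evaluate three of the assertions against an OpenAPI document parsed into dicts."""
    results = {
        "every_write_endpoint_has_class": True,
        "key_idempotent_endpoints_specify_ttl": True,
        "key_idempotent_endpoints_specify_scope": True,
    }
    for path_item in spec.get("paths", {}).values():
        for method, operation in path_item.items():
            if method.lower() not in WRITE_METHODS:
                continue
            block = operation.get(EXT, {})
            if "class" not in block:
                results["every_write_endpoint_has_class"] = False
            if block.get("class") == "key_idempotent":
                if "ttl_seconds" not in block:
                    results["key_idempotent_endpoints_specify_ttl"] = False
                if "scope" not in block:
                    results["key_idempotent_endpoints_specify_scope"] = False
    return results


# Hypothetical spec: the first operation is fully declared, cancelOrder has no class at all.
spec = {
    "paths": {
        "/v1/orders": {
            "post": {EXT: {"class": "key_idempotent", "ttl_seconds": 86400, "scope": "account"}},
        },
        "/v1/orders/{id}/cancel": {"post": {"operationId": "cancelOrder"}},
    }
}
report = run_assertions(spec)
```

With this spec, `report` marks `every_write_endpoint_has_class` as false because cancelOrder carries no extension block; a publish gate would fail on any false value.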
11. Related references
- Agent Tool Use Documentation Spec — the parent doc for tool manifests.
- Agent Error Handling Documentation Spec — error-shape contracts that idempotency depends on.
- Agent Rate Limit Documentation Checklist — throttle contracts; complements retry safety.
- Agent Skill Manifest Specification — where the idempotency block lives in Anthropic Skills.
Primary sources for this spec:
- Stripe API Reference, Idempotent requests — https://docs.stripe.com/api/idempotent_requests
- RFC 9110, HTTP Semantics §9.2.2 Idempotent methods — https://www.rfc-editor.org/rfc/rfc9110.html#name-idempotent-methods
- OpenAI Platform, Function calling — https://platform.openai.com/docs/guides/function-calling
- Anthropic, Tool use with Claude — https://docs.anthropic.com/en/docs/build-with-claude/tool-use
- Model Context Protocol specification — https://modelcontextprotocol.io
FAQ
Q: Does idempotency belong in the tool manifest or only in human-readable docs?
A: Both. Frontier orchestrators read the manifest to make automatic retry decisions; reviewers read the docs page. The manifest is the source of truth for machines; the docs page must mirror it for humans. Drift between the two is the most common failure mode and is what the machine-checkable assertions in this spec are designed to catch.
Q: What TTL should I default to if I cannot pick one?
A: 24 hours is the safest default for transactional writes and matches the most-copied prior art (Stripe). Increase to 7 days only if your agent workflows persist plans across long-running processes; reduce to 5 minutes only if the operation is high-throughput and the deduplication window is intrinsically narrow.
Q: How does this differ from "exactly-once semantics" in message buses?
A: Exactly-once delivery is a transport property between a producer and a consumer of messages. Idempotency in this spec is an application-layer property of a tool's write endpoint. They are complementary: a message bus can deliver a tool invocation exactly once at the transport layer, but if the tool itself is not idempotent, downstream replay (process restart, plan resume) can still duplicate. The application-layer contract is the one agents must rely on.
Q: Should GET endpoints declare idempotency at all?
A: Yes — declare read_only. The class is the agent's signal that retries are unconditionally safe. Omitting it forces the agent's planner to assume the worst case and either retry conservatively or not at all, which leaves availability on the table.
Q: What is the minimum viable change for a team that has no idempotency docs today?
A: Add the four-class taxonomy field to every endpoint manifest. That alone unlocks safe retry decisions for read_only and naturally_idempotent tools and explicitly flags the rest as needing follow-up. Then add the key field block for the highest-risk write endpoints (payments, orders, communications) before tackling the long tail.
Q: How do I test that my replay response is byte-identical to the original?
A: Snapshot the first response body at write time, store it under the idempotency key alongside the response status and headers, and serve it verbatim on replay. In CI, fire the same request twice with the same key and assert byte equality on the response body and the Idempotency-Replay: true header on the second response.