Agent Tool Side-Effect Disclosure Specification

Side-effect disclosure annotates each agent tool with read/write semantics, irreversibility, and risk class so that the agent runtime can branch between auto-execute, confirmation prompt, and block. This gives operators a centralized policy surface for enforcing safety across MCP, OpenAI, and Anthropic tool ecosystems.

TL;DR

A standard taxonomy of side-effect annotations: readOnlyHint, destructiveHint, idempotentHint, openWorldHint, and requiresConfirmation.
A direct mapping to MCP tool annotations alongside conventions in OpenAI function calling and Anthropic tool use.
A runtime decision flow that branches on annotations: auto-execute, confirm, or block.
A manifest example plus comparison of how MCP, OpenAI, and Anthropic surface the same disclosure intent in their respective schemas.

Definition

An agent tool side-effect disclosure attaches structured metadata to each callable agent tool, declaring whether the call reads or writes state, whether the operation is reversible, and the class of risk it introduces. The vocabulary is small: readOnlyHint, destructiveHint, idempotentHint, openWorldHint, and requiresConfirmation. The first four are the tool annotations defined by the Model Context Protocol (MCP); requiresConfirmation is an extension introduced by this spec rather than an MCP field. Anthropic's tool-use API and OpenAI function calling do not define dedicated side-effect annotation fields in their tool schemas, so runtimes built on them carry the same vocabulary in tool descriptions and an out-of-band policy table.

A runtime that gives unannotated tools the most permissive interpretation is unsafe in production. The disclosure is consumed by the runtime's policy engine before each call: read-only tools auto-execute, write or destructive tools route through a confirmation prompt, tools with missing annotations are treated as destructive, and tools with conflicting annotations are blocked pending operator review. The spec also distinguishes between the nature of a tool (destructiveHint: true) and the requirement for human approval (requiresConfirmation: true); the two are correlated but not identical, so production runtimes should treat them as independent signals and document the difference in their operator playbooks.

Why this matters

Agent runtimes that auto-execute every tool call are one prompt-injection away from data loss, financial harm, or irreversible state changes. The disclosure pattern centralizes safety enforcement at the runtime layer so each tool author no longer has to re-implement confirmation UX or audit logging. It also gives operators a single policy surface to tighten or relax: a regulated workspace can require confirmation on every destructiveHint: true call, while a trusted internal automation can auto-execute the same call. Without disclosure, runtimes either over-block (annoying users with prompts on read-only calls) or under-block (silently shipping destructive operations).

Disclosure also unblocks compliance, because auditors can list every agent tool and verify that each is correctly classified. It enables safer cross-vendor tool sharing as well: an MCP server published by one team can be consumed by another team's agent without re-interpreting the safety model from scratch. As MCP servers proliferate across vendors and a single agent connects to dozens of tool providers in one session, a shared disclosure schema keeps the runtime's policy logic tractable.

How it works

The disclosure flow has three phases: declaration, evaluation, and execution.

Declaration. Tool authors mark each callable surface with structured annotations. In MCP, this lives in the tools/list response under the annotations field per tool (MCP spec). OpenAI function calling and Anthropic tool use define no equivalent field: an OpenAI function definition carries a name, description, and parameters schema, and an Anthropic tool carries a name, description, and input_schema, so the side-effect semantics are communicated in the tool description and enforced from a policy table kept outside the schema. The disclosure vocabulary used across all three is:

readOnlyHint: the tool does not modify state.
destructiveHint: the tool may delete or overwrite data.
idempotentHint: repeated calls with the same arguments produce the same result.
openWorldHint: the tool interacts with external systems beyond the agent runtime's control.
requiresConfirmation: an explicit user-approval gate must be rendered before execution. This flag is an extension in this spec, not an annotation defined by MCP.
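The vocabulary above can be held in a small typed structure. The following is a minimal sketch, not a reference implementation: the class name ToolAnnotations, the function parse_annotations, and the snake_case field names are hypothetical. The defaults for the four hint fields follow the defaults given in the MCP ToolAnnotations schema, while the default of false for requiresConfirmation is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class ToolAnnotations:
    """Side-effect disclosure for one tool (hypothetical container)."""
    read_only: bool = False              # readOnlyHint
    destructive: bool = True             # destructiveHint
    idempotent: bool = False             # idempotentHint
    open_world: bool = True              # openWorldHint
    requires_confirmation: bool = False  # extension flag; default is an assumption
    declared: bool = False               # True when the tool shipped any annotations


def parse_annotations(tool: Mapping[str, Any]) -> ToolAnnotations:
    """Read the annotations object of one tools/list entry."""
    raw = tool.get("annotations") or {}
    read_only = bool(raw.get("readOnlyHint", False))
    return ToolAnnotations(
        read_only=read_only,
        # MCP describes destructiveHint as meaningful only when
        # readOnlyHint is false, so a read-only tool is never destructive here.
        destructive=(not read_only) and bool(raw.get("destructiveHint", True)),
        idempotent=bool(raw.get("idempotentHint", False)),
        open_world=bool(raw.get("openWorldHint", True)),
        requires_confirmation=bool(raw.get("requiresConfirmation", False)),
        declared=bool(raw),
    )
```

Keeping a declared flag lets later policy code tell an explicit destructiveHint: true apart from one that was only filled in by default.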

Evaluation. When the agent emits a tool call, the runtime's policy engine reads the disclosure, looks up the operator policy, and decides one of three outcomes: auto-execute, confirm, or block. Read-only tools typically auto-execute. idempotentHint on its own does not qualify a tool for auto-execution, because an idempotent write still mutates state on the first call. Destructive or open-world tools route through a confirmation prompt rendered via the agent UI. Tools without annotations are treated as destructiveHint: true for safety. This fail-closed default is critical, and it matches the MCP spec, in which destructiveHint defaults to true when omitted.
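The evaluation step can be sketched as a single pure function. This is an illustration under stated assumptions rather than a prescribed algorithm: Decision, OperatorPolicy, and decide are hypothetical names, the order of the checks is one reasonable precedence, and treating readOnlyHint plus an explicit destructiveHint as a blocking conflict is this sketch's interpretation of "conflicting annotations".

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Mapping, Optional


class Decision(Enum):
    AUTO_EXECUTE = "auto-execute"
    CONFIRM = "confirm"
    BLOCK = "block"


@dataclass(frozen=True)
class OperatorPolicy:
    """Hypothetical operator knobs; the defaults are the strict setting."""
    confirm_writes: bool = True      # prompt on every non-read-only call
    confirm_open_world: bool = True  # prompt on calls that reach external systems


def decide(annotations: Optional[Mapping[str, Any]],
           policy: OperatorPolicy = OperatorPolicy()) -> Decision:
    # Fail closed: a tool with no annotations is treated as destructive.
    if not annotations:
        annotations = {"destructiveHint": True}

    read_only = bool(annotations.get("readOnlyHint", False))
    # An omitted openWorldHint is assumed true, mirroring the MCP default.
    open_world = bool(annotations.get("openWorldHint", True))

    # Conflicting disclosure: block and leave it to operator review.
    if read_only and annotations.get("destructiveHint") is True:
        return Decision.BLOCK
    # An explicit approval gate wins over any policy relaxation.
    if annotations.get("requiresConfirmation", False):
        return Decision.CONFIRM
    # Any write, including an idempotent one, prompts under the strict policy.
    if not read_only and policy.confirm_writes:
        return Decision.CONFIRM
    if open_world and policy.confirm_open_world:
        return Decision.CONFIRM
    return Decision.AUTO_EXECUTE
```

Under the default policy, a read-only, closed-world tool auto-executes and everything else prompts. A trusted internal automation could pass OperatorPolicy(confirm_writes=False) to auto-execute writes that do not set requiresConfirmation.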

Execution. After the policy engine clears the call, the runtime dispatches the tool, captures the response, and writes a structured audit entry containing the tool name, disclosure flags, policy decision, user identity (if confirmed), and the call result. The audit log doubles as a compliance artifact and as a feedback loop for refining policy.
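One possible shape for the audit entry described above is a single JSON line per call. The field names and the build_audit_entry helper are assumptions made for illustration; the source only specifies which pieces of information the entry should contain.

```python
import json
from datetime import datetime, timezone
from typing import Any, Mapping, Optional


def build_audit_entry(tool_name: str,
                      annotations: Mapping[str, Any],
                      decision: str,
                      result: Any,
                      confirmed_by: Optional[str] = None,
                      now: Optional[datetime] = None) -> str:
    """Serialize one tool call as a JSON line (hypothetical field names)."""
    entry = {
        "timestamp": (now or datetime.now(timezone.utc)).isoformat(),
        "tool": tool_name,
        "disclosure": dict(annotations),  # flags as seen at decision time
        "decision": decision,             # "auto-execute", "confirm", or "block"
        "confirmed_by": confirmed_by,     # None when no human approved the call
        "result": result,
    }
    # sort_keys keeps entries stable for diffing and downstream parsing.
    return json.dumps(entry, sort_keys=True)
```

Recording the disclosure flags as they were at decision time, rather than looking them up later, keeps the log meaningful after a tool's classification changes.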

flowchart LR
  Agent[Agent] --> Policy[Policy Engine]
  Policy -->|read-only| Auto[Auto-execute]
  Policy -->|destructive, irreversible, or unannotated| Confirm[Confirmation Prompt]
  Policy -->|conflicting annotations| Block[Block + Operator Alert]
  Confirm -->|user approves| Execute[Execute Tool]
  Confirm -->|user denies| Cancel[Cancel + Audit Log]

Practical application

A minimal MCP-style manifest fragment looks like this (requiresConfirmation is this spec's extension, not an MCP-defined annotation):

{
  "tools": [
    {
      "name": "read_calendar",
      "description": "Read events from the user's calendar.",
      "annotations": {
        "readOnlyHint": true,
        "idempotentHint": true,
        "openWorldHint": false
      }
    },
    {
      "name": "delete_calendar_event",
      "description": "Delete a calendar event by id.",
      "annotations": {
        "destructiveHint": true,
        "idempotentHint": false,
        "requiresConfirmation": true
      }
    }
  ]
}

For OpenAI function calling, the same intent is conveyed via the function description plus an out-of-band policy table the agent runtime consults; OpenAI's guide recommends that high-risk operations include explicit confirmation language in the function description so the model is less likely to call them without rationale. Anthropic tool use similarly suggests framing destructive operations with explicit confirmation semantics in the tool description.
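The out-of-band policy table mentioned above might look like the following sketch. The tool definitions use the documented top-level shapes (a flat name, description, and input_schema for Anthropic; a function object with name, description, and parameters for OpenAI chat completions), while SIDE_EFFECTS, FAIL_CLOSED, tool_name, and disclosure_for are hypothetical names introduced for illustration.

```python
from typing import Any, Dict, Mapping

# Sidecar table keyed by tool name; it lives beside the tool definitions
# because neither vendor schema has a side-effect annotation field.
SIDE_EFFECTS: Dict[str, Dict[str, bool]] = {
    "read_calendar": {
        "readOnlyHint": True, "idempotentHint": True, "openWorldHint": False,
    },
    "delete_calendar_event": {
        "destructiveHint": True, "idempotentHint": False,
        "requiresConfirmation": True,
    },
}

# Fallback for tools missing from the table: treat as destructive.
FAIL_CLOSED: Dict[str, bool] = {"destructiveHint": True}

anthropic_tool = {
    "name": "read_calendar",
    "description": "Read events from the user's calendar. Read-only.",
    "input_schema": {"type": "object", "properties": {}},
}

openai_tool = {
    "type": "function",
    "function": {
        "name": "delete_calendar_event",
        "description": "Delete a calendar event by id. Destructive: "
                       "requires user confirmation.",
        "parameters": {
            "type": "object",
            "properties": {"event_id": {"type": "string"}},
            "required": ["event_id"],
        },
    },
}


def tool_name(tool: Mapping[str, Any]) -> str:
    # OpenAI chat-completions tools nest the definition under "function";
    # Anthropic tools keep "name" at the top level.
    return tool.get("function", tool)["name"]


def disclosure_for(tool: Mapping[str, Any]) -> Mapping[str, bool]:
    return SIDE_EFFECTS.get(tool_name(tool), FAIL_CLOSED)
```

The description text tells the model what the tool does, but the runtime decides from the table, so a reworded description cannot silently change the policy outcome.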

Implementing the policy engine itself is straightforward: a small dispatcher mapping (tool_name, annotations) → policy decision is sufficient for most teams. The harder work is governance — assigning each tool's classification accurately, reviewing classifications on every tool change, and keeping the policy table in sync with the operator's risk tolerance. Most production teams treat the manifest as a versioned artifact in the same repo as the agent prompt template, so policy changes ship through normal code review.
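Because the manifest is a versioned artifact, part of the governance work can run as a check in code review. The sketch below is one hypothetical lint pass; the lint_manifest name and the specific rules are assumptions, chosen to catch unannotated tools and contradictory flags before they reach the runtime.

```python
from typing import Any, List, Mapping


def lint_manifest(manifest: Mapping[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the manifest passes."""
    problems: List[str] = []
    for tool in manifest.get("tools", []):
        name = tool.get("name", "<unnamed>")
        ann = tool.get("annotations")
        if not ann:
            problems.append(f"{name}: no annotations declared")
            continue
        # A tool cannot be both read-only and destructive.
        if ann.get("readOnlyHint") and ann.get("destructiveHint"):
            problems.append(f"{name}: readOnlyHint and destructiveHint are both true")
        # Require an explicit statement about writes instead of relying on defaults.
        if "readOnlyHint" not in ann and "destructiveHint" not in ann:
            problems.append(f"{name}: neither readOnlyHint nor destructiveHint is set")
    return problems
```

A CI job could fail the build whenever the returned list is non-empty, so a new tool cannot ship without an explicit classification.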

Common mistakes

Leaving destructive tools unannotated. Without destructiveHint: true, the policy engine has no signal to gate the call. The fail-closed default helps, but explicit annotation is always safer and clearer to reviewers.
Conflating idempotent with read-only. An idempotent write (e.g. set_status) still mutates state on first call; treating it as read-only bypasses confirmation gates.
Missing the irreversibility flag. delete_ and drop_ tools deserve their own class beyond destructiveHint — operators may want to require multi-party approval on irreversible operations.
No fallback policy when annotations are absent. A runtime that errors when annotations are missing will degrade UX; a runtime that silently auto-executes is unsafe. The standard answer is fail-closed: treat missing annotations as destructiveHint: true.

FAQ

Q: How is destructiveHint different from requiresConfirmation?

destructiveHint declares the nature of the tool: it modifies state in a way that cannot be trivially undone. requiresConfirmation declares the runtime requirement: a confirmation prompt must be rendered before execution. The two are correlated but independent. A read-only tool that costs money to call (an external API with usage fees) may set requiresConfirmation: true without setting destructiveHint, and a destructive tool inside a fully audited automation may set destructiveHint: true without requiring a per-call confirmation. Only destructiveHint is defined by the MCP spec; requiresConfirmation is an extension in this taxonomy, so the runtime must evaluate it as a separate flag rather than infer it from destructiveHint.

Q: How should I declare side effects when they are conditional on input?

Always declare the worst-case classification. If an update_record tool can either patch or replace based on input, mark it destructiveHint: true even though the patch path is non-destructive. The policy engine cannot inspect call arguments at evaluation time without significant complexity, so worst-case declaration is the safe default. Document the conditional behavior in the tool description so users and reviewers understand the actual semantics.
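Worst-case declaration can be derived mechanically when the author knows the classification of each code path. The helper below is a sketch under that assumption; worst_case is a hypothetical name, and omitted flags are filled with the conservative value for each field.

```python
from typing import Any, Dict, Mapping, Sequence


def worst_case(paths: Sequence[Mapping[str, Any]]) -> Dict[str, bool]:
    """Combine per-path annotations into the single declaration to publish."""
    if not paths:
        raise ValueError("at least one code path is required")
    return {
        # Safe properties must hold on every path.
        "readOnlyHint": all(p.get("readOnlyHint", False) for p in paths),
        "idempotentHint": all(p.get("idempotentHint", False) for p in paths),
        # Risky properties apply if any single path has them.
        "destructiveHint": any(p.get("destructiveHint", True) for p in paths),
        "openWorldHint": any(p.get("openWorldHint", True) for p in paths),
        "requiresConfirmation": any(
            p.get("requiresConfirmation", False) for p in paths
        ),
    }


# update_record from the answer above: patch is non-destructive, replace is not.
patch_path = {"readOnlyHint": False, "destructiveHint": False,
              "idempotentHint": True, "openWorldHint": False}
replace_path = {"readOnlyHint": False, "destructiveHint": True,
                "idempotentHint": True, "openWorldHint": False}
update_record = worst_case([patch_path, replace_path])
```

Here update_record comes out with destructiveHint set to true because the replace path is destructive, even though the patch path is not.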

Q: Where should side-effect enforcement live — server-side or client-side?

Both. The agent runtime (client) must enforce declared annotations to block or prompt before the call leaves the agent boundary. The tool implementation (server) must also enforce its own invariants: never trust a client-side check alone, because a malicious or buggy agent runtime could ignore the annotation. The reverse also holds, since the MCP spec describes annotations as hints that a client should not rely on when they come from an untrusted server. The result is defense in depth: enforce at the runtime, validate at the tool, audit at both layers.
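One way a tool server could validate on its own side is to require proof of approval for destructive calls. The sketch below uses an HMAC over the tool name and a call id; this token scheme, the shared secret, and the names sign_approval and server_accepts are all assumptions made for illustration and are not part of MCP or either vendor API.

```python
import hashlib
import hmac
from typing import Optional


def sign_approval(secret: bytes, tool_name: str, call_id: str) -> str:
    """Issued by the confirmation UI after the user approves one specific call."""
    msg = f"{tool_name}:{call_id}".encode("utf-8")
    return hmac.new(secret, msg, hashlib.sha256).hexdigest()


def server_accepts(secret: bytes, tool_name: str, call_id: str,
                   token: Optional[str], destructive: bool) -> bool:
    """Server-side gate: destructive calls need a valid approval token."""
    if not destructive:
        return True
    if token is None:
        return False
    expected = sign_approval(secret, tool_name, call_id)
    # Compare as bytes in constant time to avoid leaking the expected value.
    return hmac.compare_digest(expected.encode("utf-8"), token.encode("utf-8"))
```

Binding the token to a call id means an approval for one deletion cannot be replayed for another, even if the agent runtime skips its own confirmation step.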
