Agent Handoff Protocol Documentation Spec for Multi-Agent AI Systems

An agent handoff protocol documentation spec is a contract-first description of how one agent transfers control to another, covering trigger conditions, the context payload, idempotency guarantees, and failure-recovery paths. Use this spec to write handoff documentation that both human reviewers and autonomous agents can consume without ambiguity.

TL;DR

Multi-agent AI systems fail at handoffs more often than they fail at reasoning. This specification defines a framework-agnostic, machine-readable format for documenting every handoff in your system as an explicit contract: six required fields (id, source and target, trigger, payload, acceptance criteria, recovery) plus optional fields for observability, versioning, idempotency, and human-in-the-loop routing. Adopt it once, and every handoff becomes greppable, testable, and safe to refactor across LangGraph, the OpenAI Agents SDK, Semantic Kernel, AutoGen, or any custom orchestrator.

1. Scope and terminology

1.1 Scope

This spec covers the documentation contract for an agent-to-agent handoff. It does not prescribe a runtime, a message bus, or a specific orchestration framework. It is intentionally framework-agnostic so that the same handoff can be implemented as a LangGraph Command, an OpenAI Agents SDK handoff() tool, a Semantic Kernel HandoffOrchestration route, or an AutoGen delegate_tool, while sharing one source of truth in your repo.

In scope:

The static contract for one handoff edge between two named agents.
The minimum payload required for the receiving agent to act safely.
Acceptance, rejection, and recovery semantics.
Observability hooks (trace IDs, audit fields).

Out of scope:

The transport layer (HTTP, in-process, message queue).
Long-term memory consolidation across handoffs (covered in agent-trajectory-spec).
Tool invocation contracts (covered in agent-tool-use-spec).
Permission inheritance (covered in agent-permission-spec).

1.2 Normative language

Keywords MUST, MUST NOT, SHOULD, SHOULD NOT, and MAY are interpreted per RFC 2119. Documentation that fails any MUST clause is non-conforming.

1.3 Definitions

Handoff. A discrete transfer of control from a source agent to a target agent, after which the source agent ceases to drive the task until reactivated by an explicit return handoff.
Handoff contract. The static, version-controlled document describing one handoff edge.
Handoff payload. The runtime data passed across the edge.
Handoff trigger. The condition under which the source agent emits the handoff.
Acceptance criteria. The receiver-side checks that determine whether the target agent will accept or reject the handoff.
Recovery path. The fallback behavior if the target rejects, times out, or errors.

2. Why a documentation spec (not just code)

A handoff that exists only inside agent prompts or framework configuration is invisible to reviewers, untestable, and brittle to refactor. Production teams report that handoff reliability, not model quality, is the dominant failure mode of multi-agent systems, with lost context and ambiguous output formats as the leading causes. Frameworks now treat handoffs as first-class primitives: OpenAI's Agents SDK exposes them as auto-named tools (transfer_to_); LangGraph's swarm library tracks the last-active agent so conversations resume on the right node; Semantic Kernel's handoff orchestration routes tasks between agents based on conversation context; AutoGen exposes delegate_tools and topic subscriptions. Each framework's syntax differs, but the underlying contract is the same, and that contract is what this spec captures.

Documenting handoffs as contracts produces four concrete benefits:

Diff-able review. Changes to a handoff edge appear in pull requests as YAML diffs, not buried prompt edits.
Cross-framework portability. A team can migrate from AutoGen to Agent Framework or from supervisor to swarm without re-deriving handoff semantics.
Automated validation. A linter can verify that every documented handoff has a recovery path, that payloads reference declared schemas, and that no orphan target exists.
LLM-readable governance. Because the spec is machine-readable, an autonomous auditor agent can ingest it and verify that runtime traces conform.

3. Required fields (the contract)

Every documented handoff MUST declare the following six fields. Together they form the minimum viable contract.

3.1 id

A stable, kebab-case identifier unique within the system. MUST be referenced from runtime traces. Example: triage-to-refunds-v1.

3.2 source and target

Named agent identifiers. MUST match the canonical names used in the agent registry. Wildcards are not permitted; one contract describes exactly one edge. A bidirectional handoff MUST be documented as two contracts.
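The identity rules in §3.1 and §3.2 are mechanical enough to check in code. The sketch below is a minimal, hypothetical helper (the name identity_errors and the kebab-case regex are assumptions, not part of the spec) that flags a non-kebab-case id and any wildcard in source or target.

```python
import re

# Hypothetical check for §3.1 (kebab-case id) and §3.2 (no wildcard edges).
KEBAB_CASE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")

def identity_errors(contract: dict) -> list[str]:
    """Return human-readable errors for the id, source, and target fields."""
    errors = []
    cid = contract.get("id") or ""
    if not KEBAB_CASE.fullmatch(cid):
        errors.append(f"id {cid!r} is not kebab-case")
    for field_name in ("source", "target"):
        name = contract.get(field_name)
        if not name:
            errors.append(f"{field_name} is missing")
        elif "*" in name:
            # One contract describes exactly one edge, so wildcards are rejected.
            errors.append(f"{field_name} {name!r} contains a wildcard")
    return errors
```

Checking that each name actually exists in the agent registry needs the registry itself, so that rule is better placed in a system-wide linter than in a per-contract helper.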

3.3 trigger

A structured description of when the source agent emits the handoff. MUST include at least one of:

intent: a natural-language summary of the user or system condition (e.g., "user requests refund eligibility check").
predicate: a deterministic condition, expressed as a boolean over the agent state (e.g., state.confidence < 0.6 AND state.domain == "billing").
tool_call: the name of an internal tool whose invocation is itself the trigger (the OpenAI Agents SDK pattern, where a handoff is exposed to the LLM as a tool).

A trigger SHOULD include both intent and predicate whenever the predicate is computable; relying solely on natural-language intent invites prompt drift.
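A trigger that carries both a tool_call and a predicate can be read in several ways. The sketch below assumes one possible reading: the handoff fires only when the declared tool was called *and* the deterministic predicate holds, so a drifting prompt cannot bypass the predicate. AgentState, Trigger, and should_fire are hypothetical names; the predicate is written as a Python callable rather than parsed from the string form.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class AgentState:
    # Hypothetical state fields matching the §3.3 predicate example.
    intent: str
    domain: str
    confidence: float

@dataclass
class Trigger:
    intent: str                              # natural-language summary (docs and prompts)
    predicate: Callable[[AgentState], bool]  # deterministic boolean over agent state
    tool_call: Optional[str] = None          # tool whose invocation is the trigger, if any

    def should_fire(self, state: AgentState, called_tool: Optional[str] = None) -> bool:
        """Fire only when the declared tool (if any) was called and the predicate holds."""
        if self.tool_call is not None and called_tool != self.tool_call:
            return False
        # Re-checking the predicate keeps prompt drift from bypassing the deterministic rule.
        return self.predicate(state)

escalate = Trigger(
    intent="Low-confidence billing question",
    predicate=lambda s: s.confidence < 0.6 and s.domain == "billing",
)
```

With this reading, escalate.should_fire(AgentState("refund", "billing", 0.4)) is true, while the same state at confidence 0.9 does not fire.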

3.4 payload

The data the source agent transmits. MUST be defined by reference to a named schema (JSON Schema, Pydantic, TypedDict, or equivalent), not as free text. The payload MUST include:

task_summary (string, ≤ 500 chars): a self-contained restatement of what the receiver should accomplish.
provenance (object): minimum context needed to reconstruct upstream decisions — at minimum a list of source-agent message IDs or a trajectory pointer.
constraints (object, optional but SHOULD be present): any policy, budget, or deadline carried over.

The payload MUST NOT transmit the entire raw conversation history when a summarized form is sufficient; uncontrolled history growth is the dominant cause of context-window exhaustion in handoff chains. The payload SHOULD declare a history_strategy field with one of full, summary, last_k, or pointer, and SHOULD use full only when the complete history is genuinely necessary.
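The spec accepts JSON Schema, Pydantic, TypedDict, or an equivalent. To stay dependency-free, the sketch below uses stdlib dataclasses and validates the §3.4 rules at construction time. The provenance field names (message_ids, trajectory_ref) and the default history_strategy of summary are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

HISTORY_STRATEGIES = {"full", "summary", "last_k", "pointer"}

@dataclass
class Provenance:
    # Hypothetical field names: upstream message IDs and/or a trajectory pointer.
    message_ids: list[str] = field(default_factory=list)
    trajectory_ref: Optional[str] = None

@dataclass
class HandoffPayload:
    task_summary: str
    provenance: Provenance
    constraints: dict[str, Any] = field(default_factory=dict)  # policy, budget, deadline
    history_strategy: str = "summary"

    def __post_init__(self) -> None:
        # Enforce the §3.4 rules on construction instead of trusting the sender.
        if len(self.task_summary) > 500:
            raise ValueError("task_summary exceeds 500 characters")
        if not (self.provenance.message_ids or self.provenance.trajectory_ref):
            raise ValueError("provenance needs message IDs or a trajectory pointer")
        if self.history_strategy not in HISTORY_STRATEGIES:
            raise ValueError(f"unknown history_strategy: {self.history_strategy!r}")
```

Raising on construction means a malformed payload never reaches the target agent, which matches the spirit of §3.5: reject rather than coerce.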

3.5 acceptance_criteria

Receiver-side preconditions evaluated before the target agent starts work. MUST include:

required_fields: payload keys that must be non-null.
domain_match: a predicate confirming the task falls in the target's declared scope.
permission_check: a reference to a permission rule (see agent-permission-spec).

If any acceptance criterion fails, the target MUST reject the handoff via the recovery path, not silently coerce the input.
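A receiver-side gate can evaluate the three criteria in order and report the first failure instead of patching the input. In the sketch below, evaluate_acceptance and get_path are hypothetical; domain_match and has_permission are passed in as callables because the spec leaves their evaluation to the target agent and the permission registry.

```python
from typing import Any, Callable

def get_path(data: dict, dotted: str) -> Any:
    """Resolve a dotted key such as 'provenance.order_id'; return None if absent."""
    current: Any = data
    for key in dotted.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current

def evaluate_acceptance(
    payload: dict,
    required_fields: list[str],
    domain_match: Callable[[dict], bool],
    has_permission: Callable[[str], bool],
    permission_check: str,
) -> tuple[bool, str]:
    """Return (accepted, reason). Failures are reported, never silently coerced."""
    missing = [f for f in required_fields if get_path(payload, f) is None]
    if missing:
        return False, f"missing required fields: {', '.join(missing)}"
    if not domain_match(payload):
        return False, "task is outside the target's declared scope"
    if not has_permission(permission_check):
        return False, f"permission denied: {permission_check}"
    return True, "accepted"
```

A False result is the signal to route through the contract's recovery.on_reject path.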

3.6 recovery

The behavior when the handoff cannot complete cleanly. MUST specify all of:

on_reject: where control returns when acceptance fails. Common values: source, supervisor, human, or a named fallback agent.
on_timeout: action after the receiver exceeds a documented timeout_ms budget.
on_error: action when the receiver raises an unhandled exception.
max_retries: integer, default 0. Retries MUST NOT be applied to non-idempotent handoffs (see §4.3).
loop_guard: a reference to a counter or set of visited agents that MUST be updated to prevent infinite handoff cycles.
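The recovery block can be read as a small routing function. The sketch below is one possible orchestrator-side interpretation, not a prescribed runtime: the exception types, the dispatch name, and the choice to route a loop-guard hit to on_reject are all assumptions. It honors max_retries only when the contract declares the handoff idempotent, per §4.3.

```python
from typing import Callable

class HandoffRejected(Exception):
    """Raised by the target when its acceptance criteria fail."""

class HandoffTimeout(Exception):
    """Raised when the target exceeds the contract's timeout_ms budget."""

def dispatch(contract: dict, run_target: Callable[[], None], visited: set[str]) -> str:
    """Run one handoff; return the target's name or a recovery destination."""
    recovery = contract["recovery"]
    target = contract["target"]
    # loop_guard (assumed semantics): never re-enter an agent already visited in this task.
    if target in visited:
        return recovery["on_reject"]
    visited.add(target)

    # Retries are honored only for idempotent handoffs (§4.3).
    idempotent = contract.get("idempotency", {}).get("idempotent", False)
    attempts = 1 + (recovery.get("max_retries", 0) if idempotent else 0)

    outcome = "on_error"
    for _ in range(attempts):
        try:
            run_target()
            return target  # accepted: the target now drives the task
        except HandoffRejected:
            return recovery["on_reject"]  # acceptance failures are never retried
        except HandoffTimeout:
            outcome = "on_timeout"
        except Exception:
            outcome = "on_error"
    return recovery[outcome]
```

Returned values such as source or supervisor are the symbolic destinations from the contract; resolving them to concrete agents is left to the orchestrator.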

4. Optional but recommended fields

4.1 observability

trace_id_field: the payload key carrying the distributed trace ID. SHOULD be present.
audit_event: name of the event written to the audit log at each stage (emit, accept, reject, complete).
metrics: list of counters/histograms incremented (e.g., handoff.latency_ms, handoff.rejected).
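The observability fields amount to tagging every stage with the same trace ID. The sketch below uses an in-memory AuditTrail as a stand-in for a real audit log and metrics backend; the class, the hex trace-ID format, and the event dictionary shape are assumptions. It records each stage, counts handoff.rejected, and derives handoff.latency_ms from emit to a terminal stage.

```python
import time
import uuid
from dataclasses import dataclass, field

AUDIT_STAGES = ("emit", "accept", "reject", "complete")

def new_trace_id() -> str:
    """Generate a value for the payload's trace_id_field (format is an assumption)."""
    return uuid.uuid4().hex

@dataclass
class AuditTrail:
    """In-memory stand-in for an audit log plus metrics backend (hypothetical)."""
    events: list[dict] = field(default_factory=list)
    counters: dict[str, int] = field(default_factory=dict)
    latencies_ms: list[float] = field(default_factory=list)
    _started: dict[str, float] = field(default_factory=dict)

    def record(self, audit_event: str, stage: str, trace_id: str) -> None:
        if stage not in AUDIT_STAGES:
            raise ValueError(f"unknown stage: {stage!r}")
        self.events.append({"event": audit_event, "stage": stage, "trace_id": trace_id})
        now = time.monotonic()
        if stage == "emit":
            self._started[trace_id] = now
        elif stage in ("reject", "complete") and trace_id in self._started:
            # handoff.latency_ms: elapsed time from emit to a terminal stage.
            self.latencies_ms.append((now - self._started.pop(trace_id)) * 1000)
        if stage == "reject":
            self.counters["handoff.rejected"] = self.counters.get("handoff.rejected", 0) + 1
```

Because every event carries the trace ID, an auditor can join these records with runtime traces and check them against the contract id.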

4.2 version and deprecation

Handoff contracts evolve. The contract SHOULD carry a semantic version. Breaking changes (renamed fields, removed fields, changed semantics) MUST bump the major version and SHOULD publish a deprecation block with a sunset date and a replacement_id.
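Some breaking changes can be caught by diffing two revisions of a contract. The sketch below is deliberately narrow: it only detects required payload fields that were removed or renamed, since changed semantics cannot be inferred mechanically. The function names and the deprecation.sunset key name are assumptions; replacement_id comes from the text above.

```python
def major(version: str) -> int:
    """Return the major component of a MAJOR.MINOR.PATCH version string."""
    return int(version.split(".")[0])

def version_errors(old: dict, new: dict) -> list[str]:
    """Flag removed or renamed required payload fields shipped without a major bump."""
    removed = sorted(set(old["payload"]["required"]) - set(new["payload"]["required"]))
    if not removed:
        return []
    errors = []
    if major(new["version"]) <= major(old["version"]):
        errors.append(f"fields {removed} removed or renamed without a major version bump")
    # The superseded contract SHOULD point readers at its replacement.
    if not {"sunset", "replacement_id"}.issubset(old.get("deprecation", {})):
        errors.append("old contract should publish deprecation.sunset and deprecation.replacement_id")
    return errors
```

Adding a field passes this check; whether an added required field should also count as breaking for existing senders is a policy choice the spec leaves open.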

4.3 idempotency

A handoff is idempotent when re-emitting it with the same payload produces the same observable outcome. The contract SHOULD declare:

idempotent: boolean.
dedupe_key: the payload field used to suppress duplicate accepts.
replay_window_ms: the time window within which duplicate emissions are dropped.

Non-idempotent handoffs MUST set max_retries: 0 and MUST NOT be retried by orchestrators on transient errors.
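On the receiving side, dedupe_key and replay_window_ms together describe a time-bounded duplicate filter. The sketch below is a minimal in-process version, assuming the caller extracts the dedupe value (for example payload.provenance.order_id) before calling admit; the ReplayGuard name and injectable clock are hypothetical, and a distributed deployment would need shared storage instead of a dict.

```python
import time
from typing import Callable

class ReplayGuard:
    """Drop duplicate handoff emissions sharing a dedupe value within replay_window_ms."""

    def __init__(self, replay_window_ms: int, clock: Callable[[], float] = time.monotonic) -> None:
        self.window_s = replay_window_ms / 1000
        self.clock = clock
        self._seen: dict[str, float] = {}

    def admit(self, dedupe_value: str) -> bool:
        """Return True if this emission should be processed, False if it is a replay."""
        now = self.clock()
        # Forget keys whose window has expired so memory stays bounded.
        self._seen = {k: t for k, t in self._seen.items() if now - t < self.window_s}
        if dedupe_value in self._seen:
            return False
        self._seen[dedupe_value] = now
        return True
```

With a 60000 ms window, a second emission for the same order 30 seconds later is dropped, while one 61 seconds after the first is admitted again.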

4.4 human_in_the_loop

If the handoff target may be a human (an interactive callback in Semantic Kernel, a "user" handoff in AutoGen, or a UI prompt), the contract MUST set human_in_the_loop: true and SHOULD specify the channel, the timeout, and the synthetic agent identity used to resume the conversation if the human does not respond.

5. Canonical document structure

A conformant handoff contract MUST be a single YAML or JSON document. The recommended layout:

id: triage-to-refunds-v1
version: 1.0.0
source: triage-agent
target: refund-agent
trigger:
  intent: "User asks about a refund or order cancellation."
  predicate: "state.intent == 'refund' AND state.confidence >= 0.7"
  tool_call: "transfer_to_refund_agent"
payload:
  schema: "./schemas/refund-handoff.schema.json"
  required:
    - task_summary
    - provenance
    - constraints
  history_strategy: summary
acceptance_criteria:
  required_fields: [task_summary, provenance.order_id]
  domain_match: "target.domains contains 'billing'"
  permission_check: "perm:refund:read"
recovery:
  on_reject: supervisor
  on_timeout: supervisor
  on_error: source
  timeout_ms: 30000
  max_retries: 0
  loop_guard: state.visited_agents
observability:
  trace_id_field: payload.trace_id
  audit_event: handoff.triage_to_refunds
  metrics:
    - handoff.latency_ms
    - handoff.rejected
idempotency:
  idempotent: true
  dedupe_key: payload.provenance.order_id
  replay_window_ms: 60000
human_in_the_loop: false

6. Conformance levels

A handoff document is graded against three conformance levels:

Level	Required clauses	Suitable for
L1 — Minimum	All §3 required fields	Internal prototypes
L2 — Production	L1 + observability, version, idempotency	Customer-facing systems
L3 — Audited	L2 + signed reviewer, frozen schemas, automated lint passing	Regulated domains

L3 documents MUST include reviewed_by and MUST reference frozen payload schemas (no $ref to mutable URLs).
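Grading a contract against the table can be partly automated. The sketch below checks only top-level key presence per level (nested fields such as recovery.on_reject belong to the linter) and uses a heuristic for "frozen schema": a local path pinned with @version, like ./schemas/refund-handoff@1.2.0.json in §8. Both the heuristic and the function names are assumptions.

```python
from typing import Optional

# Top-level keys per level; nested clauses are left to the §7 linter.
L1_KEYS = ("id", "source", "target", "trigger", "payload", "acceptance_criteria", "recovery")
L2_KEYS = ("observability", "version", "idempotency")

def schema_is_frozen(schema_ref: str) -> bool:
    """Heuristic (assumption): frozen means a local path pinned with '@<version>'."""
    return "@" in schema_ref and not schema_ref.startswith(("http://", "https://"))

def conformance_level(contract: dict, lint_passed: bool = False) -> Optional[str]:
    """Return 'L1', 'L2', 'L3', or None if the contract is non-conforming."""
    if not all(key in contract for key in L1_KEYS):
        return None
    if not all(key in contract for key in L2_KEYS):
        return "L1"
    schema_ref = contract["payload"].get("schema", "")
    if contract.get("reviewed_by") and schema_is_frozen(schema_ref) and lint_passed:
        return "L3"
    return "L2"
```

Passing lint_passed in explicitly keeps the grader independent of any particular linter implementation.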

7. Validation rules

A conforming linter MUST flag the following errors:

Orphan target: target does not appear in the agent registry.
Unreachable handoff: no source-agent prompt or tool exposes a path to this trigger.
Missing recovery: any of on_reject, on_timeout, on_error absent.
Loop risk: loop_guard absent and the target's contracts include any path back to source.
Retry on non-idempotent: idempotent: false combined with max_retries > 0.
Schema drift: payload schema referenced by payload.schema is missing or has changed since last review.
Permission mismatch: acceptance_criteria.permission_check is not declared in the permission registry.

Linters SHOULD additionally warn when a payload passes the entire conversation history (history_strategy: full) without a comment justifying that choice.
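Several of these rules need only the contracts plus two registries. The sketch below implements orphan target, missing recovery, loop risk, retry on non-idempotent, and permission mismatch over already-parsed contracts (for example, loaded from JSON). Unreachable handoff and schema drift are omitted because they need prompt/tool inventories and schema history. Treating a contract with no idempotency block as non-idempotent is a conservative assumption.

```python
from collections import defaultdict

def lint(contracts: list[dict], agent_registry: set[str], permission_registry: set[str]) -> list[str]:
    """Return lint errors for a set of contracts (a subset of the §7 rules)."""
    errors: list[str] = []
    edges: dict[str, set[str]] = defaultdict(set)
    for c in contracts:
        edges[c["source"]].add(c["target"])

    def reaches(start: str, goal: str) -> bool:
        # Iterative depth-first search over documented handoff edges.
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == goal:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(edges[node])
        return False

    for c in contracts:
        cid, rec = c["id"], c.get("recovery", {})
        if c["target"] not in agent_registry:
            errors.append(f"{cid}: orphan target {c['target']!r}")
        missing = [k for k in ("on_reject", "on_timeout", "on_error") if k not in rec]
        if missing:
            errors.append(f"{cid}: missing recovery {missing}")
        if "loop_guard" not in rec and reaches(c["target"], c["source"]):
            errors.append(f"{cid}: loop risk without loop_guard")
        # Assumption: an absent idempotency block counts as non-idempotent.
        idempotent = c.get("idempotency", {}).get("idempotent", False)
        if not idempotent and rec.get("max_retries", 0) > 0:
            errors.append(f"{cid}: retry on non-idempotent handoff")
        perm = c.get("acceptance_criteria", {}).get("permission_check")
        if perm not in permission_registry:
            errors.append(f"{cid}: permission {perm!r} not in registry")
    return errors
```

Running it over every contract at once, rather than one file at a time, is what makes the loop-risk rule possible, since a cycle only shows up across edges.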

8. Worked example: triage → refunds

A customer-support workflow has three agents: a triage agent, a refund agent, and a logistics agent. The triage agent classifies user intent and hands off accordingly. Below is a conformant L2 contract for the triage → refund edge.

id: triage-to-refunds-v1
version: 1.2.0
source: triage-agent
target: refund-agent
trigger:
  intent: "Refund, cancellation, or money-back inquiry."
  predicate: "state.intent in ['refund','cancel'] AND state.confidence >= 0.7"
  tool_call: transfer_to_refund_agent
payload:
  schema: ./schemas/refund-handoff@1.2.0.json
  required: [task_summary, provenance.order_id, constraints.policy_window_days]
  history_strategy: summary
acceptance_criteria:
  required_fields: [task_summary, provenance.order_id]
  domain_match: "target.domains contains 'billing'"
  permission_check: perm:refund:read
recovery:
  on_reject: supervisor
  on_timeout: supervisor
  on_error: source
  timeout_ms: 30000
  max_retries: 0
  loop_guard: state.visited_agents
observability:
  trace_id_field: payload.trace_id
  audit_event: handoff.triage_to_refunds
  metrics: [handoff.latency_ms, handoff.rejected]
idempotency:
  idempotent: true
  dedupe_key: payload.provenance.order_id
  replay_window_ms: 60000
human_in_the_loop: false

Runtime mapping:

OpenAI Agents SDK. The tool_call value (transfer_to_refund_agent) is the default tool name the SDK generates when refund_agent is added to the triage agent's handoffs list. The SDK executes that tool call and transfers control; this spec documents the contract around it.
LangGraph swarm. The trigger.predicate becomes the routing condition of a conditional edge, or the condition under which a node returns Command(goto='refund-agent'); loop_guard maps to the swarm's last-active-agent tracking plus an explicit visited set.
Semantic Kernel. The contract maps to a HandoffOrchestration route plus InteractiveCallback if human_in_the_loop were true.
AutoGen. tool_call becomes a delegate_tool; recovery.on_reject maps to the topic the source agent re-publishes to.

9. Migration guidance

Teams adopting this spec on an existing system SHOULD migrate in three phases:

Inventory. Enumerate every place where one agent transfers control to another. Each becomes one contract. Multi-target routing (one agent, many possible next agents) becomes one contract per target.
Backfill. Write L1 contracts for each. Block merges that introduce new handoffs without contracts.
Promote. Upgrade high-traffic and customer-facing handoffs to L2; promote regulated ones to L3.
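The backfill rule ("block merges that introduce new handoffs without contracts") can be enforced by comparing handoff edges on the base branch against the pull request. The sketch below assumes some inventory step already produces (source, target) pairs for each branch; how that inventory is built is framework-specific and out of scope. Edges that already exist on the base branch are tolerated while backfill is in progress. The merge_gate name is hypothetical.

```python
Edge = tuple[str, str]  # (source, target); multi-target routing yields one edge per target

def merge_gate(base_edges: set[Edge], head_edges: set[Edge], contracts: list[dict]) -> list[str]:
    """Return one failure per handoff edge that is new in the PR and has no contract."""
    documented = {(c["source"], c["target"]) for c in contracts}
    new_edges = head_edges - base_edges
    return [f"undocumented handoff {s} -> {t}" for s, t in sorted(new_edges - documented)]
```

Once backfill is complete, passing an empty base_edges set turns the same gate into a strict "every edge has a contract" check.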

10. Anti-patterns

Passing the full transcript every time. Causes runaway token cost and context overflow. Use history_strategy: summary plus a pointer.
Implicit handoffs via prompt instructions. Documentation that says "if the user asks about refunds, just answer in the refund tone" is not a handoff; it is role-mixing. Specialize agents and emit explicit handoffs.
Bidirectional handoffs without loop guard. Two agents that can each emit a handoff to the other will eventually loop. Always declare loop_guard.
Retrying non-idempotent operations. If accepting the handoff has external side effects (charge, email, ticket creation), retries duplicate them. Set max_retries: 0 and recover via supervisor or human.
Free-text payload. A payload of "a paragraph the receiver figures out" is not a contract. Always reference a named schema.

11. Relationship to other specs

agent-tool-use-spec — governs tool calls within an agent. A handoff exposed as a tool is documented here, but the tool's input/output schema is governed there.
agent-trajectory-spec — governs how the full execution path is captured for evaluation. The provenance field of a handoff payload is a pointer into a trajectory.
agent-permission-spec — governs what each agent is allowed to do. acceptance_criteria.permission_check references it.
agent-communication-protocol — governs the wire-level protocol (MCP, A2A, ACP). This spec is one layer above: it describes what the message means, not how it is transmitted.


12. FAQ

Q: Is a handoff the same as an agent-as-tool call?

No. In the agent-as-tool pattern, the orchestrator stays in control and calls a sub-agent for one turn, expecting a return value. In a handoff, the source agent relinquishes control; the target now drives the conversation until it emits its own handoff. Both patterns are valid, but they require different documentation: agent-as-tool is governed by the agent-tool-use-spec; handoffs are governed by this spec.

Q: Do I need a handoff contract for every framework primitive?

Yes. Every place where control transitions from one named agent to another MUST have a contract, regardless of whether the transition is implemented as a LangGraph Command, an Agents SDK tool, a Semantic Kernel route, or an AutoGen topic publish. The contract is the source of truth; the framework is the implementation.

Q: What if my system has dynamic agents (created at runtime)?

Declare a contract for each role rather than each instance. The contract's target is the role name (e.g., domain-expert), and the orchestrator binds it to a concrete agent at runtime. Acceptance criteria still apply per instance.

Q: How does this interact with Model Context Protocol (MCP) or Agent2Agent (A2A)?

Protocols like MCP and A2A standardize the wire format and authentication for agent-to-agent traffic; this spec standardizes the semantic contract that rides on top. You can implement a handoff over A2A by mapping the contract's payload.schema to an A2A message schema and the recovery block to A2A's error-handling conventions.

Q: Where do I store contracts in the repo?

A conventional layout is /agents//handoffs/.yaml, mirrored by /schemas/.schema.json. Linters and CI runners can then discover contracts by glob and validate them on every PR.

13. Changelog

1.0.0 — 2026-04-29. Initial publication.