Function Calling Documentation Spec: How to Document Tools for AI Agents
A publisher-side specification for documenting function calling (tool calling) endpoints so AI agents from OpenAI, Anthropic Claude, Google Gemini, and MCP-compliant clients can discover, choose, and invoke them with minimal hallucination. Defines required fields, JSON Schema patterns, error taxonomy, idempotency hints, and example structures.
TL;DR
Vendor docs explain how to call function-calling APIs. They rarely explain how to document the tools you publish. This spec fills that gap. Use seven required blocks per tool — name, description, parameters (JSON Schema), returns, errors, idempotency, and examples — plus optional MCP annotations for open-world or destructive behavior. Tools that conform are dramatically easier for agents to select, invoke, and cite.
1. Scope and audience
This specification applies to anyone publishing callable tools that LLM-based agents will invoke through function calling, tool use, or the Model Context Protocol (MCP). It is platform-agnostic; the resulting tool descriptors can be transformed into OpenAI tool schema, Anthropic tool blocks, Gemini function declarations, or MCP tools/list payloads.
It is not a guide on how to consume function calling responses on the client side — see vendor API guides for that.
2. Why a publisher-side spec
Function calling is the bridge between LLMs and the real world: the model emits a structured JSON object naming a tool and its arguments, the runtime executes it, and the result feeds back into the conversation. Three problems make ad-hoc tool documentation unreliable in 2026:
- Tool sprawl. Production agents now load dozens to thousands of tools. OpenAI's tool_search, Anthropic's Tool Search Tool, and MCP's tool listing all rely on description quality to rank candidates.
- Cross-platform reuse. The same tool is exposed to ChatGPT, Claude, Gemini, and custom agents. A single descriptor must travel.
- Audit and citation. GEO/AEO teams want AI surfaces to cite tool documentation directly. That requires structured, extractable metadata.
3. Required tool descriptor fields
Every documented tool MUST include the seven fields below.
3.1 name
- Stable, machine-readable identifier in snake_case.
- Unique within the tool catalog.
- ≤ 64 characters; ASCII letters, digits, and underscores only.
- MUST NOT change once published; deprecate and rename instead.
3.2 description
- One paragraph, 2-5 sentences.
- Answer-first: the first sentence states what the tool does and when to call it.
- Mention concrete trigger phrases ("current weather", "latest invoice") when relevant.
- Avoid implementation details; focus on observable behavior.
- Anthropic's tool-writing guidance and OpenAI's tool calling guide both treat description quality as the biggest lever on correct tool selection.
3.3 parameters (JSON Schema)
- A valid JSON Schema object (type: "object").
- Each property MUST have a description.
- Use enum for closed value sets.
- Use format (date, date-time, email, uri) where applicable.
- Mark required fields explicitly via required: [...].
- Set additionalProperties: false unless free-form input is intentional.
- Prefer flat schemas; nested objects degrade smaller-model accuracy.
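A minimal sketch of a parameters schema that follows these rules, for a hypothetical get_weather tool (the tool name and property set are illustrative, not normative):

```python
# Hypothetical "get_weather" parameters block: every property described,
# enum for the closed value set, format for dates, explicit required list,
# additionalProperties disabled, and a flat (depth-1) structure.
GET_WEATHER_PARAMETERS = {
    "type": "object",
    "properties": {
        "city": {
            "type": "string",
            "description": "City name, e.g. 'Berlin'. Qualify ambiguous names with a country.",
        },
        "units": {
            "type": "string",
            "enum": ["celsius", "fahrenheit"],
            "description": "Temperature units. Optional; the server defaults to 'celsius'.",
        },
        "date": {
            "type": "string",
            "format": "date",
            "description": "Forecast date (YYYY-MM-DD). Must be within the next 24 hours.",
        },
    },
    "required": ["city"],
    "additionalProperties": False,
}
```

Note that the optional `units` property states its server-side default in its own description, per §6.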
3.4 returns
- A JSON Schema describing the successful return shape.
- Include a one-line description summarizing the payload.
- Document units, time zones, and currency codes inline.
3.5 errors
A list of error codes the tool can emit, each with:
- code — stable string (NOT_FOUND, RATE_LIMITED, VALIDATION_ERROR).
- http_status — closest HTTP analogue.
- retryable — true | false.
- description — human-readable explanation.
- recovery — what the agent should do ("retry after Retry-After seconds", "ask user for clarification", "abort").
3.6 idempotency
- idempotent: true | false.
- safe: true | false (no side effects at all).
- destructive: true | false (irreversibly changes state).
- These map directly to the MCP tool annotations (readOnlyHint, destructiveHint, idempotentHint, openWorldHint).
3.7 examples
At least two worked examples per tool:
- A canonical successful call with arguments and return.
- An error case showing the error envelope.
Examples should be short, complete, and copy-pasteable.
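Putting §3 together, here is a sketch of a complete descriptor for the same hypothetical get_weather tool; every field value is illustrative, and a real catalog would flesh out the error list and return schema:

```python
# Hypothetical descriptor carrying all seven required blocks from §3.
GET_WEATHER_TOOL = {
    "name": "get_weather",
    "description": (
        "Fetch the current weather and next-24-hour forecast for a city. "
        "Call it when the user asks about 'current weather', 'temperature', "
        "or 'forecast'. Returns upcoming conditions only, not historical data."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name, e.g. 'Berlin'."},
        },
        "required": ["city"],
        "additionalProperties": False,
    },
    "returns": {
        "description": "Current conditions plus hourly forecast for the next 24 hours.",
        "type": "object",
        "properties": {
            "temperature_c": {"type": "number", "description": "Temperature in degrees Celsius."},
            "conditions": {"type": "string", "description": "Short text, e.g. 'light rain'."},
        },
    },
    "errors": [
        {
            "code": "NOT_FOUND",
            "http_status": 404,
            "retryable": False,
            "description": "Unknown city.",
            "recovery": "Confirm the city name with the user.",
        },
    ],
    "idempotency": {"idempotent": True, "safe": True, "destructive": False},
    "examples": [
        {   # canonical successful call
            "prompt": "What's the weather in Berlin?",
            "tool_call": {"name": "get_weather", "arguments": {"city": "Berlin"}},
            "result": {"temperature_c": 11.5, "conditions": "light rain"},
        },
        {   # error case showing the envelope
            "prompt": "Weather in Atlantis?",
            "tool_call": {"name": "get_weather", "arguments": {"city": "Atlantis"}},
            "result": {"error": {"code": "NOT_FOUND", "message": "Unknown city"}},
        },
    ],
}
```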
4. Optional but recommended fields
| Field | Purpose |
|---|---|
| version | SemVer for the tool descriptor. |
| deprecated | Boolean + replacement tool name. |
| rate_limits | Requests-per-minute, burst, scope. |
| auth | none, api_key, oauth, mcp_session. |
| latency_p50_ms | Helps agents budget multi-step plans. |
| cost_hint | free, cheap, metered, expensive. |
| open_world | Tool reaches public internet (MCP openWorldHint). |
| tool_search_keywords | Extra retrieval terms for large catalogs. |
5. Description writing rules
Good descriptions are the single biggest lever on tool-call accuracy.
- Lead with verb + object. "Fetch the current weather…" not "This tool can be used to…".
- Name the trigger. Quote the user phrases that should activate it.
- State scope limits. "Returns weather for the next 24 hours only."
- Disambiguate from siblings. If you also expose get_forecast, explain when each fires.
- Avoid marketing voice. Agents do not respond to adjectives.
6. Parameter schema rules
- Every property has a description. No exceptions.
- Constrain values. Use enum, minimum, maximum, pattern, and format.
- Document units in the description. "Temperature in degrees Celsius."
- Make parameters required by default. If a parameter is optional, say so explicitly in its description and provide a sensible server-side default.
- Keep depth ≤ 2 levels. Deeper nesting hurts smaller-model accuracy.
- Avoid oneOf / anyOf at the top level. Split into separate tools instead.
7. Error taxonomy
A stable, well-documented error taxonomy lets agents recover automatically. The recommended baseline:
| Code | http_status | retryable | Recovery |
|---|---|---|---|
| VALIDATION_ERROR | 400 | false | Re-prompt with corrected arguments. |
| UNAUTHORIZED | 401 | false | Surface auth error to user. |
| FORBIDDEN | 403 | false | Stop; do not retry. |
| NOT_FOUND | 404 | false | Confirm identifier with user. |
| CONFLICT | 409 | false | Re-fetch current state, then retry with updated arguments. |
| RATE_LIMITED | 429 | true | Backoff using Retry-After. |
| INTERNAL | 500 | true | Exponential backoff, max 3 attempts. |
| UNAVAILABLE | 503 | true | Backoff with jitter. |
| TIMEOUT | 504 | true | Single retry, then abort. |
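The retryable column above is enough to drive an automatic retry policy. A sketch of how an agent runtime might use it, assuming a hypothetical `invoke` callable that returns either a result dict or the error envelope (backoff parameters are illustrative):

```python
import random
import time

# Retryability baseline from the taxonomy table (code -> retryable).
RETRYABLE = {
    "VALIDATION_ERROR": False, "UNAUTHORIZED": False, "FORBIDDEN": False,
    "NOT_FOUND": False, "CONFLICT": False, "RATE_LIMITED": True,
    "INTERNAL": True, "UNAVAILABLE": True, "TIMEOUT": True,
}

def call_with_retry(invoke, max_attempts=3):
    """Invoke a tool, retrying only errors the taxonomy marks retryable."""
    for attempt in range(1, max_attempts + 1):
        result = invoke()
        error = result.get("error")
        if error is None:
            return result
        if not RETRYABLE.get(error["code"], False) or attempt == max_attempts:
            return result  # surface the error to the planner unchanged
        # Honor Retry-After when present, else exponential backoff with jitter.
        delay = error.get("retry_after_seconds", (2 ** attempt) * random.uniform(0.5, 1.0))
        time.sleep(delay)
```

A real runtime would also honor per-code recovery hints, such as capping TIMEOUT at a single retry.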
Return errors as a structured envelope rather than free text:
{
"error": {
"code": "RATE_LIMITED",
"message": "Quota exceeded",
"retry_after_seconds": 30
}
}
8. Idempotency and safety hints
MCP formalizes four annotations agents use to plan safely: readOnlyHint, destructiveHint, idempotentHint, openWorldHint. Mirror them in your descriptor:
- Read-only tools can be called speculatively or in parallel.
- Idempotent writes can be retried safely on transient failure.
- Destructive tools SHOULD require user confirmation in the agent UI.
- Open-world tools (web fetch, search) need crawl-etiquette guarantees — see the Browser Agent Crawl Etiquette spec on Geodocs.
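A sketch of how a planner might gate execution on the §3.6 idempotency block; the policy names and thresholds are illustrative, not part of the spec:

```python
# Map a tool's safety hints to a coarse execution policy. Order matters:
# destructiveness overrides everything else.
def execution_policy(idempotency: dict) -> str:
    if idempotency.get("destructive"):
        return "require_user_confirmation"  # irreversible writes need a human
    if idempotency.get("safe"):
        return "parallel_ok"                # read-only: speculate or fan out
    if idempotency.get("idempotent"):
        return "retry_ok"                   # writes safe to retry on failure
    return "serial_no_retry"                # non-idempotent write: one shot
```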
9. Example structures
Examples are part of the descriptor, not an afterthought. Each example MUST contain:
- A short natural-language prompt that triggers the tool.
- The tool_call JSON the agent should emit.
- The result JSON the runtime returns.
- A 1-sentence notes field if the example demonstrates a subtle case.
10. Cross-platform mapping
The descriptor is the source of truth. Generate platform payloads from it.
- OpenAI Responses / Chat Completions. Map name, description, parameters directly into the tool object. strict: true is recommended; it forces the model to honor required and additionalProperties: false.
- Anthropic Messages. Map into tools[].name, tools[].description, tools[].input_schema.
- Gemini. Map into function_declarations[].name, description, parameters.
- MCP servers. Expose via tools/list with inputSchema, optional outputSchema, and the annotations block populated from §8.
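The mappings above are mechanical, so they belong in code. A sketch of two of the generators, using the flat OpenAI Responses-style tool shape and the Anthropic Messages shape; verify field names against current vendor docs before relying on them:

```python
# Generate an OpenAI Responses-style tool object from a descriptor.
def to_openai(tool: dict) -> dict:
    return {
        "type": "function",
        "name": tool["name"],
        "description": tool["description"],
        "parameters": tool["parameters"],
        "strict": True,  # enforce required + additionalProperties: false
    }

# Generate an Anthropic Messages tool block from the same descriptor.
def to_anthropic(tool: dict) -> dict:
    return {
        "name": tool["name"],
        "description": tool["description"],
        "input_schema": tool["parameters"],
    }
```

Keeping the descriptor as the single source and regenerating payloads on publish prevents the platform copies from drifting apart.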
11. Versioning and deprecation
- Use SemVer in the version field.
- Breaking changes (renamed parameter, removed field, changed enum) MUST bump the major version and ship under a new name (get_weather_v2).
- Mark the old tool deprecated: true and set replacement to the new name.
- Keep deprecated tools live for at least one release cycle to give agents time to retrain or re-index.
12. Tool-search readiness
Agents at scale rely on retrieval over tool catalogs (OpenAI tool_search, Anthropic Tool Search Tool, MCP discovery). To rank well:
- Include 3-7 tool_search_keywords covering synonyms users might say.
- Keep description ≤ 600 characters; longer descriptions dilute the retrieval signal in large catalogs.
- Prefer one purpose per tool. Multi-purpose tools rank lower for any single intent.
13. Conformance levels
- Level 1 — Minimum: All seven required fields per §3.
- Level 2 — Production: Adds error taxonomy (§7), idempotency hints (§8), and ≥ 2 examples per tool.
- Level 3 — AI-native: Adds tool-search keywords, latency hints, deprecation policy, and per-platform schema generators.
14. Validation checklist
- [ ] Every tool has all seven required fields.
- [ ] Every parameter has a description.
- [ ] At least one happy-path and one error example.
- [ ] Errors return a structured envelope with code.
- [ ] Idempotency and destructiveness are explicit.
- [ ] No breaking changes without a version bump and rename.
- [ ] Descriptor renders correctly to OpenAI, Anthropic, Gemini, and MCP payloads.
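The first items on this checklist are mechanically checkable. A minimal lint pass over a descriptor, as a sketch of what a CI gate might run:

```python
REQUIRED = ("name", "description", "parameters", "returns",
            "errors", "idempotency", "examples")

def lint(tool: dict) -> list[str]:
    """Return a list of checklist violations (empty means the checks pass)."""
    problems = [f"missing field: {f}" for f in REQUIRED if f not in tool]
    # Every parameter must carry a description (§6).
    for pname, schema in tool.get("parameters", {}).get("properties", {}).items():
        if "description" not in schema:
            problems.append(f"parameter '{pname}' lacks a description")
    # At least one happy-path and one error example (§3.7).
    if len(tool.get("examples", [])) < 2:
        problems.append("need at least one happy-path and one error example")
    return problems
```

Idempotency flags, error envelopes, and cross-platform rendering need dedicated checks on top of this.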
15. FAQ
Q: Is this the same as JSON Schema for parameters?
No. JSON Schema covers parameters only. This spec covers the full publisher-side descriptor — name, description, returns, errors, idempotency, and examples — with JSON Schema as one component (§3.3).
Q: How does this relate to MCP?
MCP standardizes the transport (JSON-RPC 2.0) and the listing endpoint (tools/list). This spec standardizes the content of each tool descriptor so the same content travels cleanly to OpenAI, Anthropic, Gemini, or any MCP server.
Q: Do I need separate tools for read and write operations?
Yes when feasible. Splitting get_user and update_user lets agents reason about safety per call and lets MCP annotations (readOnlyHint, destructiveHint) be set accurately.
Q: How long should a tool description be?
2-5 sentences, ≤ 600 characters. Longer descriptions hurt retrieval in large catalogs and dilute the leading verb-object signal.
Q: Should I include latency or cost hints?
Yes, in production. Agents that plan multi-step workflows budget tool calls. Even rough hints (fast, slow; free, metered) measurably improve plan quality.
Related Articles
Browser Agent Crawl Etiquette: A Specification for Polite Autonomous AI Browsing
A specification defining how browser-based AI agents should identify themselves, throttle requests, and respect publisher signals to maintain citation trust.
AI Search SERP Feature Citation Map: Where AI Mentions Appear in 2026
AI search SERP feature citation map: a 2026 checklist of every surface where AI mentions appear, from AI Overviews to Perplexity Sources.
LLM Citation Anchor Text Patterns: How Generative Engines Phrase Source Mentions
LLM citation anchor text patterns reference cataloging how ChatGPT, Perplexity, Gemini, and Claude phrase source mentions across answer formats and engines.