Function Calling Documentation Spec: How to Document Tools for AI Agents
A publisher-side specification for documenting function calling (tool calling) endpoints so AI agents from OpenAI, Anthropic Claude, Google Gemini, and MCP-compliant clients can discover, choose, and invoke them with minimal hallucination. Defines required fields, JSON Schema patterns, error taxonomy, idempotency hints, and example structures.
TL;DR
Vendor docs explain how to call function-calling APIs. They rarely explain how to document the tools you publish. This spec fills that gap. Use seven required blocks per tool — name, description, parameters (JSON Schema), returns, errors, idempotency, and examples — plus optional MCP annotations for open-world or destructive behavior. Tools that conform are dramatically easier for agents to select, invoke, and cite.
1. Scope and audience
This specification applies to anyone publishing callable tools that LLM-based agents will invoke through function calling, tool use, or the Model Context Protocol (MCP). It is platform-agnostic; the resulting tool descriptors can be transformed into OpenAI tool schema, Anthropic tool blocks, Gemini function declarations, or MCP tools/list payloads.
It is not a guide on how to consume function calling responses on the client side — see vendor API guides for that.
2. Why a publisher-side spec
Function calling is the bridge between LLMs and the real world: the model emits a structured JSON object naming a tool and its arguments, the runtime executes it, and the result feeds back into the conversation. Three problems make ad-hoc tool documentation unreliable in 2026:
- Tool sprawl. Production agents now load dozens to thousands of tools. OpenAI's tool_search, Anthropic's Tool Search Tool, and MCP's tool listing all rely on description quality to rank candidates.
- Cross-platform reuse. The same tool is exposed to ChatGPT, Claude, Gemini, and custom agents. A single descriptor must travel.
- Audit and citation. GEO/AEO teams want AI surfaces to cite tool documentation directly. That requires structured, extractable metadata.
3. Required tool descriptor fields
Every documented tool MUST include the seven fields below.
3.1 name
- Stable, machine-readable identifier in snake_case.
- Unique within the tool catalog.
- ≤ 64 characters; ASCII letters, digits, and underscores only.
- MUST NOT change once published; deprecate and rename instead.
3.2 description
- One paragraph, 2-5 sentences.
- Answer-first: the first sentence states what the tool does and when to call it.
- Mention concrete trigger phrases ("current weather", "latest invoice") when relevant.
- Avoid implementation details; focus on observable behavior.
- Anthropic's tool-writing guidance and OpenAI's tool calling guide both treat description quality as the biggest lever on correct tool selection.
3.3 parameters (JSON Schema)
- A valid JSON Schema object (type: "object").
- Each property MUST have a description.
- Use enum for closed value sets.
- Use format (date, date-time, email, uri) where applicable.
- Mark required fields explicitly via required: [...].
- Set additionalProperties: false unless free-form input is intentional.
- Prefer flat schemas; nested objects degrade smaller-model accuracy.
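A minimal sketch of a parameters schema that follows these rules, for a hypothetical get_weather tool (the tool name and property set are illustrative, not normative):

```python
# Hypothetical "get_weather" parameters block: every property described,
# enum for the closed value set, format for dates, explicit required list,
# additionalProperties disabled, and a flat (depth-1) structure.
GET_WEATHER_PARAMETERS = {
    "type": "object",
    "properties": {
        "city": {
            "type": "string",
            "description": "City name, e.g. 'Berlin'. Qualify ambiguous names with a country.",
        },
        "units": {
            "type": "string",
            "enum": ["celsius", "fahrenheit"],
            "description": "Temperature units. Optional; the server defaults to 'celsius'.",
        },
        "date": {
            "type": "string",
            "format": "date",
            "description": "Forecast date (YYYY-MM-DD). Must be within the next 24 hours.",
        },
    },
    "required": ["city"],
    "additionalProperties": False,
}
```

Note that the optional `units` property states its server-side default in its own description, per §6.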
3.4 returns
- A JSON Schema describing the successful return shape.
- Include a one-line description summarizing the payload.
- Document units, time zones, and currency codes inline.
3.5 errors
A list of error codes the tool can emit, each with:
- code — stable string (NOT_FOUND, RATE_LIMITED, VALIDATION_ERROR).
- http_status — closest HTTP analogue.
- retryable — true | false.
- description — human-readable explanation.
- recovery — what the agent should do ("retry after Retry-After seconds", "ask user for clarification", "abort").
3.6 idempotency
- idempotent: true | false.
- safe: true | false (no side effects at all).
- destructive: true | false (irreversibly changes state).
- These map directly to the MCP tool annotations (readOnlyHint, destructiveHint, idempotentHint, openWorldHint).
3.7 examples
At least two worked examples per tool:
- A canonical successful call with arguments and return.
- An error case showing the error envelope.
Examples should be short, complete, and copy-pasteable.
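Putting §3 together, here is a sketch of a complete descriptor for the same hypothetical get_weather tool; every field value is illustrative, and a real catalog would flesh out the error list and return schema:

```python
# Hypothetical descriptor carrying all seven required blocks from §3.
GET_WEATHER_TOOL = {
    "name": "get_weather",
    "description": (
        "Fetch the current weather and next-24-hour forecast for a city. "
        "Call it when the user asks about 'current weather', 'temperature', "
        "or 'forecast'. Returns upcoming conditions only, not historical data."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name, e.g. 'Berlin'."},
        },
        "required": ["city"],
        "additionalProperties": False,
    },
    "returns": {
        "description": "Current conditions plus hourly forecast for the next 24 hours.",
        "type": "object",
        "properties": {
            "temperature_c": {"type": "number", "description": "Temperature in degrees Celsius."},
            "conditions": {"type": "string", "description": "Short text, e.g. 'light rain'."},
        },
    },
    "errors": [
        {
            "code": "NOT_FOUND",
            "http_status": 404,
            "retryable": False,
            "description": "Unknown city.",
            "recovery": "Confirm the city name with the user.",
        },
    ],
    "idempotency": {"idempotent": True, "safe": True, "destructive": False},
    "examples": [
        {   # canonical successful call
            "prompt": "What's the weather in Berlin?",
            "tool_call": {"name": "get_weather", "arguments": {"city": "Berlin"}},
            "result": {"temperature_c": 11.5, "conditions": "light rain"},
        },
        {   # error case showing the envelope
            "prompt": "Weather in Atlantis?",
            "tool_call": {"name": "get_weather", "arguments": {"city": "Atlantis"}},
            "result": {"error": {"code": "NOT_FOUND", "message": "Unknown city"}},
        },
    ],
}
```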
4. Optional but recommended fields
| Field | Purpose |
|---|---|
| version | SemVer for the tool descriptor. |
| deprecated | Boolean + replacement tool name. |
| rate_limits | Requests-per-minute, burst, scope. |
| auth | none, api_key, oauth, mcp_session. |
| latency_p50_ms | Helps agents budget multi-step plans. |
| cost_hint | free, cheap, metered, expensive. |
| open_world | Tool reaches public internet (MCP openWorldHint). |
| tool_search_keywords | Extra retrieval terms for large catalogs. |
5. Description writing rules
Good descriptions are the single biggest lever on tool-call accuracy.
- Lead with verb + object. "Fetch the current weather…" not "This tool can be used to…".
- Name the trigger. Quote the user phrases that should activate it.
- State scope limits. "Returns weather for the next 24 hours only."
- Disambiguate from siblings. If you also expose get_forecast, explain when each fires.
- Avoid marketing voice. Agents do not respond to adjectives.
6. Parameter schema rules
- Every property has a description. No exceptions.
- Constrain values. Use enum, minimum, maximum, pattern, and format.
- Document units in the description. "Temperature in degrees Celsius."
- Make parameters required by default. If a parameter is optional, say so explicitly in its description and provide a sensible server-side default.
- Keep depth ≤ 2 levels. Deeper nesting hurts smaller-model accuracy.
- Avoid oneOf / anyOf at the top level. Split into separate tools instead.
7. Error taxonomy
A stable, well-documented error taxonomy lets agents recover automatically. The recommended baseline:
| Code | http_status | retryable | Recovery |
|---|---|---|---|
| VALIDATION_ERROR | 400 | false | Re-prompt with corrected arguments. |
| UNAUTHORIZED | 401 | false | Surface auth error to user. |
| FORBIDDEN | 403 | false | Stop; do not retry. |
| NOT_FOUND | 404 | false | Confirm identifier with user. |
| CONFLICT | 409 | false | Re-fetch current state, then retry with updated arguments. |
| RATE_LIMITED | 429 | true | Backoff using Retry-After. |
| INTERNAL | 500 | true | Exponential backoff, max 3 attempts. |
| UNAVAILABLE | 503 | true | Backoff with jitter. |
| TIMEOUT | 504 | true | Single retry, then abort. |
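The retryable column above is enough to drive an automatic retry policy. A sketch of how an agent runtime might use it, assuming a hypothetical `invoke` callable that returns either a result dict or the error envelope (backoff parameters are illustrative):

```python
import random
import time

# Retryability baseline from the taxonomy table (code -> retryable).
RETRYABLE = {
    "VALIDATION_ERROR": False, "UNAUTHORIZED": False, "FORBIDDEN": False,
    "NOT_FOUND": False, "CONFLICT": False, "RATE_LIMITED": True,
    "INTERNAL": True, "UNAVAILABLE": True, "TIMEOUT": True,
}

def call_with_retry(invoke, max_attempts=3):
    """Invoke a tool, retrying only errors the taxonomy marks retryable."""
    for attempt in range(1, max_attempts + 1):
        result = invoke()
        error = result.get("error")
        if error is None:
            return result
        if not RETRYABLE.get(error["code"], False) or attempt == max_attempts:
            return result  # surface the error to the planner unchanged
        # Honor Retry-After when present, else exponential backoff with jitter.
        delay = error.get("retry_after_seconds", (2 ** attempt) * random.uniform(0.5, 1.0))
        time.sleep(delay)
```

A real runtime would also honor per-code recovery hints, such as capping TIMEOUT at a single retry.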
Return errors as a structured envelope rather than free text:
{
"error": {
"code": "RATE_LIMITED",
"message": "Quota exceeded",
"retry_after_seconds": 30
}
}
8. Idempotency and safety hints
MCP formalizes four annotations agents use to plan safely: readOnlyHint, destructiveHint, idempotentHint, openWorldHint. Mirror them in your descriptor:
- Read-only tools can be called speculatively or in parallel.
- Idempotent writes can be retried safely on transient failure.
- Destructive tools SHOULD require user confirmation in the agent UI.
- Open-world tools (web fetch, search) need crawl-etiquette guarantees — see the Browser Agent Crawl Etiquette spec on Geodocs.
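A sketch of how a planner might gate execution on the §3.6 idempotency block; the policy names and thresholds are illustrative, not part of the spec:

```python
# Map a tool's safety hints to a coarse execution policy. Order matters:
# destructiveness overrides everything else.
def execution_policy(idempotency: dict) -> str:
    if idempotency.get("destructive"):
        return "require_user_confirmation"  # irreversible writes need a human
    if idempotency.get("safe"):
        return "parallel_ok"                # read-only: speculate or fan out
    if idempotency.get("idempotent"):
        return "retry_ok"                   # writes safe to retry on failure
    return "serial_no_retry"                # non-idempotent write: one shot
```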
9. Example structures
Examples are part of the descriptor, not an afterthought. Each example MUST contain:
- A short natural-language prompt that triggers the tool.
- The tool_call JSON the agent should emit.
- The result JSON the runtime returns.
- A 1-sentence notes field if the example demonstrates a subtle case.
10. Cross-platform mapping
The descriptor is the source of truth. Generate platform payloads from it.
- OpenAI Responses / Chat Completions. Map name, description, parameters directly into the tool object. strict: true is recommended; it forces the model to honor required and additionalProperties: false.
- Anthropic Messages. Map into tools[].name, tools[].description, tools[].input_schema.
- Gemini. Map into function_declarations[].name, description, parameters.
- MCP servers. Expose via tools/list with inputSchema, optional outputSchema, and the annotations block populated from §8.
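The mappings above are mechanical, so they belong in code. A sketch of two of the generators, using the flat OpenAI Responses-style tool shape and the Anthropic Messages shape; verify field names against current vendor docs before relying on them:

```python
# Generate an OpenAI Responses-style tool object from a descriptor.
def to_openai(tool: dict) -> dict:
    return {
        "type": "function",
        "name": tool["name"],
        "description": tool["description"],
        "parameters": tool["parameters"],
        "strict": True,  # enforce required + additionalProperties: false
    }

# Generate an Anthropic Messages tool block from the same descriptor.
def to_anthropic(tool: dict) -> dict:
    return {
        "name": tool["name"],
        "description": tool["description"],
        "input_schema": tool["parameters"],
    }
```

Keeping the descriptor as the single source and regenerating payloads on publish prevents the platform copies from drifting apart.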
11. Versioning and deprecation
- Use SemVer in the version field.
- Breaking changes (renamed parameter, removed field, changed enum) MUST bump the major version and ship under a new name (get_weather_v2).
- Mark the old tool deprecated: true and set replacement to the new name.
- Keep deprecated tools live for at least one release cycle to give agents time to retrain or re-index.
12. Tool-search readiness
Agents at scale rely on retrieval over tool catalogs (OpenAI tool_search, Anthropic Tool Search Tool, MCP discovery). To rank well:
- Include 3-7 tool_search_keywords covering synonyms users might say.
- Keep description ≤ 600 characters; longer descriptions dilute the retrieval signal in large catalogs.
- Prefer one purpose per tool. Multi-purpose tools rank lower for any single intent.
13. Conformance levels
- Level 1 — Minimum: All seven required fields per §3.
- Level 2 — Production: Adds error taxonomy (§7), idempotency hints (§8), and ≥ 2 examples per tool.
- Level 3 — AI-native: Adds tool-search keywords, latency hints, deprecation policy, and per-platform schema generators.
14. Validation checklist
- [ ] Every tool has all seven required fields.
- [ ] Every parameter has a description.
- [ ] At least one happy-path and one error example.
- [ ] Errors return a structured envelope with code.
- [ ] Idempotency and destructiveness are explicit.
- [ ] No breaking changes without a version bump and rename.
- [ ] Descriptor renders correctly to OpenAI, Anthropic, Gemini, and MCP payloads.
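The first items on this checklist are mechanically checkable. A minimal lint pass over a descriptor, as a sketch of what a CI gate might run:

```python
REQUIRED = ("name", "description", "parameters", "returns",
            "errors", "idempotency", "examples")

def lint(tool: dict) -> list[str]:
    """Return a list of checklist violations (empty means the checks pass)."""
    problems = [f"missing field: {f}" for f in REQUIRED if f not in tool]
    # Every parameter must carry a description (§6).
    for pname, schema in tool.get("parameters", {}).get("properties", {}).items():
        if "description" not in schema:
            problems.append(f"parameter '{pname}' lacks a description")
    # At least one happy-path and one error example (§3.7).
    if len(tool.get("examples", [])) < 2:
        problems.append("need at least one happy-path and one error example")
    return problems
```

Idempotency flags, error envelopes, and cross-platform rendering need dedicated checks on top of this.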
15. FAQ
Q: Is this the same as JSON Schema for parameters?
No. JSON Schema covers parameters only. This spec covers the full publisher-side descriptor — name, description, returns, errors, idempotency, and examples — with JSON Schema as one component (§3.3).
Q: How does this relate to MCP?
MCP standardizes the transport (JSON-RPC 2.0) and the listing endpoint (tools/list). This spec standardizes the content of each tool descriptor so the same content travels cleanly to OpenAI, Anthropic, Gemini, or any MCP server.
Q: Do I need separate tools for read and write operations?
Yes when feasible. Splitting get_user and update_user lets agents reason about safety per call and lets MCP annotations (readOnlyHint, destructiveHint) be set accurately.
Q: How long should a tool description be?
2-5 sentences, ≤ 600 characters. Longer descriptions hurt retrieval in large catalogs and dilute the leading verb-object signal.
Q: Should I include latency or cost hints?
Yes, in production. Agents that plan multi-step workflows budget tool calls. Even rough hints (fast, slow; free, metered) measurably improve plan quality.
Related Articles
Browser Agent Crawl Etiquette: A Specification for Polite Autonomous AI Browsing
A specification defining how browser-based AI agents should identify themselves, throttle requests, and respect publisher signals to maintain citation trust.
AI Search SERP Feature Citation Map: Where AI Mentions Appear in 2026
AI search SERP feature citation map: a 2026 checklist of every surface where AI mentions appear, from AI Overviews to Perplexity Sources.
LLM Citation Anchor Text Patterns: How Generative Engines Phrase Source Mentions
LLM citation anchor text patterns reference cataloging how ChatGPT, Perplexity, Gemini, and Claude phrase source mentions across answer formats and engines.