Geodocs.dev

Agent Cost Disclosure Specification

AI agent tool manifests must declare cost in machine-readable form: a base unit, a currency, tier breakpoints, and the runtime fields that will be echoed back so a cost-aware planner can reconcile expected and actual spend. This specification defines the minimum cost schema, the budget-exhaustion error semantics, and the disclosure patterns that prevent Denial-of-Wallet incidents in production agents.

TL;DR

Without cost disclosure, agents cannot make informed plans. They either burn budget on unimportant tool calls or refuse to use paid tools at all. This specification requires every tool manifest to publish a cost block describing per-call, per-token, or tiered pricing in a single normalized schema. Every tool response must echo the actual cost incurred. Budget-exhaustion errors must be a stable, well-typed failure mode so cost-aware planners can degrade gracefully instead of looping.

Definition

Agent cost disclosure is a structured declaration in a tool or skill manifest that tells an AI agent (or its orchestrator) what a single invocation will cost. "Cost" here means any of: dollar charge, token consumption that will appear on a bill, request quota burned, or any other metered resource the publisher considers a cost. The spec normalizes these into a single schema with explicit units so cost-aware planners can compare across heterogeneous tools.

It complements but does not replace upstream pricing pages. Pricing pages are for humans evaluating the product. Cost disclosure is for agents and orchestrators making per-call decisions in flight.

Why It Matters

Three pressures make cost disclosure non-optional in 2026.

  1. Real Denial-of-Wallet incidents. Practitioners report $150+ runaway bills from agent loops that re-invoked paid tools without budget controls (Reddit r/LangChain account of LangGraph runaway costs). At the organizational level, Zylo's 2026 data shows the average company now spends ~$384,500 annually on the OpenAI API alone, with AI-native application spend up 108% in 2025 (Zylo OpenAI cost data). When budgets are this large and call patterns this dynamic, ad-hoc cost guesses fail.
  2. Cost-aware planning is research-validated. Recent benchmarks show that explicit budget tracking changes agent behavior favorably. The BATS framework demonstrates that budget-aware agents "yield more favorable scaling curves and better cost-performance trade-offs" than budget-blind agents (arXiv 2511.17006 — Budget-Aware Tool-Use). Practitioner write-ups document 70% cost reductions from breaking monolithic agents into specialized agents that route based on declared cost (How I reduced AI agent cost by 70%, Musaib Altaf). None of this works without disclosed cost.
  3. Hidden surcharges break naive estimates. Anthropic's web search tool charges $10 per 1,000 searches on top of token costs, and the inference_geo parameter for US-only data residency adds a 10% premium on Opus 4.7 and above (platform.claude.com Pricing, Finout Anthropic API Pricing 2026). An agent that does not see these surcharges in the tool manifest will under-budget by double-digit percentages and either fail or accept lossy approximations.

How It Works

Cost disclosure has three coordinated channels.

Channel 1: manifest declaration. The tool manifest publishes a cost block that names a base unit (per-call, per-token, per-1000-requests, per-MB-egress), a currency (or tokens/requests for non-monetary metering), and tier breakpoints if the cost varies with volume.

Channel 2: runtime cost echo. Every successful tool response includes a usage block reporting actual consumption. OpenAI and Anthropic both demonstrate this pattern: OpenAI's Realtime API exposes input and output token counts per turn (developers.openai.com Realtime costs) and Anthropic's web search tool returns a server_tool_use.web_search_requests count alongside token counts (platform.claude.com Pricing). This specification generalizes the pattern to every tool.

Channel 3: budget-exhaustion error. When a budget is exceeded, the tool returns a stable error code (BUDGET_EXCEEDED) with structured detail: which budget, what threshold, and the optional replacement_uri of a cheaper alternative. Cost-aware planners catch this error and decide whether to escalate, degrade, or stop, rather than retry blindly.

flowchart TD
    A["Manifest cost block"] --> B["Planner estimates total"]
    B --> C["Issue tool call"]
    C --> D{"Within budget?"}
    D -->|Yes| E["Execute + echo usage in response"]
    D -->|No| F["Return BUDGET_EXCEEDED + replacement_uri"]
    E --> G["Planner reconciles actual vs estimate"]
    G --> H["Update remaining budget"]
    F --> I["Planner switches tool or degrades gracefully"]
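The loop in the diagram can be sketched in Python. This is a non-normative illustration: `plan_and_call`, `call_tool`, and the dict shapes below are assumptions, not part of the schema.

```python
# Sketch of the planner loop above. `call_tool` and the manifest/response
# shapes are illustrative assumptions, not defined by this specification.

def plan_and_call(manifest, budget_usd, call_tool):
    """Estimate, enforce, execute, and reconcile one tool call."""
    cost = manifest["cost"]
    # Planner estimates from the manifest declaration (per_call shown;
    # other cost models would scale by expected units).
    estimate = cost["amount"] if cost["metered"] else 0.0
    if estimate > budget_usd:
        # Client-side budget check: degrade instead of calling.
        return None, budget_usd
    response = call_tool()
    if "error" in response and response["error"]["code"] == "BUDGET_EXCEEDED":
        # Publisher-side exhaustion: switch tool or stop; never retry blindly.
        return response["error"].get("replacement_uri"), budget_usd
    # Reconcile actual vs estimate using the runtime cost echo.
    actual = response["usage"].get("estimated_cost", {}).get("amount", 0.0)
    return response, budget_usd - actual
```

The caller keeps the returned remaining budget as the input to the next call, closing the "update remaining budget" loop in the diagram.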

Required Manifest Fields

Every agent tool or skill manifest MUST include a top-level cost block when the tool is metered. Free tools MUST set cost.metered: false and SHOULD include rate limit metadata in a separate block.

| Field | Type | Required | Purpose |
|---|---|---|---|
| cost.metered | boolean | Yes | Whether the tool is billable in any form |
| cost.model | enum | Yes | per_call, per_token, per_unit, tiered, subscription |
| cost.currency | string | Yes | ISO 4217 code, or tokens/requests for non-monetary metering |
| cost.unit | string | Yes | What is being priced (e.g., 1M_input_tokens, 1_call, 1000_searches, 1_GB_egress) |
| cost.amount | number | Yes | Cost per unit at the lowest tier |
| cost.tiers | array | If cost.model = tiered | Tier boundary records |
| cost.surcharges | array | If applicable | Named surcharges (e.g., data_residency_us, long_context) |
| cost.budget_exhaustion | object | Yes | How the tool signals exhaustion (see below) |
| cost.runtime_echo_path | string | Yes | JSON path in the response where actual cost is echoed |
| cost.cached_discount | number | Optional | Multiplier for cached inputs (e.g., 0.25 = 75% off) |
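These rules can be checked mechanically. The helper below is a non-normative sketch that assumes the cost block is already parsed into a dict; it treats cost.amount as optional for tiered models, where per-tier amounts live in cost.tiers (as in Example 4 later in this article).

```python
# Non-normative required-field check for the table above. Field names
# follow the table; for tiered models, per-tier amounts are assumed to
# live inside cost.tiers rather than a top-level cost.amount.

ALWAYS_REQUIRED = ("metered", "model", "currency", "unit",
                   "budget_exhaustion", "runtime_echo_path")

def validate_cost_block(cost):
    """Return a list of violations of the required-field rules."""
    errors = ["missing cost." + f for f in ALWAYS_REQUIRED if f not in cost]
    if cost.get("model") == "tiered":
        if "tiers" not in cost:
            errors.append("cost.model = tiered requires cost.tiers")
    elif "amount" not in cost:
        errors.append("missing cost.amount")
    return errors
```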

Cost Models

Four models cover the dominant pricing patterns observed across LLM APIs, search APIs, scraping APIs, geocoding APIs, and AWS-style request gateways. A fifth covers subscription-style flat fees.

| Model | When to use | Example |
|---|---|---|
| per_call | Flat fee per invocation regardless of payload | AWS API Gateway REST APIs at $3.50 / 1M requests (aws.amazon.com/api-gateway/pricing) |
| per_token | LLM-style prompt/completion metering | OpenAI gpt-image-2 text input at $5.00 / 1M tokens (openai.com/api/pricing) |
| per_unit | Other metered units (searches, MB, rows) | Anthropic web search at $10 / 1,000 searches |
| tiered | Volume-based step pricing | AWS API Gateway tier change at 333M requests |
| subscription | Flat monthly fee with included quota | Many SaaS APIs |

Mixing models is expressed by combining the base block with a cost.surcharges array. For example, an LLM tool with per_token base pricing plus a long-context surcharge above 200K tokens declares both the base block and a surcharge entry.
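Applying such a combination can be sketched in Python. This is illustrative only: the long-context threshold is hard-coded rather than parsed from the declared condition string, and the field names follow the surcharge schema in Example 1 below.

```python
def effective_input_rate(cost, context_tokens, us_residency=False):
    """Apply declared surcharges to the base per-unit input rate (a sketch).

    The manifest declares conditions as strings (e.g. "context > 200000");
    here the threshold is hard-coded for brevity.
    """
    rate = cost["amount"]  # e.g. USD per 1M input tokens
    for s in cost.get("surcharges", []):
        if s["name"] == "long_context" and context_tokens > 200_000:
            rate *= s["multiplier_input"]
        if s["name"] == "data_residency_us" and us_residency:
            rate *= s["multiplier_total"]
    return rate
```

With a $5.00 base rate, a 2.0x long-context multiplier, and a 1.10x residency multiplier, a 250K-token US-resident call prices input at $11.00 per 1M tokens.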

Runtime Cost Echo

Every tool response MUST include a usage block (or whatever path is declared in cost.runtime_echo_path) that reports actual cost incurred by this call. The minimum echo schema:

{
  "usage": {
    "model": "per_token",
    "input_tokens": 105,
    "output_tokens": 6039,
    "cache_read_input_tokens": 7123,
    "cache_creation_input_tokens": 7345,
    "server_tool_use": { "web_search_requests": 1 },
    "estimated_cost": { "amount": 0.0234, "currency": "USD" },
    "surcharges_applied": ["long_context"]
  }
}

Requirements:

  • The echo MUST report observed (not estimated) consumption when the publisher knows it.
  • If the publisher cannot compute the dollar amount in real time, estimated_cost MAY be omitted but units MUST be present.
  • Cached and non-cached portions MUST be reported separately when caching is offered.
  • Surcharges actually applied MUST be listed by name so the planner can audit billing later.
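To illustrate reconciliation, a hedged sketch that recomputes dollar cost from an echo like the one above. The per-1M-token rates passed in are assumptions, and cache-creation pricing is omitted for brevity.

```python
def cost_from_echo(usage, in_rate, out_rate, cached_discount):
    """Recompute USD cost from an echo block; rates are per 1M tokens.

    A simplified sketch: cache-creation token pricing (often a premium
    over the base rate) is intentionally omitted.
    """
    uncached = usage["input_tokens"] * in_rate
    cached = usage.get("cache_read_input_tokens", 0) * in_rate * cached_discount
    output = usage["output_tokens"] * out_rate
    return (uncached + cached + output) / 1_000_000
```

The planner compares this recomputed figure against the echoed estimated_cost to detect drift between manifest pricing and live billing.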

Budget-Exhaustion Error

A cost-aware planner needs a dedicated, stable failure mode. When a publisher-side or client-side budget is exceeded, the tool returns:

{
  "error": {
    "code": "BUDGET_EXCEEDED",
    "budget": "organization_monthly_usd",
    "limit": 50000,
    "used": 50127.43,
    "resets_at": "2026-06-01T00:00:00Z",
    "replacement_uri": "https://example.com/tools/cheaper-search"
  }
}

The error code BUDGET_EXCEEDED is distinct from RATE_LIMIT_EXCEEDED. They have different semantics: rate limits reset on a short rolling window, budgets reset on calendar boundaries or require human action. Cost-aware planners typically retry on rate limits and degrade or stop on budget exhaustion (When Budget Runs Out, runcycles.io).
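The two recovery paths can be sketched as follows; the response shapes and callbacks are assumptions for illustration.

```python
import time

def handle_error(error, retry, degrade):
    """Route the two error codes to their distinct recovery paths (a sketch)."""
    code = error["code"]
    if code == "RATE_LIMIT_EXCEEDED":
        # Short rolling window: back off, then retry the same tool.
        time.sleep(error.get("retry_after", 1))
        return retry()
    if code == "BUDGET_EXCEEDED":
        # Calendar-boundary reset or human action required: never retry;
        # switch to the declared cheaper alternative or stop.
        return degrade(error.get("replacement_uri"))
    raise RuntimeError(f"unhandled error code: {code}")
```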

Cost vs Latency Tradeoff Disclosure

Faster tools often cost more. Publishers SHOULD disclose the tradeoff explicitly using a tradeoffs block referencing peer tools or peer modes within the same tool. Example: a fast_mode: true parameter that reduces latency but disables caching, doubling the per-token cost. Disclose the multiplier so the planner can choose deliberately.

The arXiv cost-and-latency benchmark on plan-execute-replan agents demonstrates that explicit tradeoff disclosure changes which actions get selected, with measurable impact on end-to-end task latency at constant cost (arXiv 2601.02663 — Cost- and Latency-Aware Benchmark).

Worked Examples

Example 1 — LLM tool with caching and surcharges.

cost:
  metered: true
  model: per_token
  currency: USD
  unit: 1M_input_tokens
  amount: 5.00
  cached_discount: 0.25
  surcharges:
    - name: long_context
      condition: "context > 200000"
      multiplier_input: 2.0
      multiplier_output: 1.5
    - name: data_residency_us
      multiplier_total: 1.10
  runtime_echo_path: $.usage
  budget_exhaustion:
    error_code: BUDGET_EXCEEDED
    response_status: 429

Example 2 — Web search tool with per-1000 unit pricing.

cost:
  metered: true
  model: per_unit
  currency: USD
  unit: 1000_searches
  amount: 10.00
  runtime_echo_path: $.usage.server_tool_use.web_search_requests
  budget_exhaustion:
    error_code: BUDGET_EXCEEDED

Example 3 — Geocoding API with per-call pricing.

cost:
  metered: true
  model: per_call
  currency: USD
  unit: 1_call
  amount: 0.005
  runtime_echo_path: $.metadata.billed_units
  budget_exhaustion:
    error_code: BUDGET_EXCEEDED

Example 4 — Tiered request gateway.

cost:
  metered: true
  model: tiered
  currency: USD
  unit: 1M_requests
  tiers:
    - up_to: 333000000
      amount: 3.50
    - up_to: null
      amount: 2.80
  runtime_echo_path: $.usage.requests

Example 5 — Free tool.

cost:
  metered: false
  model: per_call
  currency: requests
  unit: 1_call
  amount: 0
  runtime_echo_path: $.usage

Common Mistakes

  • Quoting prices only on a marketing page. Humans read pricing pages; agents read manifests. Both must agree, but the manifest is the contract.
  • Hiding surcharges in fine print. Long-context, data-residency, and storage surcharges materially change planner decisions. Disclose them in cost.surcharges with conditions.
  • Using RATE_LIMIT_EXCEEDED for budget exhaustion. The semantics differ: a planner that retries on rate limits will burn more money on a budget breach.
  • Reporting estimated cost only. Echo observed usage. The planner reconciles estimate vs actual to detect drift and flag billing surprises.
  • Omitting cached vs uncached breakdown. Caching can reduce cost by 75% on repeat content; planners need both numbers to choose between caching and non-caching modes.
  • Treating cost disclosure as optional for free tools. Even free tools have rate budgets. Set cost.metered: false and link to the rate limit block.

FAQ

Q: Does this spec require dollar-denominated pricing?

No. Tools that meter in tokens, requests, or other units can declare those directly with currency: tokens or currency: requests. Cost-aware planners convert across units when an exchange rate is known. When it is not, planners track each currency independently.

Q: How does this interact with subscription pricing?

Declare the subscription as cost.model: subscription with the included quota as tiers. Once the included quota is exhausted, the next tier kicks in. Subscription-only tools with no overage quota set cost.amount: 0 and rely on rate limits to enforce the ceiling.

Q: What about tools that reveal cost only after execution?

Publish the worst-case rate in the manifest and echo the actual cost in the response. Planners use the worst case for budgeting and reconcile downward when actual usage is lower. This pattern matches how LLM token billing already works in practice.

Q: Should the manifest cost be authoritative or should the live API be?

The live API is authoritative for billing; the manifest is authoritative for planning. The two MUST stay consistent within publisher-defined update windows, and changes to cost SHOULD trigger a MINOR version bump per the agent versioning spec. Drift between manifest and live billing is grounds for a BUDGET_EXCEEDED-equivalent client-side abort.

Q: How granular should tier breakpoints be?

Match the granularity used in your published pricing page. AWS API Gateway, OpenAI, and Anthropic all expose simple two- or three-tier structures. Avoid more than five tiers; planners struggle to compute optimal call placement against high tier counts.

Q: What if my tool uses external providers I do not control the price of?

Declare a cost.amount representing your markup or pass-through, and add a cost.surcharges entry pointing to the upstream provider with pass_through: true. Echo upstream usage in the runtime block so the planner can see the real cost composition.

Q: Does this spec cover compute resource costs (CPU, GPU, RAM)?

If the publisher meters them, yes. Use currency: USD (or the appropriate unit) and pick a unit that reflects what is billed. If compute cost is bundled into a per-call price, no separate disclosure is required.

Q: How should a planner cap spend across many tools?

Maintain a ledger keyed by currency (USD, tokens, requests). Subtract observed cost from the runtime echo on every successful call. Halt on BUDGET_EXCEEDED. The Budget Tracker module described in BATS provides a reference implementation pattern.
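Such a ledger can be sketched as below; this is a minimal illustration of the pattern, not the BATS reference implementation.

```python
class BudgetLedger:
    """Multi-currency spend ledger (a sketch of the pattern above)."""

    def __init__(self, limits):
        self.limits = dict(limits)           # e.g. {"USD": 50.0, "requests": 10_000}
        self.used = {k: 0.0 for k in limits}

    def record(self, currency, amount):
        """Add observed cost from the runtime echo after each call."""
        self.used[currency] += amount

    def remaining(self, currency):
        return self.limits[currency] - self.used[currency]

    def exhausted(self, currency):
        return self.remaining(currency) <= 0
```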

Related Articles

  • Agent Rate Limit Documentation Checklist: Disclosing Quotas, Retries, and Burst Limits to AI Agents — disclose quotas, retry-after headers, hierarchical limits, and burst guardrails so AI agents back off cleanly.
  • Agent Skill Manifest Specification: Publishing SKILL.md for AI Agent Discovery — how to author and publish SKILL.md so Claude, ChatGPT, Codex, Gemini, and Copilot agents discover and reuse your docs.
  • Agent Tool Use Documentation Specification — structured schemas, examples, error semantics, and idempotency hints so AI agents can discover, understand, and correctly invoke tools.
