Geodocs.dev

Agent Cost Disclosure Specification

AI agent tool manifests must declare cost in machine-readable form: a base unit, a currency, tier breakpoints, and the runtime fields that will be echoed back so a cost-aware planner can reconcile expected and actual spend. This specification defines the minimum cost schema, the budget-exhaustion error semantics, and the disclosure patterns that prevent Denial-of-Wallet incidents in production agents.

TL;DR

Without cost disclosure, agents cannot make informed plans. They either burn budget on unimportant tool calls or refuse to use paid tools at all. This specification requires every tool manifest to publish a cost block describing per-call, per-token, or tiered pricing in a single normalized schema. Every tool response must echo the actual cost incurred. Budget-exhaustion errors must be a stable, well-typed failure mode so cost-aware planners can degrade gracefully instead of looping.

Definition

Agent cost disclosure is a structured declaration in a tool or skill manifest that tells an AI agent (or its orchestrator) what a single invocation will cost. "Cost" here means any of: dollar charge, token consumption that will appear on a bill, request quota burned, or any other metered resource the publisher considers a cost. The spec normalizes these into a single schema with explicit units so cost-aware planners can compare across heterogeneous tools.

It complements but does not replace upstream pricing pages. Pricing pages are for humans evaluating the product. Cost disclosure is for agents and orchestrators making per-call decisions in flight.

Why It Matters

Three pressures make cost disclosure non-optional in 2026.

  1. Real Denial-of-Wallet incidents. Practitioners report $150+ runaway bills from agent loops that re-invoked paid tools without budget controls (Reddit r/LangChain account of LangGraph runaway costs). At the organizational level, Zylo's 2026 data shows the average company now spends ~$384,500 annually on the OpenAI API alone, with AI-native application spend up 108% in 2025 (Zylo OpenAI cost data). When budgets are this large and call patterns this dynamic, ad-hoc cost guesses fail.
  2. Cost-aware planning is research-validated. Recent benchmarks show that explicit budget tracking changes agent behavior favorably. The BATS framework demonstrates that budget-aware agents "yield more favorable scaling curves and better cost-performance trade-offs" than budget-blind agents (arXiv 2511.17006 — Budget-Aware Tool-Use). Practitioner write-ups document 70% cost reductions from breaking monolithic agents into specialized agents that route based on declared cost (How I reduced AI agent cost by 70%, Musaib Altaf). None of this works without disclosed cost.
  3. Hidden surcharges break naive estimates. Anthropic's web search tool charges $10 per 1,000 searches on top of token costs, and the inference_geo parameter for US-only data residency adds a 10% premium on Opus 4.7 and above (platform.claude.com Pricing, Finout Anthropic API Pricing 2026). An agent that does not see these surcharges in the tool manifest will under-budget by double-digit percentages and either fail or accept lossy approximations.

How It Works

Cost disclosure has three coordinated channels.

Channel 1: manifest declaration. The tool manifest publishes a cost block that names a base unit (per-call, per-token, per-1000-requests, per-MB-egress), a currency (or tokens/requests for non-monetary metering), and tier breakpoints if the cost varies with volume.

Channel 2: runtime cost echo. Every successful tool response includes a usage block reporting actual consumption. OpenAI and Anthropic both demonstrate this pattern: OpenAI's Realtime API exposes input and output token counts per turn (developers.openai.com Realtime costs) and Anthropic's web search tool returns a server_tool_use.web_search_requests count alongside token counts (platform.claude.com Pricing). This specification generalizes the pattern to every tool.

Channel 3: budget-exhaustion error. When a budget is exceeded, the tool returns a stable error code (BUDGET_EXCEEDED) with structured detail: which budget, what threshold, and the optional replacement_uri of a cheaper alternative. Cost-aware planners catch this error and decide whether to escalate, degrade, or stop, rather than retry blindly.

flowchart TD
    A["Manifest cost block"] --> B["Planner estimates total"]
    B --> C["Issue tool call"]
    C --> D{"Within budget?"}
    D -->|Yes| E["Execute + echo usage in response"]
    D -->|No| F["Return BUDGET_EXCEEDED + replacement_uri"]
    E --> G["Planner reconciles actual vs estimate"]
    G --> H["Update remaining budget"]
    F --> I["Planner switches tool or degrades gracefully"]
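The loop in the diagram can be sketched in Python. This is a non-normative illustration: `plan_and_call`, `call_tool`, and the dict shapes below are assumptions, not part of the schema.

```python
# Sketch of the planner loop above. `call_tool` and the manifest/response
# shapes are illustrative assumptions, not defined by this specification.

def plan_and_call(manifest, budget_usd, call_tool):
    """Estimate, enforce, execute, and reconcile one tool call."""
    cost = manifest["cost"]
    # Planner estimates from the manifest declaration (per_call shown;
    # other cost models would scale by expected units).
    estimate = cost["amount"] if cost["metered"] else 0.0
    if estimate > budget_usd:
        # Client-side budget check: degrade instead of calling.
        return None, budget_usd
    response = call_tool()
    if "error" in response and response["error"]["code"] == "BUDGET_EXCEEDED":
        # Publisher-side exhaustion: switch tool or stop; never retry blindly.
        return response["error"].get("replacement_uri"), budget_usd
    # Reconcile actual vs estimate using the runtime cost echo.
    actual = response["usage"].get("estimated_cost", {}).get("amount", 0.0)
    return response, budget_usd - actual
```

The caller keeps the returned remaining budget as the input to the next call, closing the "update remaining budget" loop in the diagram.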

Required Manifest Fields

Every agent tool or skill manifest MUST include a top-level cost block when the tool is metered. Free tools MUST set cost.metered: false and SHOULD include rate limit metadata in a separate block.

| Field | Type | Required | Purpose |
|---|---|---|---|
| cost.metered | boolean | Yes | Whether the tool is billable in any form |
| cost.model | enum | Yes | per_call, per_token, per_unit, tiered, subscription |
| cost.currency | string | Yes | ISO 4217 code, or tokens/requests for non-monetary metering |
| cost.unit | string | Yes | What is being priced (e.g., 1M_input_tokens, 1_call, 1000_searches, 1_GB_egress) |
| cost.amount | number | Yes | Cost per unit at the lowest tier |
| cost.tiers | array | If cost.model = tiered | Tier boundary records |
| cost.surcharges | array | If applicable | Named surcharges (e.g., data_residency_us, long_context) |
| cost.budget_exhaustion | object | Yes | How the tool signals exhaustion (see below) |
| cost.runtime_echo_path | string | Yes | JSON path in the response where actual cost is echoed |
| cost.cached_discount | number | Optional | Multiplier for cached inputs (e.g., 0.25 = 75% off) |
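These rules can be checked mechanically. The helper below is a non-normative sketch that assumes the cost block is already parsed into a dict; it treats cost.amount as optional for tiered models, where per-tier amounts live in cost.tiers (as in Example 4 later in this article).

```python
# Non-normative required-field check for the table above. Field names
# follow the table; for tiered models, per-tier amounts are assumed to
# live inside cost.tiers rather than a top-level cost.amount.

ALWAYS_REQUIRED = ("metered", "model", "currency", "unit",
                   "budget_exhaustion", "runtime_echo_path")

def validate_cost_block(cost):
    """Return a list of violations of the required-field rules."""
    errors = ["missing cost." + f for f in ALWAYS_REQUIRED if f not in cost]
    if cost.get("model") == "tiered":
        if "tiers" not in cost:
            errors.append("cost.model = tiered requires cost.tiers")
    elif "amount" not in cost:
        errors.append("missing cost.amount")
    return errors
```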

Cost Models

Four models cover the dominant pricing patterns observed across LLM APIs, search APIs, scraping APIs, geocoding APIs, and AWS-style request gateways. A fifth covers subscription-style flat fees.

| Model | When to use | Example |
|---|---|---|
| per_call | Flat fee per invocation regardless of payload | AWS API Gateway REST APIs at $3.50 / 1M requests (aws.amazon.com/api-gateway/pricing) |
| per_token | LLM-style prompt/completion metering | OpenAI gpt-image-2 text input at $5.00 / 1M tokens (openai.com/api/pricing) |
| per_unit | Other metered units (searches, MB, rows) | Anthropic web search at $10 / 1,000 searches |
| tiered | Volume-based step pricing | AWS API Gateway tier change at 333M requests |
| subscription | Flat monthly fee with included quota | Many SaaS APIs |

Mixing models is expressed by combining the base block with a cost.surcharges array. For example, an LLM tool with per_token base pricing plus a long-context surcharge above 200K tokens declares both the base block and a surcharge entry.
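Applying such a combination can be sketched in Python. This is illustrative only: the long-context threshold is hard-coded rather than parsed from the declared condition string, and the field names follow the surcharge schema in Example 1 below.

```python
def effective_input_rate(cost, context_tokens, us_residency=False):
    """Apply declared surcharges to the base per-unit input rate (a sketch).

    The manifest declares conditions as strings (e.g. "context > 200000");
    here the threshold is hard-coded for brevity.
    """
    rate = cost["amount"]  # e.g. USD per 1M input tokens
    for s in cost.get("surcharges", []):
        if s["name"] == "long_context" and context_tokens > 200_000:
            rate *= s["multiplier_input"]
        if s["name"] == "data_residency_us" and us_residency:
            rate *= s["multiplier_total"]
    return rate
```

With a $5.00 base rate, a 2.0x long-context multiplier, and a 1.10x residency multiplier, a 250K-token US-resident call prices input at $11.00 per 1M tokens.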

Runtime Cost Echo

Every tool response MUST include a usage block (or whatever path is declared in cost.runtime_echo_path) that reports actual cost incurred by this call. The minimum echo schema:

{
  "usage": {
    "model": "per_token",
    "input_tokens": 105,
    "output_tokens": 6039,
    "cache_read_input_tokens": 7123,
    "cache_creation_input_tokens": 7345,
    "server_tool_use": { "web_search_requests": 1 },
    "estimated_cost": { "amount": 0.0234, "currency": "USD" },
    "surcharges_applied": ["long_context"]
  }
}

Requirements:

  • The echo MUST report observed (not estimated) consumption when the publisher knows it.
  • If the publisher cannot compute the dollar amount in real time, estimated_cost MAY be omitted but units MUST be present.
  • Cached and non-cached portions MUST be reported separately when caching is offered.
  • Surcharges actually applied MUST be listed by name so the planner can audit billing later.
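To illustrate reconciliation, a hedged sketch that recomputes dollar cost from an echo like the one above. The per-1M-token rates passed in are assumptions, and cache-creation pricing is omitted for brevity.

```python
def cost_from_echo(usage, in_rate, out_rate, cached_discount):
    """Recompute USD cost from an echo block; rates are per 1M tokens.

    A simplified sketch: cache-creation token pricing (often a premium
    over the base rate) is intentionally omitted.
    """
    uncached = usage["input_tokens"] * in_rate
    cached = usage.get("cache_read_input_tokens", 0) * in_rate * cached_discount
    output = usage["output_tokens"] * out_rate
    return (uncached + cached + output) / 1_000_000
```

The planner compares this recomputed figure against the echoed estimated_cost to detect drift between manifest pricing and live billing.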

Budget-Exhaustion Error

A cost-aware planner needs a dedicated, stable failure mode. When a publisher-side or client-side budget is exceeded, the tool returns:

{
  "error": {
    "code": "BUDGET_EXCEEDED",
    "budget": "organization_monthly_usd",
    "limit": 50000,
    "used": 50127.43,
    "resets_at": "2026-06-01T00:00:00Z",
    "replacement_uri": "https://example.com/tools/cheaper-search"
  }
}

The error code BUDGET_EXCEEDED is distinct from RATE_LIMIT_EXCEEDED. They have different semantics: rate limits reset on a short rolling window, budgets reset on calendar boundaries or require human action. Cost-aware planners typically retry on rate limits and degrade or stop on budget exhaustion (When Budget Runs Out, runcycles.io).
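The two recovery paths can be sketched as follows; the response shapes and callbacks are assumptions for illustration.

```python
import time

def handle_error(error, retry, degrade):
    """Route the two error codes to their distinct recovery paths (a sketch)."""
    code = error["code"]
    if code == "RATE_LIMIT_EXCEEDED":
        # Short rolling window: back off, then retry the same tool.
        time.sleep(error.get("retry_after", 1))
        return retry()
    if code == "BUDGET_EXCEEDED":
        # Calendar-boundary reset or human action required: never retry;
        # switch to the declared cheaper alternative or stop.
        return degrade(error.get("replacement_uri"))
    raise RuntimeError(f"unhandled error code: {code}")
```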

Cost vs Latency Tradeoff Disclosure

Faster tools often cost more. Publishers SHOULD disclose the tradeoff explicitly using a tradeoffs block referencing peer tools or peer modes within the same tool. Example: a fast_mode: true parameter that reduces latency but disables caching, doubling the per-token cost. Disclose the multiplier so the planner can choose deliberately.

The arXiv cost-and-latency benchmark on plan-execute-replan agents demonstrates that explicit tradeoff disclosure changes which actions get selected, with measurable impact on end-to-end task latency at constant cost (arXiv 2601.02663 — Cost- and Latency-Aware Benchmark).

Worked Examples

Example 1 — LLM tool with caching and surcharges.

cost:
  metered: true
  model: per_token
  currency: USD
  unit: 1M_input_tokens
  amount: 5.00
  cached_discount: 0.25
  surcharges:
    - name: long_context
      condition: "context > 200000"
      multiplier_input: 2.0
      multiplier_output: 1.5
    - name: data_residency_us
      multiplier_total: 1.10
  runtime_echo_path: $.usage
  budget_exhaustion:
    error_code: BUDGET_EXCEEDED
    response_status: 429

Example 2 — Web search tool with per-1000 unit pricing.

cost:
  metered: true
  model: per_unit
  currency: USD
  unit: 1000_searches
  amount: 10.00
  runtime_echo_path: $.usage.server_tool_use.web_search_requests
  budget_exhaustion:
    error_code: BUDGET_EXCEEDED

Example 3 — Geocoding API with per-call pricing.

cost:
  metered: true
  model: per_call
  currency: USD
  unit: 1_call
  amount: 0.005
  runtime_echo_path: $.metadata.billed_units
  budget_exhaustion:
    error_code: BUDGET_EXCEEDED

Example 4 — Tiered request gateway.

cost:
  metered: true
  model: tiered
  currency: USD
  unit: 1M_requests
  tiers:
    - up_to: 333000000
      amount: 3.50
    - up_to: null
      amount: 2.80
  runtime_echo_path: $.usage.requests

Example 5 — Free tool.

cost:
  metered: false
  model: per_call
  currency: requests
  unit: 1_call
  amount: 0
  runtime_echo_path: $.usage

Common Mistakes

  • Quoting prices only on a marketing page. Humans read pricing pages; agents read manifests. Both must agree, but the manifest is the contract.
  • Hiding surcharges in fine print. Long-context, data-residency, and storage surcharges materially change planner decisions. Disclose them in cost.surcharges with conditions.
  • Using RATE_LIMIT_EXCEEDED for budget exhaustion. The semantics differ: a planner that retries on rate limits will burn more money on a budget breach.
  • Reporting estimated cost only. Echo observed usage. The planner reconciles estimate vs actual to detect drift and flag billing surprises.
  • Omitting cached vs uncached breakdown. Caching can reduce cost by 75% on repeat content; planners need both numbers to choose between caching and non-caching modes.
  • Treating cost disclosure as optional for free tools. Even free tools have rate budgets. Set cost.metered: false and link to the rate limit block.

FAQ

Q: Does this spec require dollar-denominated pricing?

No. Tools that meter in tokens, requests, or other units can declare those directly with currency: tokens or currency: requests. Cost-aware planners convert across units when an exchange rate is known. When it is not, planners track each currency independently.

Q: How does this interact with subscription pricing?

Declare the subscription as cost.model: subscription with the included quota as tiers. Once the included quota is exhausted, the next tier kicks in. Subscription-only tools with no overage quota set cost.amount: 0 and rely on rate limits to enforce the ceiling.

Q: What about tools that reveal cost only after execution?

Publish the worst-case rate in the manifest and echo the actual cost in the response. Planners use the worst case for budgeting and reconcile downward when actual usage is lower. This pattern matches how LLM token billing already works in practice.

Q: Should the manifest cost be authoritative or should the live API be?

The live API is authoritative for billing; the manifest is authoritative for planning. The two MUST stay consistent within publisher-defined update windows, and changes to cost SHOULD trigger a MINOR version bump per the agent versioning spec. Drift between manifest and live billing is grounds for a BUDGET_EXCEEDED-equivalent client-side abort.

Q: How granular should tier breakpoints be?

Match the granularity used in your published pricing page. AWS API Gateway, OpenAI, and Anthropic all expose simple two- or three-tier structures. Avoid more than five tiers; planners struggle to compute optimal call placement against high tier counts.

Q: What if my tool uses external providers I do not control the price of?

Declare a cost.amount representing your markup or pass-through, and add a cost.surcharges entry pointing to the upstream provider with pass_through: true. Echo upstream usage in the runtime block so the planner can see the real cost composition.

Q: Does this spec cover compute resource costs (CPU, GPU, RAM)?

If the publisher meters them, yes. Use currency: USD (or the appropriate unit) and pick a unit that reflects what is billed. If compute cost is bundled into a per-call price, no separate disclosure is required.

Q: How should a planner cap spend across many tools?

Maintain a ledger keyed by currency (USD, tokens, requests). Subtract observed cost from the runtime echo on every successful call. Halt on BUDGET_EXCEEDED. The Budget Tracker module described in BATS provides a reference implementation pattern.
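Such a ledger can be sketched as below; this is a minimal illustration of the pattern, not the BATS reference implementation.

```python
class BudgetLedger:
    """Multi-currency spend ledger (a sketch of the pattern above)."""

    def __init__(self, limits):
        self.limits = dict(limits)           # e.g. {"USD": 50.0, "requests": 10_000}
        self.used = {k: 0.0 for k in limits}

    def record(self, currency, amount):
        """Add observed cost from the runtime echo after each call."""
        self.used[currency] += amount

    def remaining(self, currency):
        return self.limits[currency] - self.used[currency]

    def exhausted(self, currency):
        return self.remaining(currency) <= 0
```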

Related Articles

  • Agent Rate Limit Documentation Checklist: Disclosing Quotas, Retries, and Burst Limits to AI Agents — disclose quotas, retry-after headers, hierarchical limits, and burst guardrails so AI agents back off cleanly.
  • Agent Skill Manifest Specification: Publishing SKILL.md for AI Agent Discovery — how to author and publish SKILL.md so Claude, ChatGPT, Codex, Gemini, and Copilot agents discover and reuse your docs.
  • Agent Tool Use Documentation Specification — structured schemas, examples, error semantics, and idempotency hints so AI agents can discover, understand, and correctly invoke tools.
