Agent MCP Server Discovery Specification

AI agents discover MCP servers through four vectors — local config file, registry API, well-known URL (/.well-known/mcp.json), and user-pasted endpoint — then complete a capability handshake and verify trust (manifest signature or origin pin) before invoking any tool.

TL;DR

Agents support four discovery vectors: local config file, registry API, well-known URL (/.well-known/mcp.json), and user-pasted endpoint.
Every connection MUST complete a capability handshake (initialize → server.capabilities → version negotiation) before any tool invocation.
Trust verification uses manifest signing or origin pinning; unsigned manifests MUST be quarantined behind explicit user consent.
Manifests MUST be cached with a max TTL and invalidated on capability-handshake mismatch.

Definition

Agent MCP server discovery is the protocol layer by which a Model Context Protocol (MCP) client — typically embedded in an AI agent runtime such as Claude Desktop, an OpenAI Agents SDK app, or a custom orchestrator — locates an MCP server, fetches its manifest, negotiates a protocol version, and verifies its trustworthiness before using any tool the server exposes. The Model Context Protocol itself defines the wire format (JSON-RPC 2.0 over stdio or Streamable HTTP); the discovery layer turns a hostname or local binary into a usable, security-checked endpoint (Model Context Protocol spec, 2025).

Server discovery is distinct from tool discovery. MCP server discovery answers "which servers are available and reachable?" Tool discovery answers "which tools does this server expose?" The two layers stack: an agent first discovers a server, then enumerates its tools through the same connection. Conflating them leads to brittle architectures where adding a tool requires an agent restart.

Why this matters

The practical value of a discovery spec is that it lets agent ecosystems expand without recompiling clients. A team can ship a new MCP server, register it via /.well-known/mcp.json or a registry API, and have compliant agents pick it up at the next discovery cycle. Without a discovery spec, every new server requires a hard-coded entry in every client.

The security value is equally important. MCP servers can read sensitive resources and execute actions on behalf of the user. A discovery spec that mandates capability handshake, version negotiation, and trust verification keeps an agent from invoking a tool whose semantics the agent has not validated. Without these checks, an agent that successfully connects to a server it does not understand can be tricked into executing destructive tool calls.

Finally, discovery affects the user experience. The four vectors map to four user intents: "my organization preconfigured this" (config file), "I picked this from a marketplace" (registry API), "this site advertises an MCP endpoint" (well-known URL), and "a colleague gave me this URL" (user-paste). A spec that supports all four lets the agent behave correctly at each touchpoint.

How it works

The discovery flow is a sequence of four phases: locate, fetch, handshake, and verify.

flowchart LR
    A["Agent runtime"] --> B["Locate (4 vectors)"]
    B --> B1["Config file"]
    B --> B2["Registry API"]
    B --> B3["/.well-known/mcp.json"]
    B --> B4["User-paste endpoint"]
    B1 --> C["Fetch manifest"]
    B2 --> C
    B3 --> C
    B4 --> C
    C --> D["initialize"]
    D --> E["server.capabilities"]
    E --> F["version negotiation"]
    F --> G{"Trust check"}
    G -- "signed or pinned" --> H["Connection ready"]
    G -- "unsigned" --> I["User consent gate"]
    I -- "approve" --> H
    I -- "deny" --> J["Reject"]

Phase 1: locate

The agent identifies candidate MCP servers from one or more of the four discovery vectors. The config file vector reads a local JSON file (commonly ~/.config//mcp.json) listing preconfigured server endpoints. The registry API vector queries an HTTP registry that returns a list of servers and their metadata. The well-known URL vector fetches /.well-known/mcp.json from a host to learn whether that origin advertises an MCP endpoint, following well-known URI semantics (RFC 8615). The user-paste vector accepts a manually entered endpoint URL or local command.
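To make the locate phase concrete, here is a minimal sketch of two helpers an agent might use: one builds the well-known URL for an origin, the other normalizes endpoint URLs so the same server found through two vectors compares equal. The function names and the normalization rules (lowercased host, default port dropped, trailing slash and query ignored) are assumptions for illustration, not part of any published spec.

```python
from urllib.parse import urlsplit, urlunsplit

WELL_KNOWN_PATH = "/.well-known/mcp.json"  # path named in this document


def well_known_url(origin: str) -> str:
    """Return the well-known manifest URL for an HTTPS origin."""
    parts = urlsplit(origin)
    if parts.scheme != "https" or not parts.netloc:
        raise ValueError(f"expected an https origin, got {origin!r}")
    # Well-known URIs live at the root of the origin, so any input path is dropped.
    return urlunsplit(("https", parts.netloc.lower(), WELL_KNOWN_PATH, "", ""))


def canonical_endpoint(url: str) -> str:
    """Normalize an endpoint URL (hypothetical rules) for de-duplication."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    # Keep the port only when it is not the default for the scheme.
    if port is not None and (scheme, port) not in {("https", 443), ("http", 80)}:
        host = f"{host}:{port}"
    # Trailing slashes, query strings, and fragments are ignored in this sketch.
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((scheme, host, path, "", ""))
```

With these helpers, "https://Example.com:443/mcp/" and "https://example.com/mcp" reduce to the same key, which is what the later de-duplication step relies on.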

Phase 2: fetch manifest

For each candidate, the agent fetches the server manifest. For HTTP-transport servers the manifest is fetched over HTTPS; for stdio-transport servers the manifest is read from the server's initialize response. Manifests advertise the server's name, version, supported transports, supported capabilities, declared tool list (or a pointer to enumerate at runtime), and auth requirements.
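The manifest fields listed above can be modeled as a small typed record. The sketch below is illustrative only: the key names (name, version, transports, capabilities, tools, auth) and the transport identifiers are assumptions chosen to mirror the prose, not a published manifest schema.

```python
from dataclasses import dataclass, field
from typing import Any

REQUIRED_FIELDS = ("name", "version", "transports", "capabilities")
KNOWN_TRANSPORTS = {"stdio", "streamable-http"}  # assumed identifiers


@dataclass(frozen=True)
class Manifest:
    name: str
    version: str
    transports: tuple[str, ...]
    capabilities: tuple[str, ...]
    tools: tuple[str, ...] = ()  # empty means "enumerate at runtime"
    auth: dict[str, Any] = field(default_factory=dict)


def parse_manifest(raw: dict[str, Any]) -> Manifest:
    """Check the required fields and build a Manifest, rejecting malformed input."""
    missing = [key for key in REQUIRED_FIELDS if key not in raw]
    if missing:
        raise ValueError(f"manifest missing required fields: {missing}")
    if not isinstance(raw["name"], str) or not isinstance(raw["version"], str):
        raise ValueError("manifest name and version must be strings")
    transports = tuple(raw["transports"])
    unknown = set(transports) - KNOWN_TRANSPORTS
    if unknown:
        raise ValueError(f"unknown transports: {sorted(unknown)}")
    return Manifest(
        name=raw["name"],
        version=raw["version"],
        transports=transports,
        capabilities=tuple(raw["capabilities"]),
        tools=tuple(raw.get("tools", ())),
        auth=dict(raw.get("auth", {})),
    )
```

A production client would validate against the full JSON Schema instead; this hand-rolled check only shows the shape of the data.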

Phase 3: capability handshake

The agent issues an initialize request that declares the agent's protocol version and supported client capabilities. The server replies with server.capabilities, declaring which optional features (tools, resources, prompts, logging, completions) it supports; sampling, by contrast, is a capability the client declares. Both sides then negotiate the protocol version: the highest mutually supported version wins. If no compatible version exists, the client MUST refuse to proceed and surface the mismatch to the user (MCP authorization spec, 2025).
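As a rough illustration of the handshake, the sketch below builds a JSON-RPC 2.0 initialize request and applies the "highest mutually supported version wins" rule as this document states it. The client name, the VersionMismatch exception, and the idea of comparing two explicit version lists are assumptions; on the wire, the client proposes one version and the server answers with the version it will use. MCP protocol versions are date strings, so plain string comparison orders them chronologically.

```python
import json
from typing import Iterable


class VersionMismatch(RuntimeError):
    """Raised when client and server share no protocol version (hypothetical name)."""


def build_initialize_request(request_id: int, protocol_version: str) -> str:
    """Serialize an initialize request declaring the client's protocol version."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": protocol_version,
            "capabilities": {},  # client capabilities; empty in this sketch
            "clientInfo": {"name": "example-agent", "version": "0.1.0"},  # placeholder
        },
    }
    return json.dumps(message)


def negotiate_version(client_versions: Iterable[str], server_versions: Iterable[str]) -> str:
    """Pick the highest version both sides support, or refuse to proceed."""
    common = set(client_versions) & set(server_versions)
    if not common:
        raise VersionMismatch("no mutually supported protocol version")
    # Date-formatted versions (YYYY-MM-DD) sort chronologically as strings.
    return max(common)
```

Raising instead of falling back keeps the "MUST refuse to proceed" rule visible to the caller, which can then surface the mismatch to the user.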

Phase 4: trust verification

Before the connection is marked usable, the agent verifies trust through one of two paths. Manifest signing validates a detached signature against a known publisher key; if the signature verifies, the server is trusted at the publisher's trust tier. Origin pinning binds the server identity to its TLS certificate and origin; subsequent connections to the same origin must present the same pinned certificate chain. If neither verification succeeds, the agent MUST obtain explicit user consent and quarantine the server in a low-privilege mode where destructive tools are disabled.
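The trust decision can be sketched as a small pure function. This example assumes the signature check has already been performed elsewhere and arrives as a boolean, and it simplifies origin pinning to a SHA-256 fingerprint of the leaf certificate rather than the full chain. The tier names and function names are hypothetical.

```python
import hashlib
import hmac
from enum import Enum
from typing import Optional


class TrustTier(Enum):
    PUBLISHER = "publisher"      # manifest signature verified
    PINNED = "pinned"            # certificate matches the stored pin
    QUARANTINED = "quarantined"  # neither check succeeded


def cert_fingerprint(der_bytes: bytes) -> str:
    """SHA-256 fingerprint of a DER-encoded certificate, as lowercase hex."""
    return hashlib.sha256(der_bytes).hexdigest()


def decide_trust(
    signature_valid: bool,
    presented_cert_der: Optional[bytes],
    pinned_fingerprint: Optional[str],
) -> TrustTier:
    """Return the trust tier for a server after the handshake."""
    if signature_valid:
        return TrustTier.PUBLISHER
    if presented_cert_der is not None and pinned_fingerprint is not None:
        presented = cert_fingerprint(presented_cert_der)
        # Constant-time comparison of the two hex strings.
        if hmac.compare_digest(presented, pinned_fingerprint.lower()):
            return TrustTier.PINNED
    return TrustTier.QUARANTINED


def tool_allowed(tier: TrustTier, destructive: bool, user_consented: bool) -> bool:
    """Quarantined servers need consent and never run destructive tools."""
    if tier is TrustTier.QUARANTINED:
        return user_consented and not destructive
    return True
```

Keeping the decision separate from the cryptography makes the quarantine policy easy to test without real keys or certificates.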

Practical application

A reference client implementation follows seven steps:

1. Initialize the discovery cache with a TTL between 300 and 3,600 seconds for static manifests; invalidate immediately on handshake mismatch.
2. Enumerate vectors in priority order: config file (highest), registry API, well-known URL, user-paste (lowest), de-duplicating by canonical endpoint.
3. Fetch manifests in parallel with a per-request timeout (commonly 5-10 seconds) and exponential backoff on transient failures (1s, 2s, 4s, with jitter).
4. Validate manifest schema against the published JSON Schema before any further processing; reject malformed manifests outright.
5. Run the capability handshake for each surviving candidate; record the negotiated protocol version and capabilities in the connection record.
6. Verify trust via signature or origin pin; gate unsigned servers behind explicit user consent and apply a reduced-privilege scope.
7. Bind auth scopes per tool, not per server; the agent SHOULD support OAuth 2.1 bearer tokens with scope strings that the server publishes alongside each tool definition.
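Two of the steps above, priority-ordered de-duplication and exponential backoff with jitter, are easy to get subtly wrong, so a minimal sketch may help. The vector labels, the Candidate type, and the jitter fraction are assumptions; endpoints are assumed to be canonicalized before they reach this code.

```python
import random
from typing import Iterable, NamedTuple, Optional

# Lower number means higher priority, matching the order given in the steps.
VECTOR_PRIORITY = {"config": 0, "registry": 1, "well-known": 2, "user-paste": 3}


class Candidate(NamedTuple):
    vector: str
    endpoint: str  # assumed already canonicalized


def dedupe_by_priority(candidates: Iterable[Candidate]) -> list[Candidate]:
    """Keep one candidate per endpoint, preferring the higher-priority vector."""
    best: dict[str, Candidate] = {}
    for cand in candidates:
        current = best.get(cand.endpoint)
        if current is None or VECTOR_PRIORITY[cand.vector] < VECTOR_PRIORITY[current.vector]:
            best[cand.endpoint] = cand
    return sorted(best.values(), key=lambda c: VECTOR_PRIORITY[c.vector])


def backoff_delays(
    attempts: int = 3,
    base: float = 1.0,
    jitter: float = 0.25,
    rng: Optional[random.Random] = None,
) -> list[float]:
    """Delays of base * 2**i seconds, each with up to `jitter` fraction added."""
    rng = rng or random.Random()
    delays = []
    for i in range(attempts):
        step = base * 2**i
        delays.append(step + rng.uniform(0, jitter * step))
    return delays
```

With jitter set to 0 the schedule is exactly 1s, 2s, 4s; the random component exists so that many agents retrying the same endpoint do not synchronize.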

Failure semantics: a discovery cycle that fails at any phase is logged with the failing phase, the candidate endpoint, and the underlying error. Retries follow a circuit-breaker pattern — three consecutive failures move the endpoint into a five-minute cooldown before retry.
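The circuit-breaker rule can be sketched as a small per-endpoint state machine. The class and method names are hypothetical, and the injectable clock is an assumption added to make the behavior testable; the thresholds (three failures, 300 seconds) come from the text above.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Per-endpoint breaker: N consecutive failures open it for a cooldown."""

    def __init__(
        self,
        threshold: int = 3,
        cooldown_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._threshold = threshold
        self._cooldown = cooldown_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def allow(self) -> bool:
        """Return True if a request to the endpoint may be attempted now."""
        if self._opened_at is None:
            return True
        if self._clock() - self._opened_at >= self._cooldown:
            # Cooldown elapsed: close the breaker and permit a retry.
            self._opened_at = None
            self._failures = 0
            return True
        return False

    def record_success(self) -> None:
        """Any success resets the consecutive-failure count."""
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        """Count a failure and open the breaker once the threshold is reached."""
        self._failures += 1
        if self._failures >= self._threshold:
            self._opened_at = self._clock()
```

A caller would check allow() before each discovery attempt and report the endpoint as unavailable while it returns False.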

Comparison vs static tool registries

Static tool registries hard-code tool definitions in the agent itself. They are simpler to reason about and require no runtime discovery, but they cannot expand without a client release. MCP server discovery trades some startup latency and trust-verification complexity for the ability to extend the agent at runtime. For consumer agents (Claude Desktop, ChatGPT desktop) the runtime extensibility is essential; for narrow vertical agents that ship with a fixed toolset, a static registry may be sufficient. The two patterns can coexist: a static core registry handles the always-on toolset, and MCP discovery adds user-installed or workspace-installed servers on top.

Common mistakes

Trusting unsigned manifests. A common shortcut is to skip signature verification because most public servers do not yet sign their manifests. The correct path is to quarantine unsigned servers and gate destructive tools behind user consent; skipping verification entirely creates a path for prompt-injection-driven server impersonation.

No version pinning. Clients that assume the latest protocol version is always supported break against older servers. Pin the negotiated version on the connection and reject capability calls outside the agreed envelope.
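One way to enforce the agreed envelope is to store the negotiated version and capabilities on the connection and check every outgoing method against them. The sketch below is an assumption about how a client might structure this; the ConnectionRecord type and check_call function are hypothetical, while the method names are standard MCP request methods.

```python
from dataclasses import dataclass

# Which server capability each request method depends on.
METHOD_CAPABILITY = {
    "tools/list": "tools",
    "tools/call": "tools",
    "resources/list": "resources",
    "resources/read": "resources",
    "prompts/list": "prompts",
    "prompts/get": "prompts",
}


@dataclass(frozen=True)
class ConnectionRecord:
    endpoint: str
    protocol_version: str     # pinned at handshake time
    capabilities: frozenset   # capability names the server declared


def check_call(record: ConnectionRecord, method: str) -> None:
    """Raise if `method` falls outside what was negotiated for this connection."""
    required = METHOD_CAPABILITY.get(method)
    if required is None:
        raise ValueError(f"unknown method: {method}")
    if required not in record.capabilities:
        raise PermissionError(
            f"{method} requires the '{required}' capability, "
            f"which {record.endpoint} did not declare"
        )
```

Because the record is frozen, the pinned version and capabilities cannot drift during the life of the connection; a change requires a fresh handshake.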

Infinite manifest re-fetch. Without a TTL and circuit breaker, a misbehaving discovery loop hammers the well-known endpoint on every action. Always cap re-fetch frequency and add a cooldown after consecutive failures.

Leaking auth tokens. Bearer tokens MUST be scoped per server and never re-used across servers; a global agent token shared across MCP connections is a critical security bug.
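A simple structural guard against token reuse is to key stored tokens by server origin and refuse to produce a header for any origin without its own token. The TokenStore class below is a hypothetical in-memory sketch; a real client would use the platform's secure credential storage.

```python
from urllib.parse import urlsplit


def origin_of(url: str) -> str:
    """Scheme plus host (and port, if present) of a URL, lowercased."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc.lower()}"


class TokenStore:
    """In-memory bearer-token store keyed by server origin (illustrative only)."""

    def __init__(self) -> None:
        self._tokens: dict[str, str] = {}

    def put(self, endpoint: str, token: str) -> None:
        """Store a token for the origin of `endpoint`."""
        self._tokens[origin_of(endpoint)] = token

    def authorization_header(self, endpoint: str) -> dict[str, str]:
        """Return the Authorization header for `endpoint`, never another origin's."""
        token = self._tokens.get(origin_of(endpoint))
        if token is None:
            raise KeyError(f"no token stored for {origin_of(endpoint)}")
        return {"Authorization": f"Bearer {token}"}
```

There is deliberately no "default token" fallback: a missing entry is an error, so a token issued for one server can never be sent to another by accident.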

FAQ

Q: How do agents authenticate to MCP servers?

The MCP authorization spec recommends OAuth 2.1 with bearer tokens for HTTP-transport servers; stdio servers do not use that flow and instead rely on process-level identity, with credentials taken from the local environment (MCP authorization spec, 2025). Tokens are scoped per server, not per agent, and tools advertise their required scopes in the manifest so the client can request only what each tool needs. Re-using a single token across multiple servers is a critical security anti-pattern.

Q: What happens when an MCP server is offline?

The client SHOULD apply documented retry and circuit-breaker rules: three consecutive connection failures move the endpoint into a five-minute cooldown. Until cooldown expires, tools from that server are surfaced as unavailable rather than being silently skipped — the user needs to know which capability they have temporarily lost.

Q: Should agents auto-trust well-known MCP registries?

No. A well-known URL only proves that an origin advertises an MCP endpoint; it does not prove the publisher's identity. Auto-trust should be reserved for servers whose manifests are signed by a key the agent already recognizes, or whose origin has been explicitly pinned by the user or the deploying organization.

Q: How are MCP server manifests cached and invalidated?

A defensible default is to cache static manifests for 300-3,600 seconds and to invalidate immediately on capability-handshake mismatch (where the server returns capabilities that disagree with the cached manifest). Manifests fetched via /.well-known/ may use the response's Cache-Control header when available; if absent, fall back to the agent default.
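The caching rule above can be sketched as two small functions: one derives a TTL from an optional Cache-Control header, the other detects a handshake mismatch. The 900-second default is an assumption chosen from within the 300 to 3,600 second range, and the function names are hypothetical. Only the max-age, no-store, and no-cache directives are interpreted here.

```python
import re
from typing import Iterable, Optional

DEFAULT_TTL = 900  # assumed agent default, within the 300-3,600 s range
MAX_TTL = 3600     # upper bound from the text


def manifest_ttl(cache_control: Optional[str]) -> int:
    """Return the cache lifetime in seconds for a fetched manifest."""
    if cache_control:
        directives = [d.strip().lower() for d in cache_control.split(",")]
        if "no-store" in directives or "no-cache" in directives:
            return 0  # do not reuse without refetching
        for directive in directives:
            match = re.fullmatch(r"max-age=(\d+)", directive)
            if match:
                # Honor the server's value but never exceed the agent's cap.
                return min(int(match.group(1)), MAX_TTL)
    return DEFAULT_TTL


def should_invalidate(cached_capabilities: Iterable[str], live_capabilities: Iterable[str]) -> bool:
    """True when the handshake reports capabilities that differ from the cache."""
    return set(cached_capabilities) != set(live_capabilities)
```

For example, a header of "public, max-age=86400" is capped at 3,600 seconds, while a missing header falls back to the default.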

Q: What is the difference between MCP server discovery and tool discovery?

Server discovery answers "which MCP servers can I reach?" by locating endpoints across the four vectors and validating them with a handshake. Tool discovery answers "which tools does this server expose?" by enumerating the server's tool list (often via tools/list) after the connection is established. Server discovery happens once per connection cycle; tool discovery can happen multiple times within a single connection.
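For contrast with server discovery, tool discovery is a single JSON-RPC exchange on an established connection. The sketch below builds a tools/list request and extracts tool names from a response; the helper names are hypothetical, and transport handling (stdio or HTTP) is omitted.

```python
import json
from typing import Optional


def build_tools_list_request(request_id: int, cursor: Optional[str] = None) -> str:
    """Serialize a tools/list request, optionally continuing from a cursor."""
    message: dict = {"jsonrpc": "2.0", "id": request_id, "method": "tools/list"}
    if cursor is not None:
        message["params"] = {"cursor": cursor}
    return json.dumps(message)


def tool_names(response_text: str) -> list[str]:
    """Extract tool names from a tools/list response, raising on a JSON-RPC error."""
    response = json.loads(response_text)
    if "error" in response:
        raise RuntimeError(f"tools/list failed: {response['error']}")
    return [tool["name"] for tool in response["result"]["tools"]]
```

Because this exchange is cheap, a client can repeat it within one connection whenever it needs a fresh tool list, without rerunning discovery.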

Q: How does capability negotiation handle version mismatches?

The initialize exchange returns the highest mutually supported protocol version. If the agent's minimum and the server's maximum do not overlap, the client MUST refuse to proceed and surface a clear error to the user rather than attempting a best-effort downgrade. Silent downgrades hide protocol drift and produce hard-to-debug runtime failures.