Agent Knowledge Base Integration: RAG, MCP, and Direct API Patterns

Agent knowledge base integration connects an AI agent to internal knowledge sources — vector-search RAG stores, MCP-server-backed KBs, or direct retrieval APIs — and stamps every retrieved chunk with provenance (source URL, retrieval timestamp) and access-control metadata so the agent can cite sources and respect per-user permissions.

TL;DR

  • Three integration patterns dominate: RAG vector store, MCP server, and direct retrieval API; each has a different latency, freshness, and ACL profile.
  • Choose RAG when corpora are large and semantic recall matters; MCP when the KB is owned by another team and tool reuse is critical; direct API when the KB has a strong query language already.
  • Provenance is non-negotiable — every retrieved chunk needs source_id, retrieved_at, and an ACL stamp so the agent can cite and the runtime can post-filter.
  • ACLs are best enforced at retrieval (filter the index) rather than after retrieval, because post-filtering leaks the existence of restricted documents through latency and result counts.

Definition

Agent knowledge base integration is the design contract between an AI agent and a knowledge source: how the agent issues a query, how the source returns results, and how every result is annotated with metadata the agent and the runtime can use downstream. Three integration patterns are in production today — RAG (vector search over an embedded corpus), MCP (Model Context Protocol server exposing the KB as tools and resources), and direct retrieval API (agent calls the KB's native query endpoint). The integration layer is responsible for normalizing all three into a common result envelope so the agent's prompt does not branch by source.

The contract has two pillars: the query interface (what the agent passes in) and the result envelope (what the agent gets back). The query interface determines what the agent can ask — natural-language semantic search, structured filters, or hybrid. The result envelope determines what the agent can do with the answer — cite it, surface it to the user, or pass it on to a downstream step.

Why this matters

Without a consistent integration layer, every new KB the agent talks to becomes a new prompt pattern, a new failure mode, and a new place for ACL leaks. Three concrete pains drive the spec:

First, provenance. Modern users expect agents to cite sources. If retrieval results don't carry a stable source_id and a public-facing URL, the agent has nothing to cite. Hallucinated citations are worse than no citations.

Second, freshness. Some KBs are seconds old (live ticket systems), others are days old (re-embedded weekly). The agent needs to know which is which to phrase its answer correctly — "according to last week's policy doc" vs. "as of just now." Without retrieved_at, the agent assumes everything is current.

Third, ACLs. Per-user permissions are the bedrock of any enterprise integration. A poorly designed retrieval layer either leaks restricted documents in result counts or returns them outright. Enforcing ACLs as a hard filter at the retrieval layer — not at the LLM prompt — is the only safe pattern.

How it works

The integration is a thin adapter layer between the agent and the underlying KB. The adapter normalizes three different transport types into one envelope:

flowchart LR
    A["Agent query"] --> B["KB router"]
    B --> C["RAG vector store"]
    B --> D["MCP server"]
    B --> E["Direct retrieval API"]
    C --> F["Normalized result envelope"]
    D --> F
    E --> F
    F --> G["Agent prompt context"]

The result envelope carries six fields per chunk:

  • text: the content
  • source_id: stable identifier for the document
  • source_url: resolvable URL the agent can cite
  • retrieved_at: ISO-8601 timestamp
  • acl_stamp: the ACL principal the chunk was filtered against, e.g., the user's ID
  • score: retrieval confidence in the 0-to-1 range
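
A minimal sketch of the envelope as a Python dataclass. The field names follow the spec above; the age_seconds helper is an illustrative addition, not part of the contract:

    from dataclasses import dataclass
    from datetime import datetime, timezone

    @dataclass(frozen=True)
    class RetrievedChunk:
        """Normalized result envelope: one instance per retrieved chunk."""
        text: str          # the chunk content
        source_id: str     # stable document identifier
        source_url: str    # resolvable URL the agent can cite
        retrieved_at: str  # ISO-8601 timestamp of the retrieval call
        acl_stamp: str     # principal the chunk was filtered against
        score: float       # retrieval confidence, 0.0 to 1.0

        def age_seconds(self) -> float:
            """Seconds elapsed since retrieval, for freshness tags."""
            then = datetime.fromisoformat(self.retrieved_at)
            return (datetime.now(timezone.utc) - then).total_seconds()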

RAG vector store. The agent's query is embedded with the same model used at index time, then a top-k similarity search returns the closest chunks. Best for large corpora where semantic recall is the primary need. ACL is enforced as a metadata filter on the vector index — the user's permitted document IDs are passed alongside the embedding query (Pinecone, 2024).
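
A minimal sketch of the RAG path, assuming a Pinecone index named "kb" whose chunk metadata carries doc_id, url, and text keys, and OpenAI's text-embedding-3-small as the shared embedding model; all of those names are assumptions, and RetrievedChunk is the envelope sketched above:

    from datetime import datetime, timezone
    from openai import OpenAI
    from pinecone import Pinecone

    oai = OpenAI()
    index = Pinecone(api_key="...").Index("kb")  # assumed index name

    def rag_search(query: str, user_id: str, permitted_ids: list[str],
                   k: int = 5) -> list[RetrievedChunk]:
        # Embed the query with the same model used at index time.
        vec = oai.embeddings.create(model="text-embedding-3-small",
                                    input=query).data[0].embedding
        # ACL as a hard metadata filter: the search never sees documents
        # outside the user's permitted set.
        res = index.query(vector=vec, top_k=k, include_metadata=True,
                          filter={"doc_id": {"$in": permitted_ids}})
        now = datetime.now(timezone.utc).isoformat()
        return [RetrievedChunk(text=m.metadata["text"],
                               source_id=m.metadata["doc_id"],
                               source_url=m.metadata["url"],
                               retrieved_at=now, acl_stamp=user_id,
                               score=m.score)
                for m in res.matches]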

MCP server. The KB exposes tools/list and resources/list over the Model Context Protocol; the agent discovers them at startup and calls them like any other tool (Anthropic, 2024). Best when the KB is maintained by a different team — they own the tool, you own the agent. ACL is delegated to the server, which authenticates the calling principal.
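
A minimal sketch over the official Python MCP client SDK, assuming a hypothetical kb-mcp-server command that exposes a search_kb tool; both names are assumptions, and ACL enforcement stays with the server:

    import asyncio
    from mcp import ClientSession, StdioServerParameters
    from mcp.client.stdio import stdio_client

    async def mcp_search(question: str, user_id: str):
        # Hypothetical server launch command; in practice this comes from config.
        params = StdioServerParameters(command="kb-mcp-server", args=["--stdio"])
        async with stdio_client(params) as (read, write):
            async with ClientSession(read, write) as session:
                await session.initialize()
                tools = await session.list_tools()  # discover at startup
                # "search_kb" is an assumed tool name; the server
                # authenticates the calling principal.
                return await session.call_tool(
                    "search_kb", {"query": question, "principal": user_id})

    # asyncio.run(mcp_search("refund policy", "user-123"))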

Direct retrieval API. The agent calls the KB's native search endpoint directly (e.g., Elasticsearch DSL, GraphQL). Best when the KB has a strong query language and you want to leverage it (faceted filters, aggregations) without an embedding layer. ACL is enforced by the API itself based on auth headers.
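
A minimal sketch of the direct path against an Elasticsearch _search endpoint, assuming an index named "kb" with text, url, and allowed_principals fields (all assumptions); the ACL rides on the auth header, with a document-level filter as defense in depth:

    import requests
    from datetime import datetime, timezone

    ES_URL = "http://localhost:9200/kb/_search"  # assumed index name

    def direct_search(query: str, user_id: str, token: str,
                      k: int = 5) -> list[RetrievedChunk]:
        body = {"size": k, "query": {"bool": {
            "must": [{"match": {"text": query}}],
            # "allowed_principals" is an assumed per-document ACL field.
            "filter": [{"term": {"allowed_principals": user_id}}]}}}
        resp = requests.post(ES_URL, json=body, timeout=10,
                             headers={"Authorization": f"Bearer {token}"})
        resp.raise_for_status()
        now = datetime.now(timezone.utc).isoformat()
        return [RetrievedChunk(text=h["_source"]["text"], source_id=h["_id"],
                               source_url=h["_source"].get("url", ""),
                               retrieved_at=now, acl_stamp=user_id,
                               # ES scores are unbounded; clamp crudely to 0-1.
                               score=min(h["_score"] / 10.0, 1.0))
                for h in resp.json()["hits"]["hits"]]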

Practical application

A 4-step pattern to add a new KB to an agent:

  1. Choose the integration pattern. RAG for semantic recall over large, mostly static corpora; MCP for cross-team tool reuse; direct API when the KB already has a strong native query interface.
  2. Build the result-envelope adapter. Whatever the source returns, normalize it to the six-field envelope. This is the single most important integration step — every downstream prompt depends on it.
  3. Wire ACLs at the source, not in the prompt. Pass the user's principal ID into the retrieval call as a metadata filter or auth header. Never rely on the LLM to "respect" ACLs — it will fail.
  4. Stamp retrieved_at on every chunk and surface freshness in the prompt. A simple tag like [fresh: 2s ago] vs. [stale: 7 days] lets the agent phrase confidence correctly.

A minimal RAG adapter wraps a vector-store SDK and emits the normalized envelope; a minimal MCP adapter is a thin shim over the MCP client SDK (Anthropic, 2024). The point of the spec is that the agent prompt does not change when you swap the underlying source.
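
The adapter contract itself is small. A sketch, reusing the RetrievedChunk envelope from above; the registry keys ("policies", "tickets") are illustrative:

    from typing import Protocol

    class KBAdapter(Protocol):
        """The interface every source adapter satisfies, whatever its transport."""
        def search(self, query: str, user_id: str, k: int) -> list[RetrievedChunk]: ...

    def retrieve(adapters: dict[str, KBAdapter], kb_name: str, query: str,
                 user_id: str, k: int = 5) -> list[RetrievedChunk]:
        # The agent-facing call site never branches by source; swapping the
        # adapter registered under a name changes nothing in the prompt layer.
        return adapters[kb_name].search(query, user_id, k)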

Common mistakes

  • Stale embeddings. RAG indexes drift behind the source-of-truth; without a re-embedding job, citations point to outdated chunks. Schedule re-embedding daily or on-write.
  • Missing ACL enforcement at retrieval. Putting "respect user permissions" in the system prompt is theatre — the model will leak. Enforce ACLs as a hard filter at the retrieval layer.
  • No provenance metadata. Without source_id and source_url, the agent cannot cite and the auditor cannot verify. Stamp provenance on every chunk, even when the underlying source doesn't return it natively (synthesize it from the index key).
  • Mixing freshness contracts. When one source is live and another is week-old, the agent will treat them as equivalent unless you surface retrieved_at in the prompt context.

FAQ

Q: When should I use RAG vs. MCP server vs. direct API?

Use RAG when the KB is large, mostly static, and the queries are semantic — embedding-based recall is the whole point. Use an MCP server when the KB is owned by a different team and you want to discover its tools dynamically without hard-coding integrations (Anthropic, 2024). Use a direct retrieval API when the KB already has a strong query language (Elasticsearch DSL, GraphQL) and you want exact filters, aggregations, or facets that embeddings can't express well (LangChain, 2024).

Q: How do I enforce per-user ACLs?

Always at the retrieval layer, never at the prompt. For RAG, pass the user's permitted document IDs as a metadata filter on the vector query (Pinecone, 2024). For MCP, delegate to the server's authentication and pass the user's identity in the request envelope. For direct APIs, use the API's native auth. Post-filtering at the LLM is unsafe — the model can be tricked into revealing the existence of filtered results.

Q: How is freshness signaled to the agent?

Stamp retrieved_at on every chunk and surface it in the prompt context (e.g., [retrieved: 2s ago]). For sources with explicit document timestamps, include source_modified_at separately. Agents that reason about freshness can then phrase confidence correctly — "as of [timestamp]" rather than asserting facts as currently true (OpenAI, 2024).
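
A sketch of turning retrieved_at into the tags above, reusing the envelope's age_seconds helper; the bucket thresholds are illustrative:

    def freshness_tag(chunk: RetrievedChunk) -> str:
        """Render a freshness tag for the prompt context."""
        age = chunk.age_seconds()
        if age < 60:
            return f"[retrieved: {age:.0f}s ago]"
        if age < 86_400:
            return f"[retrieved: {age / 3600:.0f}h ago]"
        return f"[stale: {age / 86_400:.0f} days]"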
