Geodocs.dev

Agent Knowledge Base Specification: Structure, Refresh, and Versioning

An agent knowledge base is the curated retrieval corpus that grounds responses. This specification defines the document model, chunking strategies, metadata schema, refresh cadence, version pinning, and rollback procedures so an agent can answer reliably and recover from bad ingests.

TL;DR

Model each document as an immutable versioned record with chunk children. Pick one chunking strategy per content type: fixed for short, semantic for prose, hierarchical for structured docs. Enrich every chunk with source URL, version, ACL, and freshness. Refresh on a cadence matched to source volatility, pin a known-good version per agent, and keep an audit trail so any answer can be traced back to a specific KB revision.

Why a knowledge base spec exists

An agent that grounds on an unstructured pile of files cannot be debugged. Without versioning, you cannot reproduce why an agent answered the way it did yesterday. Without refresh discipline, the agent quietly drifts off the source of truth. Without rollback, a bad ingest poisons every conversation. This specification fixes those failure modes.

Document model

A knowledge base contains documents. Each document is a versioned record of a real-world artifact (a help-doc page, a product spec, a runbook). Documents own one or more chunks that retrieval returns to the agent.

Document record

{
  "document_id": "kb_doc_94c1",
  "source_uri": "https://help.example.com/articles/refunds",
  "source_type": "web",
  "version": "2026-05-03T08:00:00Z",
  "checksum": "sha256:abc...",
  "title": "How refunds work",
  "language": "en",
  "acl": ["role:customer", "role:agent"],
  "tags": ["billing", "policy"],
  "published_at": "2026-04-30T00:00:00Z",
  "deprecated": false,
  "deprecated_at": null,
  "deprecated_reason": null
}

Chunk record

{
  "chunk_id": "kb_doc_94c1#chunk_07",
  "document_id": "kb_doc_94c1",
  "document_version": "2026-05-03T08:00:00Z",
  "text": "Refunds are issued to the original payment method within 5-10 business days...",
  "position": 7,
  "heading_path": ["Refunds", "Timing"],
  "vector": null,
  "metadata": {
    "source_uri": "https://help.example.com/articles/refunds#timing",
    "updated_at": "2026-05-03T08:00:00Z",
    "acl": ["role:customer"],
    "freshness_window_days": 30
  }
}

Chunking strategies

Strategy        | When to use                                | Typical size           | Overlap
----------------|--------------------------------------------|------------------------|----------------------------------
Fixed           | Short, uniform content (FAQs, snippets)    | 200-400 tokens         | 0-20%
Semantic        | Long-form prose where boundaries matter    | 300-800 tokens         | 10-20%
Hierarchical    | Docs with clear headings (specs, runbooks) | parent + child levels  | parent always returned with child
Sentence-window | Q&A and chat logs                          | 1-3 sentences + window | window 2-6 sentences

Chunk size and overlap influence both retrieval recall and context-window cost. Recent Anthropic research on contextual retrieval reports meaningful recall improvements from prepending a short generated context blurb to each chunk, especially for long documents (Anthropic, 2024 — Contextual Retrieval). Treat exact percentage gains as workload-dependent.

Default rule. Fixed chunking for utility content, semantic for marketing pages, hierarchical for technical specs. Pick one strategy per source type and document the choice in the source registry.
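As a minimal illustration of the fixed strategy, a token-window chunker with overlap might look like the sketch below. Whitespace splitting stands in for a real tokenizer, and the function name and defaults are illustrative, not part of the spec:

```python
def fixed_chunks(text, size=300, overlap=0.15):
    """Split text into fixed-size token windows with fractional overlap.

    Tokens are approximated by whitespace splitting; a production system
    would use the embedding model's own tokenizer instead.
    """
    tokens = text.split()
    step = max(1, int(size * (1 - overlap)))  # advance by size minus overlap
    chunks = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + size]
        if window:
            chunks.append(" ".join(window))
        if start + size >= len(tokens):
            break  # the last window already covers the tail
    return chunks
```

With size 4 and 25% overlap, consecutive windows share one token, matching the 0-20% guidance scaled down for a toy input.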

Metadata enrichment

Every chunk must carry the metadata the agent needs to (a) cite, (b) filter by ACL, (c) reason about freshness, and (d) recover provenance.

  • source_uri — deep link, ideally with a fragment to the section.
  • document_version — ISO timestamp or content hash of the source.
  • acl — list of role or tenant identifiers required for access.
  • updated_at — source change time, not ingest time.
  • freshness_window_days — how long the chunk is considered current.
  • deprecated — set true to soft-delete a chunk without removing it from history.
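The propagation rules above can be sketched as a helper that derives a chunk record from its parent document. Field names follow the records shown earlier; the helper itself is hypothetical:

```python
def make_chunk(doc, position, text, heading_path, freshness_window_days=30):
    """Build a chunk record that inherits provenance and ACL from its document.

    ACL and version are copied from the parent, never set independently,
    so retrieval-time filters always agree with the source document.
    """
    return {
        "chunk_id": f"{doc['document_id']}#chunk_{position:02d}",
        "document_id": doc["document_id"],
        "document_version": doc["version"],
        "text": text,
        "position": position,
        "heading_path": heading_path,
        "vector": None,
        "metadata": {
            "source_uri": doc["source_uri"],
            "updated_at": doc["version"],
            "acl": list(doc["acl"]),  # copy, don't alias the parent list
            "freshness_window_days": freshness_window_days,
        },
    }
```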

Refresh cadence

Source volatility                        | Cadence        | Trigger
-----------------------------------------|----------------|----------------------------
Static reference (legal text, glossary)  | Quarterly      | Manual review
Slow-moving docs                         | Weekly batch   | Source webhook or daily diff
Operational runbooks                     | Daily          | CI publish hook
Product catalog                          | Hourly         | CDC stream
Live conversations / tickets             | Near real-time | Event stream

Ingestion runs must be idempotent. Re-running a refresh on the same source version must produce the same chunks and the same chunk IDs.
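Idempotency falls out naturally when chunk identity is derived from content rather than ingest order. A sketch, with an illustrative hashing scheme:

```python
import hashlib

def chunk_id(document_id, document_version, position, text):
    """Derive a stable chunk ID from source identity and content.

    Re-running ingestion on the same source version yields identical IDs,
    so refresh jobs can upsert instead of appending duplicates.
    """
    digest = hashlib.sha256(
        f"{document_id}|{document_version}|{position}|{text}".encode("utf-8")
    ).hexdigest()[:12]
    return f"{document_id}#{digest}"
```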

Version pinning

Agents pin to a release tag of the knowledge base, not to "latest". A release tag is a labelled snapshot of all document versions, chunks, and embedding vectors at a point in time.

  • Production agents pin to a vetted release.
  • Canary agents pin to a candidate release for a fraction of traffic.
  • Eval agents pin to known-good releases used by regression tests.

Swapping the live release should be a single config change, not a re-ingest.
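One way to realise "a single config change" is a small registry that maps agents to immutable release tags, with snapshots frozen behind it. Class and method names here are hypothetical:

```python
class ReleaseRegistry:
    """Maps agents to immutable KB release tags; swapping is one assignment."""

    def __init__(self):
        self._releases = {}  # tag -> frozen snapshot metadata
        self._pins = {}      # agent_id -> tag

    def publish(self, tag, snapshot):
        if tag in self._releases:
            raise ValueError(f"release tag {tag!r} is immutable")
        self._releases[tag] = snapshot

    def pin(self, agent_id, tag):
        if tag not in self._releases:
            raise KeyError(f"unknown release tag {tag!r}")
        self._pins[agent_id] = tag  # the single config change

    def resolve(self, agent_id):
        return self._releases[self._pins[agent_id]]
```

Because `publish` refuses to overwrite an existing tag, a "swap" can only ever repoint the pin, never mutate a vetted release in place.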

Rollback procedure

  1. Detect a bad ingest via eval regression, traffic anomaly, or operator report.
  2. Move the production agent's release pin back to the previous known-good tag.
  3. Mark the bad release as quarantined; do not delete it (auditors need the trail).
  4. File a postmortem with the source diff that caused the regression.
  5. Re-ingest with a fix and create a new release tag; never reuse the bad tag's name.
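Steps 2 and 3 can be automated as a single operation so an operator cannot repoint the pin and forget the quarantine. A sketch against hypothetical in-memory state that would wrap your real config store:

```python
def rollback(pins, quarantined, agent_id, bad_tag, known_good_tag):
    """Repoint an agent to a known-good release and quarantine the bad one.

    The bad release is flagged, never deleted, so auditors keep the trail.
    """
    assert pins.get(agent_id) == bad_tag, "agent is not pinned to the bad release"
    pins[agent_id] = known_good_tag  # step 2: move the pin back
    quarantined.add(bad_tag)         # step 3: quarantine, don't delete
    return {"agent": agent_id, "from": bad_tag, "to": known_good_tag}
```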

Source-of-truth governance

  • Every chunk traces to exactly one document.
  • Every document traces to exactly one source artifact.
  • When two sources disagree, the document model records both versions and a canonical: true flag identifies the chosen one.
  • Agents must never paraphrase across two conflicting documents in the same answer without explicit conflict handling.

Conflict resolution

When retrieval returns chunks from sources that disagree:

  1. Prefer the chunk with the higher-precedence source (configured per domain).
  2. If precedence ties, prefer the more recent updated_at.
  3. If still tied, surface both to the agent and require an explicit citation in the response.
  4. Log the conflict so reviewers can update precedence rules.
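The first three steps map onto a small resolver. Precedence values and the `source_type` field are illustrative (higher number means higher precedence); logging is left out for brevity:

```python
def resolve_conflict(chunks, precedence):
    """Pick the winning chunk(s) from conflicting sources.

    Returns one chunk when precedence or recency decides, or every tied
    chunk so the agent must surface both with explicit citations.
    """
    top = max(precedence.get(c["source_type"], 0) for c in chunks)
    leaders = [c for c in chunks if precedence.get(c["source_type"], 0) == top]
    if len(leaders) == 1:
        return leaders  # step 1: precedence decided
    newest = max(c["updated_at"] for c in leaders)
    # steps 2-3: prefer recency; if still tied, all survivors are returned
    return [c for c in leaders if c["updated_at"] == newest]
```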

Audit trail

For every retrieved answer, persist:

  • The agent run ID and prompt hash.
  • The release tag of the KB that was queried.
  • The list of chunk IDs and versions that were returned.
  • The final response with inline references back to those chunk IDs.

Audit records are append-only and retained for at least the regulatory window your domain requires.
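A minimal audit entry covering the four fields above might be built and appended like this. Field names follow the list; the JSON-lines storage choice and function names are assumptions, not part of the spec:

```python
import hashlib
import json

def audit_record(run_id, prompt, release_tag, chunks, response):
    """Build one append-only audit entry for a retrieved answer.

    The prompt is stored as a hash so the trail stays verifiable against
    logs without retaining raw user text.
    """
    return {
        "run_id": run_id,
        "prompt_hash": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "release_tag": release_tag,
        "chunks": [
            {"chunk_id": c["chunk_id"], "version": c["document_version"]}
            for c in chunks
        ],
        "response": response,
    }

def append_audit(log_path, record):
    """Append as one JSON line; earlier entries are never rewritten."""
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
```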

Validation checklist

  • [ ] Every document has a stable document_id and a versioned record.
  • [ ] Every chunk references its parent document and version.
  • [ ] ACL fields propagate from document to chunk.
  • [ ] Refresh jobs are idempotent.
  • [ ] Release tags are immutable.
  • [ ] Rollback runbook is tested at least quarterly.
  • [ ] Audit trail captures release tag + chunk IDs per response.

FAQ

Q: Should I store the embedding inside the chunk record or in the vector store?

Keep the canonical chunk in the document store and write embeddings to the vector store keyed by chunk_id. That way you can re-embed without losing the source of truth.
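Sketched below, assuming a generic vector-store client exposing an `upsert` call keyed by ID; the client API and function name are hypothetical:

```python
def sync_embeddings(chunks, embed, vector_store):
    """Write embeddings keyed by chunk_id; the document store stays canonical.

    `embed` maps text -> vector; `vector_store` is any client exposing
    upsert(id, vector, metadata). Re-embedding is just re-running this
    over the same chunks with a new embed function.
    """
    for chunk in chunks:
        vector_store.upsert(
            id=chunk["chunk_id"],
            vector=embed(chunk["text"]),
            metadata=chunk["metadata"],
        )
```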

Q: How big should a chunk be?

Usually 200-800 tokens. Smaller chunks improve precision; larger chunks preserve context. Test both ends of the range with your eval set (Chroma research on chunking).

Q: How do I version a knowledge base across embedding model changes?

Treat an embedding model change as a new release of the KB. Re-embed every chunk and create a new release tag; do not mix embeddings from two models in the same retrieval index.

Q: How do I delete sensitive content?

Mark the document as deprecated: true and re-ingest with the redaction. Keep the original record in cold storage if your policy requires audit access; never silently overwrite history.

Q: What goes in the metadata vs the chunk text?

Facts live in text; routing signals live in metadata. ACL, source URL, and freshness are metadata. Section headings can live in both — in metadata for filtering, in text for the LLM to read.

Related Articles

specification

Agent Memory Pattern Specification: Short-Term, Long-Term, and Episodic

Specification for AI agent memory: working, episodic, semantic, and procedural tiers with consolidation, eviction, and PII handling.

specification

Agent Permission Model Specification: RBAC, Scopes, and Tool-Level Auth

Production specification for AI agent permissions: RBAC, OAuth scope mapping, tool-level auth, consent prompts, time-bound grants, and MCP propagation.

specification

Agent Tool Naming Conventions Specification for LLM Routing Reliability

Specification for naming AI agent tools to maximize LLM routing reliability: verb-noun, namespaces, length, anti-collision, and deprecation rules.
