Agent Memory Architecture for Long-Running Documentation Agents
Long-running documentation agents need a four-layer memory architecture (working, episodic, semantic, procedural) anchored to a canonical fact store so citations remain stable across runs and refresh cycles.
TL;DR: A documentation agent that publishes, audits, or rewrites content over weeks must remember what it published, why, against which fact, and through which procedure. Without layered memory, the agent re-introduces contradictions, loses canonical entities, and produces citations that drift over time. This spec defines four memory layers and the integration points each layer needs with a canonical fact store.
Why generic chatbot memory designs fail for docs agents
Most agent-memory guidance is written for chatbots and assistants where the success metric is conversational coherence over a few turns. A documentation agent has different constraints:
- It runs over days or weeks, not minutes.
- Its outputs are public, structured, and citation-bearing.
- Its facts must remain identical across runs unless the underlying source changed.
- Its decisions must be auditable months later.
A single embedding store with a recency window will not satisfy any of these constraints. The agent needs distinct memory layers, each with a different write policy, retention policy, and verification gate.
The four-layer memory model
Layer 1: Working memory
Scope: a single run.
Contents: the current task, the row being processed, intermediate reasoning, tool outputs, draft content.
Write policy: ephemeral, dropped at run end.
Integration: read from the entity model and fact store at run start; write working artifacts into the next layer only if they survive QA.
Common failure: leaking working-memory artifacts into long-term memory. A draft that fails QA must not pollute semantic memory.
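A minimal sketch of the working-memory lifecycle, assuming a QA gate at run end (the `WorkingMemory` fields mirror the schema later in this spec; `finish_run` and `promote` are illustrative names, not part of the spec):

```python
from dataclasses import dataclass, field

@dataclass
class WorkingMemory:
    """Ephemeral per-run state; never persisted directly."""
    task_id: str
    row_url: str
    draft: str = ""
    tool_outputs: list = field(default_factory=list)

def finish_run(wm: WorkingMemory, qa_passed: bool, promote) -> None:
    # Only artifacts that survive QA cross into the next layer;
    # a failed draft must not pollute long-term memory.
    if qa_passed:
        promote(wm.draft)
    # Either way, working memory is dropped at run end.
```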
Layer 2: Episodic memory
Scope: per-run records, indexed by time and action.
Contents: what the agent did, against which row, with which inputs and outputs, and the final status.
Write policy: append-only.
Integration: feeds the audit log; supports replay and dispute resolution.
Common failure: storing prose summaries instead of structured records. Episodic memory is for re-derivation; keep it in machine-readable form.
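The append-only, machine-readable record might look like this (a sketch; the field names follow the schema later in this spec, and `log_episode` is an illustrative helper):

```python
import datetime

def log_episode(log: list, run_id: str, action: str, row_url: str,
                inputs: dict, outputs: dict, status: str) -> None:
    """Append one structured record; existing records are never mutated."""
    log.append({
        "run_id": run_id,
        "started_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "action": action,
        "row_url": row_url,
        "inputs": inputs,       # structured, not prose summaries,
        "outputs": outputs,     # so the run can be re-derived later
        "status": status,
    })
```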
Layer 3: Semantic memory
Scope: long-term, global facts about the world the agent operates in.
Contents: canonical entities, canonical_concept_ids, verified facts, source URLs, last-verified dates, relations.
Write policy: gated. Only writes that pass verification (source check, contradiction check, freshness check) reach this layer.
Integration: this is the canonical fact store. Working memory reads from it; episodic memory references its versions.
Common failure: writing unverified beliefs. A claim that did not pass the gate belongs in episodic memory as a hypothesis, not in semantic memory as a fact.
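The routing rule can be made explicit in code, as a sketch (assuming a dict-backed store and a `passes_gates` predicate supplied by the gate stack described below; both names are illustrative):

```python
def record_claim(claim: dict, passes_gates, semantic_store: dict,
                 episodic_log: list) -> bool:
    """Gated write: verified claims become facts, failed claims stay hypotheses."""
    if passes_gates(claim):
        semantic_store[claim["fact_id"]] = claim
        return True
    # A claim that fails the gate is logged episodically as a hypothesis,
    # never written into semantic memory as a fact.
    episodic_log.append({"kind": "hypothesis", "claim": claim})
    return False
```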
Layer 4: Procedural memory
Scope: long-term, learned procedures and policies.
Contents: workflow templates, prompt patterns, retry policies, refresh cadences, failure recoveries.
Write policy: human-in-the-loop or supervised; updates require explicit promotion.
Integration: read at run start; not written by the agent's autonomous loop.
Common failure: silently mutating procedures based on a single run's outcome. Procedural change is editorial, not operational.
Memory write gates
A write to long-term memory (semantic or procedural) must pass three gates:
- Source gate. Every fact carries a source URL and a last-verified date; no source, no write.
- Contradiction gate. A new fact that contradicts an existing canonical fact triggers a reconciliation step before either entry is updated.
- Freshness gate. If a fact is older than its content_type's refresh threshold, content citing it is rewritten in conditional language until the fact is reverified.
The gates are deterministic. They run as part of the agent's tool stack, not as advisory comments.
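The three gates above can be sketched as one deterministic check over the fact fields defined later in this spec (the function name, dict-backed store, and return convention are illustrative assumptions):

```python
import datetime

def passes_gates(fact: dict, store: dict, refresh_days: int):
    """Run the source, contradiction, and freshness gates in order.
    Returns (True, None) on pass, or (False, failing_gate_name)."""
    # Source gate: no source URL or verification date, no write.
    if not fact.get("source_url") or not fact.get("last_verified_at"):
        return False, "source"
    # Contradiction gate: a conflicting canonical value blocks the write
    # until a reconciliation step has run.
    existing = store.get(fact["canonical_concept_id"])
    if existing is not None and existing["value"] != fact["value"]:
        return False, "contradiction"
    # Freshness gate: stale facts must be reverified before a canonical write.
    age_days = (datetime.date.today() - fact["last_verified_at"]).days
    if age_days > refresh_days:
        return False, "freshness"
    return True, None
```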
Integration with the canonical fact store
Semantic memory is operationalized as a canonical fact store keyed by canonical_concept_id and fact_id. Each fact carries:
- value
- units (where applicable)
- source_url
- last_verified_at
- confidence_band
- supersedes (previous fact_id if updated)
The agent reads facts by id, never by free-text recall. Drafts that need a fact request it from the store; if missing, the agent enters a research subroutine and writes the result back through the gates.
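A sketch of the read path, assuming a dict-backed store, an injected research subroutine, and a gate predicate (all three names are illustrative):

```python
def get_fact(store: dict, fact_id: str, research, passes_gates):
    """Read a fact by id only; a miss triggers research, and the result
    re-enters the store through the write gates."""
    fact = store.get(fact_id)
    if fact is not None:
        return fact
    candidate = research(fact_id)  # research subroutine for the missing fact
    if candidate is not None and passes_gates(candidate):
        store[fact_id] = candidate  # written back through the gates
        return candidate
    return None  # no canonical fact: the draft must not assert the claim
```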
Integration with citation telemetry
Long-running agents should observe citation outcomes for the content they produce. The episodic layer logs each publish event; a separate telemetry feed (citation rate, contradiction rate, refresh hit rate) is read at run start to prioritize work. Memory feeds back into prioritization without becoming a leaky signal: telemetry stays in episodic memory, not in semantic memory.
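A minimal sketch of run-start prioritization from that telemetry feed (the scoring rule here, high contradiction rate first, then low citation rate, is an illustrative choice, not part of the spec):

```python
def prioritize(rows: list, telemetry: dict) -> list:
    """Order work from episodic telemetry; telemetry never becomes a fact."""
    def score(row):
        t = telemetry.get(row, {"citation_rate": 0.0, "contradiction_rate": 0.0})
        # Highest contradiction rate first; break ties on lowest citation rate.
        return (-t["contradiction_rate"], t["citation_rate"])
    return sorted(rows, key=score)
```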
Sample memory schema
working_memory:
  task_id: string
  row_url: string
  draft: string
  tool_outputs: array
episodic_memory:
  - run_id: string
    started_at: datetime
    actions: array
    status: enum
    citations_emitted: array
semantic_memory:
  facts:
    - fact_id: string
      canonical_concept_id: string
      value: any
      units: string?
      source_url: string
      last_verified_at: date
      confidence_band: enum
      supersedes: string?
procedural_memory:
  templates: array
  retry_policies: object
  refresh_cadences: object
Implementation pitfalls
- Single embedding store as memory. Embeddings support recall, not verification; pair with a typed fact store.
- No supersedes chain. Without versioning, you cannot audit why a fact changed.
- Treating procedural memory as autonomous. Procedure changes need editorial promotion to avoid drift.
- Skipping freshness gates on long-lived facts. A 200-day-old vendor capability is no longer canonical without revalidation.
- Logging only success. Failed attempts contain the most signal; episodic memory captures both.
FAQ
Q: Can I implement this with a single vector database?
You can implement working and episodic memory on top of a vector database, but semantic memory should be a typed fact store keyed by canonical_concept_id. Vector search is a recall mechanism, not a verification mechanism.
Q: Where do procedural-memory updates come from?
From editorial review. The agent surfaces candidate updates (a new pattern that succeeded across many runs) and a human promotes them into procedural memory.
Q: How do I keep memory from growing without bound?
Apply retention policies per layer. Working memory is dropped at run end. Episodic memory is summarized after a configurable window with the raw log archived. Semantic memory grows but is bounded by the entity model.
Q: Does this design work without a fact store?
No. The semantic layer requires a canonical fact store; otherwise contradiction and freshness gates have nothing to check against.
Q: How does this interact with verified agent identity?
Verified identity authenticates the agent at the network layer. Memory architecture governs what the agent remembers and writes. The two are complementary: verified identity makes the agent's actions trustable; memory architecture makes them reproducible.
Related Articles
Agent Tool Use Documentation Specification
Specification for documenting tools so AI agents can discover, understand, and correctly invoke them: structured schemas, examples, error semantics, and idempotency hints.
Agent Trajectory Documentation Spec: Designing Replay-Ready Docs for Browser Agents
Specification for replay-ready browser agent trajectory documentation: step manifests, selectors, verification steps, and citation-friendly source mapping.
Citation Half-Life Refresh Cadence Framework: Platform-Specific Update Schedules for AI Search
Citation half-life refresh cadence framework with platform-specific update schedules for ChatGPT, Perplexity, Google AI Mode, and Gemini in 2026.