Geodocs.dev

Agent Memory Pattern Specification: Short-Term, Long-Term, and Episodic



AI agent memory is layered, not monolithic. This specification defines four tiers — working, episodic, semantic, and procedural — plus the consolidation, scoring, eviction, and PII handling rules that turn a stateless LLM into a coherent assistant across many sessions.

TL;DR

Use working memory for the active turn, episodic memory for past sessions, semantic memory for distilled facts, and procedural memory for learned routines. Consolidate from working to long-term memory on session end. Score retrieval by recency, frequency, and relevance. Evict by relevance score and TTL. Redact PII at write time, not at read time.

Why an agent memory spec exists

A stateless LLM forgets every prior interaction. Bolting on "long-term memory" without structure produces retrieval that is noisy, leaks personal information, and contradicts itself across sessions. Cognitive-science-inspired tiering (working / episodic / semantic / procedural) provides a debuggable architecture that scales from a single-user assistant to multi-tenant agents. Foundational research on generative agents and memory hierarchies maps directly onto these tiers (Park et al., 2023; MemGPT, 2023).

Memory tiers

Tier        Lifetime             Storage                          Typical content
Working     Single turn          LLM context                      Current user message, retrieved chunks, scratchpad
Episodic    Session to lifetime  Append-only event log            "On 2026-05-01 the user asked about pricing"
Semantic    Long-lived           Key-value store + vector index   "User prefers metric units"
Procedural  Long-lived           Versioned policy/playbook store  "Always confirm before deleting"

Keep the four tiers as separate stores. Do not pack episodic events into the same index as semantic facts; their retrieval shapes differ.

Working memory management

Working memory is whatever the LLM sees this turn. Manage it explicitly:

  • Budget. Reserve a fixed token share for system prompt, tools, retrieved context, and history.
  • Compression. When the conversation exceeds the budget, summarize older turns into an episodic record and drop the raw turns from working memory.
  • Pin. Allow specific facts to be pinned (e.g., "target audience is enterprise SREs") so summarization does not lose them.
  • Cache. Reuse cached prefixes for repeated tool descriptions and system instructions where the provider supports it (Anthropic prompt caching).
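The budget and compression rules above can be sketched as follows. This is a minimal sketch, not a production implementation: `count_tokens` is a crude word-count stand-in (a real runtime would use the model's tokenizer), and the overflow turns it returns are the candidates for summarization into an episodic record.

```python
def count_tokens(text):
    # Crude stand-in for a real tokenizer; word count only.
    return len(text.split())

def fit_to_budget(system, pinned, history, budget):
    """Drop oldest unpinned turns until the context fits.

    Returns (kept, overflow). Overflow turns should be summarized into
    an episodic record, not silently discarded.
    """
    # System prompt and pinned facts are never dropped.
    used = count_tokens(system) + sum(count_tokens(p) for p in pinned)
    kept, overflow = [], []
    for turn in reversed(history):  # newest turns get priority
        t = count_tokens(turn)
        if used + t <= budget:
            kept.insert(0, turn)
            used += t
        else:
            overflow.insert(0, turn)  # promote to episodic memory
    return kept, overflow
```

The same loop generalizes to any priority order; the key invariant is that pinned content is counted as fixed cost before any history is admitted.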

Episodic memory schema

{
  "episode_id": "ep_2026-05-03_8a1c",
  "principal": "user:alex@stelixx",
  "agent": "geodocs-writer",
  "started_at": "2026-05-03T08:00:00Z",
  "ended_at": "2026-05-03T08:14:00Z",
  "summary": "User asked for an outline of the AEO content checklist.",
  "events": [
    {"role": "user", "text": "..."},
    {"role": "agent", "tool": "notion.searchPages", "args": {"query": "AEO checklist"}}
  ],
  "derived_facts": ["prefers concise summaries"],
  "tags": ["aeo", "checklist"],
  "acl": ["principal:alex@stelixx"]
}

Episodes are immutable once closed. Retrieval over episodes uses both vector similarity (on summary) and structured filters (principal, tags, started_at).
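A minimal sketch of that two-stage retrieval, assuming each episode record carries a precomputed `summary_vec` embedding of its summary (the embedding step itself is out of scope here). Structured filters run first because they are cheap and exact; similarity ranks whatever survives.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_episodes(episodes, query_vec, principal, tags=None, top_k=3):
    """Structured filters first (principal, tags), vector similarity second."""
    candidates = [
        ep for ep in episodes
        if ep["principal"] == principal
        and (not tags or set(tags) & set(ep["tags"]))
    ]
    candidates.sort(key=lambda ep: cosine(query_vec, ep["summary_vec"]),
                    reverse=True)
    return candidates[:top_k]
```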

Semantic memory schema

Semantic memory holds distilled, durable facts:

{
  "fact_id": "fact_8c3f",
  "principal": "user:alex@stelixx",
  "text": "Prefers metric units in technical writing.",
  "sources": ["ep_2026-05-01_22ab", "ep_2026-05-03_8a1c"],
  "confidence": 0.9,
  "evidence_count": 2,
  "created_at": "2026-05-03T08:14:00Z",
  "last_reinforced_at": "2026-05-03T08:14:00Z",
  "acl": ["principal:alex@stelixx"]
}

Facts are upserted by content hash. New supporting episodes increment evidence_count and update last_reinforced_at instead of creating duplicates.
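The upsert rule can be sketched like this. The `fact_` prefix and truncated hash mirror the example IDs above, but the exact ID scheme and in-memory dict store are assumptions for illustration.

```python
import hashlib
from datetime import datetime, timezone

def fact_id(principal, text):
    # Content hash keyed by principal, so identical facts dedupe per user.
    digest = hashlib.sha256(f"{principal}\x00{text}".encode()).hexdigest()
    return f"fact_{digest[:8]}"

def upsert_fact(store, principal, text, source_episode):
    now = datetime.now(timezone.utc).isoformat()
    fid = fact_id(principal, text)
    fact = store.get(fid)
    if fact is None:
        fact = {"fact_id": fid, "principal": principal, "text": text,
                "sources": [source_episode], "evidence_count": 1,
                "created_at": now, "last_reinforced_at": now}
        store[fid] = fact
    elif source_episode not in fact["sources"]:
        # New supporting episode: reinforce, never duplicate.
        fact["sources"].append(source_episode)
        fact["evidence_count"] += 1
        fact["last_reinforced_at"] = now
    return fact
```

Checking `source_episode not in fact["sources"]` is what makes re-processing the same episode a no-op.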

Procedural memory schema

Procedural memory is what the agent has learned to do:

{
  "procedure_id": "proc_5a02",
  "name": "refund_workflow",
  "version": "3",
  "steps": ["verify_order", "check_policy", "issue_refund", "notify_user"],
  "learned_from": ["ep_2026-04-12_a91c"],
  "updated_at": "2026-04-13T00:00:00Z",
  "deprecated": false
}

Version procedures explicitly. New procedures supersede old ones via the version field; the agent runtime selects the highest non-deprecated version.
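The selection rule reduces to a few lines; this sketch assumes the string `version` field from the schema above parses as an integer.

```python
def active_procedure(procedures, name):
    """Select the highest non-deprecated version of a named procedure."""
    live = [p for p in procedures
            if p["name"] == name and not p["deprecated"]]
    return max(live, key=lambda p: int(p["version"])) if live else None
```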

Consolidation jobs

Consolidation moves raw events from working/episodic memory into structured semantic and procedural memory.

  • Trigger. Session end, idle timeout, or batch nightly run.
  • Inputs. All new episodes since last run.
  • Outputs. New or updated facts, new or updated procedures, deprecation flags for outdated facts.
  • Idempotency. Re-running on the same input must not create duplicates.
  • Provenance. Every derived fact links back to source episodes.

Consolidation must run with the principal's permissions, so it never derives facts from data the principal cannot read.
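A sketch of the consolidation loop under those rules. `derive_facts` is a hypothetical placeholder for the LLM extraction step; idempotency comes from keying facts by (principal, text) and recording per-episode provenance, so re-running on the same input changes nothing.

```python
def consolidate(episodes, fact_store, last_run_at, derive_facts):
    """Process only episodes newer than the watermark; dedupe derived facts.

    fact_store maps (principal, fact_text) -> fact record, so reruns upsert
    rather than duplicate. Returns the new watermark (ISO-8601 strings
    compare correctly as strings).
    """
    new_eps = [ep for ep in episodes if ep["ended_at"] > last_run_at]
    for ep in new_eps:
        for text in derive_facts(ep):
            key = (ep["principal"], text)
            fact = fact_store.setdefault(key, {"sources": [], "evidence_count": 0})
            if ep["episode_id"] not in fact["sources"]:  # provenance link
                fact["sources"].append(ep["episode_id"])
                fact["evidence_count"] += 1
    return max((ep["ended_at"] for ep in new_eps), default=last_run_at)
```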

Retrieval scoring

Retrieval over long-term memory uses a composite score:

score = w_relevance * cosine(query_vec, mem_vec)
      + w_recency   * recency_decay(mem.updated_at, now)
      + w_frequency * normalized(mem.evidence_count)
      + w_pinned    * (1 if pinned else 0)

  • Defaults: relevance 0.6, recency 0.2, frequency 0.1, pinned 0.1 (tune per workload).
  • Recency uses an exponential decay; pick a half-life per tier (days for episodic, months for semantic).
  • Always filter by ACL before scoring; never let scoring leak across principals.
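The formula and defaults above can be implemented directly. Normalizing frequency by capping evidence_count at a fixed maximum is an assumption; any monotone normalization works. ACL filtering must happen before this function is ever called.

```python
def recency_decay(age_days, half_life_days):
    # Exponential decay: 1.0 now, 0.5 after one half-life.
    return 0.5 ** (age_days / half_life_days)

def score(relevance, age_days, evidence_count, pinned,
          half_life_days=180.0, max_evidence=10,
          w_rel=0.6, w_rec=0.2, w_freq=0.1, w_pin=0.1):
    """Composite retrieval score with the default weights from the spec."""
    return (w_rel * relevance
            + w_rec * recency_decay(age_days, half_life_days)
            + w_freq * min(evidence_count, max_evidence) / max_evidence
            + w_pin * (1.0 if pinned else 0.0))
```

With relevance 1.0, a fresh timestamp, saturated evidence, and a pinned flag, the score reaches its maximum of 1.0, which makes the weight split easy to sanity-check.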

Eviction policy

Tier        Default TTL   Trigger
Working     End of turn   Always
Episodic    90-365 days   TTL or principal request
Semantic    Indefinite    Manual deprecation or contradiction
Procedural  Indefinite    Version supersession

Deletions from semantic memory must propagate to derived caches and vector indexes within the audit-retention window.
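A minimal TTL sweep for the table above. The per-tier TTL map and the `updated_at` field name are assumptions; tiers with no TTL (semantic, procedural) are never swept here because their eviction is event-driven, not time-driven.

```python
from datetime import datetime, timedelta, timezone

TTL = {"episodic": timedelta(days=90)}  # semantic/procedural: indefinite

def evict(records, tier, now=None):
    """Split records into (kept, expired) by this tier's TTL."""
    now = now or datetime.now(timezone.utc)
    ttl = TTL.get(tier)
    if ttl is None:
        return records, []  # no time-based eviction for this tier
    kept = [r for r in records if now - r["updated_at"] <= ttl]
    gone = [r for r in records if now - r["updated_at"] > ttl]
    return kept, gone
```

Whatever lands in the expired list must also be purged from derived caches and vector indexes, per the propagation rule above.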

PII redaction at write time

  • Run a redaction pass on every episode before persistence.
  • Replace direct identifiers (emails, phone numbers, payment methods) with stable tokens.
  • Store the mapping in a secure side-channel keyed by principal; only the principal's session can resolve tokens back to plaintext.
  • Periodically re-scan stored memories with newer redactors; delete any new PII discovered post-hoc.

Writing raw PII first and trying to redact at read time is an anti-pattern. It leaves PII in backups, indexes, and analytics.
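A write-time redaction pass might look like the following. The two regex patterns are illustrative only; production redactors need far broader coverage (names, addresses, payment cards, locale-specific formats), and the token format is an assumption. The mapping dict stands in for the secure side-channel keyed by principal.

```python
import re

# Illustrative patterns only, not production-grade PII detection.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def redact(text, mapping):
    """Replace identifiers with stable tokens before persistence.

    mapping (token -> plaintext) belongs in a secure side-channel keyed
    by principal; it must never enter the memory store itself.
    """
    for label, pattern in PATTERNS.items():
        def sub(match, label=label):
            value = match.group(0)
            for token, plain in mapping.items():
                if plain == value:
                    return token  # reuse: same value, same token, every episode
            token = f"<{label}_{len(mapping) + 1}>"
            mapping[token] = value
            return token
        text = pattern.sub(sub, text)
    return text
```

Token stability matters: if the same email maps to the same token across episodes, consolidation can still correlate facts about that identifier without ever storing it in plaintext.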

Validation checklist

  • [ ] Working memory budget is enforced per turn.
  • [ ] Episodes are immutable and ACL-tagged.
  • [ ] Semantic facts deduplicate by content hash.
  • [ ] Procedures are versioned.
  • [ ] Consolidation is idempotent.
  • [ ] Retrieval applies ACL before scoring.
  • [ ] Eviction TTLs are honored.
  • [ ] PII is redacted at write time.

FAQ

Q: Where should I store working memory?

In the LLM context for the active turn, plus a small scratchpad in the runtime for tool intermediate results. Persisting working memory beyond the turn is a category error — promote it to episodic memory instead.

Q: Do I need a separate store for procedural memory?

Yes. Procedures need versioning, review, and rollback similar to code. Mixing them with semantic facts loses governance.

Q: How do I prevent the agent from over-remembering personal data?

Pair PII redaction with explicit user controls ("forget this"). On a forget request, mark the underlying episodes deprecated, re-run consolidation to remove derived facts, and confirm to the user with a deletion receipt.

Q: How big should an episode summary be?

Short enough to fit cheaply in working memory — typically 1-3 sentences. The full episode body lives in cold storage; the summary is the primary retrieval target.

Q: How do I handle contradictions across episodes?

Keep both, but raise the conflict during consolidation. Either resolve via recency, deprecate the older fact with a reason, or surface the contradiction for the user to confirm.

