Agent State Management Patterns Specification


Agent state management is the discipline of choosing the right storage layer for each class of state — short-term, working, long-term, and durable execution — and committing checkpoints often enough that any agent run can be resumed exactly where it left off, even after a crash.

TL;DR

LLMs are stateless; agents are not. A production agent runtime must explicitly model four state classes — short-term context, working scratchpad, long-term memory, and durable workflow state — and back each by an appropriate storage layer. This spec defines the required state classes, storage backends, checkpoint contract, and recovery semantics every Geodocs-aligned agent platform must implement.

Scope

This specification covers what an agent runtime stores, where, and how it recovers state across crashes, restarts, and human-in-the-loop pauses. It is the persistence companion to the Agent Error Recovery Patterns Specification. Cross-thread sharing, multi-agent coordination, and memory pruning policy are downstream concerns that build on this layer.

1. State Classes

Every agent runtime MUST distinguish at least four state classes. Conflating them leads to either expensive over-persistence or fatal under-persistence.

Class	Lifetime	Typical content	Read frequency	Latency budget
Short-term context	Current LLM call	Recent N messages, current tool results	Every step	<10 ms
Working scratchpad	Single agent run / thread	Plan, intermediate results, partial output	Every step	<50 ms
Long-term memory	User / tenant lifetime	Preferences, episodic facts, semantic notes	Per relevant query	<200 ms
Durable workflow state	Workflow lifetime (minutes-months)	Step status, signals, retry counters	Per workflow event	<500 ms

These classes map to the temporal scopes used in agent-memory literature and to LangGraph's distinction between thread-level and cross-thread state, made explicit in the LangGraph persistence docs.
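A runtime can encode the four classes and their latency budgets directly, so a backend's measured read latency can be checked against the budget for its class. The sketch below is a minimal illustration under that assumption. `StateClass`, `LATENCY_BUDGET_MS`, and `within_budget` are hypothetical names, and the budget values are copied from the table.

```python
from enum import Enum


class StateClass(Enum):
    """The four state classes a runtime MUST distinguish (hypothetical names)."""
    SHORT_TERM_CONTEXT = "short_term_context"
    WORKING_SCRATCHPAD = "working_scratchpad"
    LONG_TERM_MEMORY = "long_term_memory"
    DURABLE_WORKFLOW = "durable_workflow"


# Exclusive upper bounds on read latency in milliseconds, taken from the table.
LATENCY_BUDGET_MS: dict[StateClass, float] = {
    StateClass.SHORT_TERM_CONTEXT: 10.0,
    StateClass.WORKING_SCRATCHPAD: 50.0,
    StateClass.LONG_TERM_MEMORY: 200.0,
    StateClass.DURABLE_WORKFLOW: 500.0,
}


def within_budget(state_class: StateClass, observed_ms: float) -> bool:
    """Return True if an observed read latency meets the class's '<N ms' budget."""
    return observed_ms < LATENCY_BUDGET_MS[state_class]
```

For example, `within_budget(StateClass.WORKING_SCRATCHPAD, 35.0)` is `True`, while a 12 ms short-term context read is over budget.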

2. Storage Backends

Class	Recommended primary	Acceptable alternatives	Avoid
Short-term context	In-process memory	Redis (for stateless workers)	SQL row-per-message
Working scratchpad	Redis / in-process	LangGraph InMemorySaver for prototypes	S3 / object stores
Long-term memory	Vector DB + SQL	DynamoDB, Postgres + pgvector	Unindexed append-only logs
Durable workflow state	Workflow engine (Temporal, LangGraph + checkpointer)	Postgres / DynamoDB / Redis as checkpoint backend	In-memory only

Reference architectures: the AWS DynamoDB + LangGraph guide, the Redis langgraph-checkpoint-redis integration, and Temporal's durable execution model are canonical and SHOULD be preferred over hand-rolled persistence.
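One way to enforce the table mechanically is a startup check that refuses any backend listed in the "Avoid" column. This is a minimal sketch: the backend labels, `AVOIDED_BACKENDS`, `BackendPolicyError`, and `validate_backend_config` are hypothetical identifiers, not part of LangGraph, Redis, or Temporal.

```python
# Backend labels are hypothetical strings chosen for this sketch; they mirror
# the "Avoid" column of the storage-backend table.
AVOIDED_BACKENDS: dict[str, set[str]] = {
    "short_term_context": {"sql_row_per_message"},
    "working_scratchpad": {"s3", "object_store"},
    # Append-only logs are avoided as the primary long-term memory backend.
    "long_term_memory": {"append_only_log"},
    "durable_workflow_state": {"in_memory"},
}


class BackendPolicyError(ValueError):
    """Raised when a state class is unconfigured or mapped to an avoided backend."""


def validate_backend_config(config: dict[str, str]) -> None:
    """Fail fast at startup if the backend config violates the table's policy."""
    missing = set(AVOIDED_BACKENDS) - set(config)
    if missing:
        raise BackendPolicyError(f"no backend configured for: {sorted(missing)}")
    for state_class, backend in config.items():
        if backend in AVOIDED_BACKENDS.get(state_class, set()):
            raise BackendPolicyError(
                f"{backend!r} is on the avoid list for {state_class!r}"
            )
```

Failing at startup, rather than at first write, keeps a misconfigured deployment (for example, durable workflow state left on an in-memory saver) from silently losing state on its first crash.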

3. Checkpoint Contract

A checkpoint is a snapshot of the agent's state at a specific point in execution. The runtime MUST emit checkpoints to durable storage at every transition between major steps. Each checkpoint MUST contain:

A unique, monotonically increasing ID (per the LangGraph Checkpoint API).
The thread / run identifier.
The serialized working scratchpad (plan, intermediate results).
The current step pointer (which node / activity is next).
The error history and retry counters.
A timestamp and the agent / model version.
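The required fields fit in a small immutable record. This sketch is illustrative and is not LangGraph's own checkpoint type: `CheckpointRecord` and `MonotonicIdGenerator` are hypothetical names, and the ID scheme (wall-clock nanoseconds, bumped by one whenever the clock has not advanced) only guarantees ordering within a single process. A multi-writer deployment would need a database sequence or another time-ordered ID source.

```python
import threading
import time
from dataclasses import dataclass, field
from typing import Any


class MonotonicIdGenerator:
    """Issues strictly increasing integer IDs within one process (hypothetical).

    Starts from wall-clock nanoseconds and bumps by one whenever the clock has
    not advanced, so IDs stay unique and ordered even if the clock stalls or
    steps backwards.
    """

    def __init__(self) -> None:
        self._last = 0
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            self._last = max(self._last + 1, time.time_ns())
            return self._last


@dataclass(frozen=True)
class CheckpointRecord:
    """One checkpoint carrying every field the contract requires."""
    checkpoint_id: int                  # unique, monotonically increasing
    thread_id: str                      # thread / run identifier
    scratchpad: dict[str, Any]          # serialized plan and intermediate results
    next_step: str                      # which node / activity runs next
    error_history: tuple[str, ...] = ()
    retry_counters: dict[str, int] = field(default_factory=dict)
    created_at_ns: int = field(default_factory=time.time_ns)
    agent_version: str = "unknown"
    model_version: str = "unknown"
```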

Checkpoints MUST be written before any externally-visible side effect is taken. Paired with idempotency keys (see Agent Error Recovery Patterns Specification), this makes execution crash-safe: on resume, the runtime replays from the last checkpoint, and idempotent tools collapse duplicate calls into the original result.
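The ordering rule and the idempotency pairing can be illustrated with in-memory stand-ins. In this sketch, `CheckpointStore`, `PaymentService`, and `execute_charge_step` are hypothetical: the pending checkpoint, including its idempotency key, is persisted before the tool call, so a crash after the call but before the next checkpoint leads to a replay that the service collapses into the original result.

```python
import uuid
from typing import Any, Optional


class CheckpointStore:
    """In-memory stand-in for a durable, append-only checkpoint table."""

    def __init__(self) -> None:
        self.rows: dict[str, list[dict[str, Any]]] = {}

    def put(self, thread_id: str, checkpoint: dict[str, Any]) -> None:
        self.rows.setdefault(thread_id, []).append(dict(checkpoint))

    def latest(self, thread_id: str) -> Optional[dict[str, Any]]:
        rows = self.rows.get(thread_id)
        return rows[-1] if rows else None


class PaymentService:
    """Stand-in external service that deduplicates calls by idempotency key."""

    def __init__(self) -> None:
        self.results: dict[str, str] = {}
        self.side_effects = 0

    def charge(self, idempotency_key: str, amount: int) -> str:
        if idempotency_key in self.results:   # duplicate: return the original result
            return self.results[idempotency_key]
        self.side_effects += 1
        receipt = f"receipt-{self.side_effects}-{amount}"
        self.results[idempotency_key] = receipt
        return receipt


def execute_charge_step(store: CheckpointStore, service: PaymentService,
                        thread_id: str, amount: int) -> str:
    """Persist the pending call and its key BEFORE taking the side effect."""
    last = store.latest(thread_id)
    if last is not None and last["next_step"] == "charge":
        key = last["idempotency_key"]          # resuming after a crash: reuse the key
    else:
        key = str(uuid.uuid4())
        store.put(thread_id, {"next_step": "charge", "idempotency_key": key})
    receipt = service.charge(key, amount)      # a crash here is safe to replay
    store.put(thread_id, {"next_step": "done", "receipt": receipt})
    return receipt
```

To exercise the crash path, write a pending `charge` checkpoint, let the charge succeed, skip the `done` checkpoint, and call `execute_charge_step` again: the service sees the same key and performs no second charge.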

4. Recovery Semantics

On restart, the runtime MUST:

Locate the most recent checkpoint for the thread (latest row by monotonic ID, no full scan).
Verify the checkpoint matches the current agent / model version policy. If incompatible, route to manual review rather than auto-resume.
Rehydrate the working scratchpad and resume from the next step pointer.
Re-issue any in-flight tool call using its original idempotency key, allowing the target service to short-circuit duplicates.
Record a runtime.recovery span (see Agent Tracing and Spans Specification) with the checkpoint ID and gap duration.

The runtime MUST NOT auto-resume across breaking schema changes. Schema migrations require an explicit replay or compensation policy.
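The restart steps can be sketched as one function that returns a decision instead of acting directly. Everything here is an assumption for illustration: `recover`, `RecoveryDecision`, the `start_fresh` outcome for threads with no checkpoint, and the compatibility rule (same schema major version and an allow-listed model version). A real runtime would query an index on thread ID and checkpoint ID rather than filter an in-memory list.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class RecoveryDecision:
    action: str                         # "resume", "manual_review", or "start_fresh"
    checkpoint_id: Optional[int] = None
    next_step: Optional[str] = None
    gap_seconds: Optional[float] = None


def recover(checkpoints: list[dict[str, Any]], thread_id: str, schema_major: int,
            allowed_models: set[str], now_s: float) -> RecoveryDecision:
    """Pick the newest checkpoint for a thread and decide whether to auto-resume."""
    rows = [c for c in checkpoints if c["thread_id"] == thread_id]
    if not rows:
        return RecoveryDecision("start_fresh")
    latest = max(rows, key=lambda c: c["checkpoint_id"])   # monotonic ID => newest
    gap = now_s - latest["created_at_s"]
    # Never auto-resume across a breaking schema change or an unapproved model.
    if (latest["schema_major"] != schema_major
            or latest["model_version"] not in allowed_models):
        return RecoveryDecision("manual_review", latest["checkpoint_id"], gap_seconds=gap)
    # The caller then rehydrates the scratchpad, re-issues any in-flight tool call
    # with its stored idempotency key, and records a runtime.recovery span.
    return RecoveryDecision("resume", latest["checkpoint_id"], latest["next_step"], gap)
```

Returning a decision keeps the policy testable and lets the caller attach the checkpoint ID and gap duration to the `runtime.recovery` span.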

5. Long-Term Memory

Long-term memory persists across runs and threads. The runtime MUST implement at least:

Episodic store: append-only log of (timestamp, actor, event, summary) records, indexed by user / tenant.
Semantic store: vector index of distilled facts, preferences, and patterns derived from the episodic log. The A-MEM paper (Xu et al., 2025) is one principled approach; simpler RAG-over-history setups are acceptable for smaller systems.
Pruning policy: a documented retention window per record class, plus a redaction path for user-requested deletion.

Long-term memory writes MUST be explicit, not a side effect of every step. The agent (or a dedicated memory writer) decides what to remember; uncontrolled writes inflate cost and bury useful signal under low-value records.
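The episodic store, the explicit writer, and the pruning and redaction paths can be sketched in memory. Assumptions: `EpisodicRecord`, `MemoryWriter`, and the per-event retention map are hypothetical; the semantic (vector) store is omitted; and the lock stands in for whatever serialization a real deployment uses, such as a single writer process, a queue, or database transactions.

```python
import threading
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EpisodicRecord:
    timestamp: float
    tenant_id: str
    user_id: str
    actor: str
    event: str          # record class, used to pick a retention window
    summary: str


class MemoryWriter:
    """Single, serialized entry point for long-term memory writes (hypothetical)."""

    def __init__(self, retention_by_event: dict[str, float],
                 default_retention: float) -> None:
        self._log: list[EpisodicRecord] = []
        self._lock = threading.Lock()        # serializes writers across threads
        self._retention = retention_by_event
        self._default_retention = default_retention

    def remember(self, record: EpisodicRecord) -> None:
        """Append a record; called only when the agent explicitly decides to remember."""
        with self._lock:
            self._log.append(record)

    def recall(self, tenant_id: str, user_id: str) -> list[EpisodicRecord]:
        with self._lock:
            return [r for r in self._log
                    if (r.tenant_id, r.user_id) == (tenant_id, user_id)]

    def prune(self, now: float) -> int:
        """Drop records older than their class's retention window; return the count."""
        def keep(r: EpisodicRecord) -> bool:
            window = self._retention.get(r.event, self._default_retention)
            return now - r.timestamp <= window
        return self._filter(keep)

    def redact_user(self, tenant_id: str, user_id: str) -> int:
        """Honor a user-requested deletion; return the number of records removed."""
        return self._filter(lambda r: (r.tenant_id, r.user_id) != (tenant_id, user_id))

    def _filter(self, keep: Callable[[EpisodicRecord], bool]) -> int:
        with self._lock:
            kept = [r for r in self._log if keep(r)]
            removed = len(self._log) - len(kept)
            self._log = kept
            return removed
```

Routing every write through one object is what makes cross-thread memory safe: threads never share scratchpad state, and long-term writes are applied one at a time.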

6. Multi-Tenancy and Isolation

State MUST be partitioned by tenant and by user. The runtime MUST:

Include tenant_id and user_id in every checkpoint and memory record.
Enforce tenant isolation at the storage layer (separate keyspaces, row-level security, or per-tenant tables) — not only at the application layer.
Encrypt sensitive fields at rest (PII, secrets, tool credentials).
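In a key-value backend, the key layout can carry the tenant and user so that every read and write is scoped by construction. This is a hypothetical sketch (`TenantScopedStore` and the `tenant/{tenant_id}/user/{user_id}/...` layout are assumptions), and field encryption is not shown. On its own a client-side wrapper is still application-layer enforcement, so it is meant to sit on top of storage-side controls such as per-tenant keyspace ACLs or Postgres row-level security.

```python
from typing import Any, Optional


def _check(part: str, what: str) -> str:
    # Rejecting "/" means distinct (tenant, user, name) triples can never
    # produce the same storage key.
    if not part or "/" in part:
        raise ValueError(f"invalid {what}: {part!r}")
    return part


class TenantScopedStore:
    """Key-value wrapper that binds every read and write to one tenant and user."""

    def __init__(self, backing: dict[str, Any], tenant_id: str, user_id: str) -> None:
        self._backing = backing
        self._prefix = (f"tenant/{_check(tenant_id, 'tenant_id')}"
                        f"/user/{_check(user_id, 'user_id')}/")

    def put(self, name: str, value: Any) -> None:
        self._backing[self._prefix + _check(name, "key name")] = value

    def get(self, name: str) -> Optional[Any]:
        return self._backing.get(self._prefix + _check(name, "key name"))
```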

7. Observability

The runtime SHOULD emit the following state metrics:

Counter: checkpoints written per minute, by thread.
Histogram: checkpoint write latency.
Counter: recoveries per minute, labeled by recovery_reason.
Gauge: active threads, by state.
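Before a real metrics client is wired in, the four signals can be prototyped with a small stdlib-only recorder. `StateMetrics`, its method names, and the histogram bucket bounds are assumptions for this sketch; per-minute rates would be derived from the cumulative counters by the metrics backend rather than tracked in process.

```python
import bisect
import time
from collections import Counter
from contextlib import contextmanager
from typing import Iterator


class StateMetrics:
    """In-process recorder for the state metrics listed above (hypothetical)."""

    # Histogram bucket upper bounds in milliseconds (assumed values).
    BUCKETS_MS = (1.0, 5.0, 10.0, 50.0, 100.0, 500.0, 1000.0)

    def __init__(self) -> None:
        self.checkpoints_written: Counter = Counter()   # cumulative, keyed by thread_id
        self.recoveries: Counter = Counter()            # cumulative, keyed by recovery_reason
        # Per-bucket (non-cumulative) counts; the final slot is the +Inf overflow.
        self.write_latency_buckets = [0] * (len(self.BUCKETS_MS) + 1)
        self.active_threads: dict[str, int] = {}        # gauge, keyed by thread state

    @contextmanager
    def time_checkpoint_write(self, thread_id: str) -> Iterator[None]:
        """Time a checkpoint write; writes that raise are not recorded."""
        start = time.perf_counter()
        yield
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        # bisect_left finds the first bound >= elapsed_ms ("less or equal" buckets).
        self.write_latency_buckets[bisect.bisect_left(self.BUCKETS_MS, elapsed_ms)] += 1
        self.checkpoints_written[thread_id] += 1

    def record_recovery(self, reason: str) -> None:
        self.recoveries[reason] += 1

    def set_active_threads(self, state: str, count: int) -> None:
        self.active_threads[state] = count
```

Keying the write counter by thread, as the list specifies, produces one series per thread; in label-based backends that cardinality grows with traffic, so it may be worth confirming the backend can absorb it.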

FAQ

Q: Do I need a workflow engine like Temporal for every agent?

No. Short-running agents (single user turn, no multi-step external side effects) can use LangGraph with an in-memory or Redis checkpointer. Workflow engines pay off when runs span minutes to months, cross many tools, or require strong durability guarantees.

Q: Can I store all state in the LLM context window?

No. The context window is short-term context only — it is volatile and bounded by token limits. Working scratchpad, long-term memory, and durable workflow state must live outside the model.

Q: How often should the runtime checkpoint?

At minimum: before every externally-visible side effect, after every model step, and at every human-in-the-loop pause. Checkpointing more often is rarely harmful when the checkpoint store is append-only or LSM-based, since each write is a cheap sequential append.

Q: What about cross-thread memory?

Cross-thread memory is the long-term store, not the working scratchpad. Sharing scratchpad state across threads creates race conditions; long-term memory writes are explicit and serialized through the memory writer.

