Agent Startup and Shutdown Specification
An agent startup and shutdown specification defines five lifecycle phases (init, warmup, ready, drain, terminate) and the readiness probes, signal handlers, and idempotency rules that keep in-flight requests safe across deploys. Without it, rolling updates kill long-running conversations mid-flight and force users to retry from scratch.
TL;DR
Agents are long-running, stateful, and slow to warm up, so naive process termination drops conversations mid-flight. This spec defines a five-phase lifecycle (init, warmup, ready, drain, terminate), a readiness contract that flips the agent to unready as soon as SIGTERM arrives, an in-flight drain deadline that lets active turns complete, and an idempotency rule that prevents double execution on restart. The contract is enforceable on Kubernetes, ECS, Nomad, and bare metal.
Definition
An agent startup and shutdown specification is a contract between the agent process and its orchestrator (Kubernetes, ECS, a process manager, or a custom supervisor) that guarantees:
- the agent does not accept new work until external dependencies (LLM provider, vector store, tool servers, memory store) are reachable and warmed,
- the agent finishes or safely checkpoints in-flight work when shutdown is signaled, within a bounded grace period,
- restart of a replacement instance does not double-execute side effects.
The spec is the integration surface between an agent runtime and a deployment platform; it is independent of the model or framework in use.
Why this matters
Agent traffic differs from typical web traffic in three reliability-critical ways:
- Long turn duration. Agentic deep-research and multi-tool workflows can take 10-30+ minutes per turn (Reddit r/devops, 2025). Default container stopTimeout of 30-120 seconds is far below this, so deploys interrupt active turns.
- Expensive cold start. Loading a model client, embedding model, vector index, and tool descriptors can take 5-60 seconds. A pod that accepts traffic during this window returns errors and skews load-balancer health.
- Side-effectful tool calls. Tools that send email, post to APIs, or write to databases can double-fire if a turn is killed and retried by the caller. Idempotency must be wired into the lifecycle, not bolted on later.
The Kubernetes default termination flow — mark pod terminating, remove endpoint, send SIGTERM, wait 30 seconds, send SIGKILL — is graceful only if the application cooperates (Yash Batra, 2026). A spec is how cooperation gets standardized across an agent fleet.
Lifecycle phases
Five phases, in order. Each transition is observable on a status endpoint and emits a structured event.
| Phase | Entry trigger | Behavior | Readiness | Exit |
|---|---|---|---|---|
| init | Process start | Load config, allocate caches, open dependency clients | Not ready | Dependencies confirmed reachable |
| warmup | Init complete | Prefetch tool descriptors, run a cheap LLM ping, prime caches | Not ready | Warmup probe passes |
| ready | Warmup complete | Accept new turns, serve traffic | Ready | SIGTERM received |
| drain | SIGTERM received | Reject new turns, finish or checkpoint in-flight turns | Not ready | All turns settled or deadline hit |
| terminate | Drain complete or deadline | Flush logs, close connections, exit 0 | Not ready | Process exits |
The probe contract is split: liveness covers init through terminate, readiness covers only ready.
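The five phases and their legal transitions can be sketched as a small state machine. This is a minimal Python sketch, assuming a threaded runtime; the `Lifecycle` class, `_TRANSITIONS` table, and the `print`-based event are illustrative names, not part of any framework:

```python
import enum
import threading

class Phase(enum.Enum):
    INIT = "init"
    WARMUP = "warmup"
    READY = "ready"
    DRAIN = "drain"
    TERMINATE = "terminate"

# Legal transitions, per the table above. Anything else is a bug.
_TRANSITIONS = {
    Phase.INIT: {Phase.WARMUP},
    Phase.WARMUP: {Phase.READY},
    Phase.READY: {Phase.DRAIN},
    Phase.DRAIN: {Phase.TERMINATE},
    Phase.TERMINATE: set(),
}

class Lifecycle:
    """Tracks the current phase and answers the two probe questions."""

    def __init__(self) -> None:
        self._phase = Phase.INIT
        self._lock = threading.Lock()

    @property
    def phase(self) -> Phase:
        return self._phase

    def transition(self, target: Phase) -> None:
        with self._lock:
            if target not in _TRANSITIONS[self._phase]:
                raise ValueError(f"illegal transition {self._phase} -> {target}")
            self._phase = target
            # Stand-in for the structured event each transition must emit.
            print(f"lifecycle phase={target.value}")

    def live(self) -> bool:
        # Liveness covers init through terminate: true while the process runs.
        return True

    def ready(self) -> bool:
        # Readiness covers only the ready phase.
        return self._phase is Phase.READY
```

Rejecting illegal transitions at this layer is what makes the status endpoint trustworthy: a probe can never observe a phase the table does not allow.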
Signal handling
The spec mandates handlers for two signals on POSIX platforms:
- SIGTERM — enter drain. Set the readiness probe to fail. Refuse new turn submissions with HTTP 503 or the framework-equivalent. Allow in-flight turns to continue up to drain_deadline_seconds. After the deadline, checkpoint pending state per the agent checkpoint spec and exit 0.
- SIGINT — same handling as SIGTERM for development parity. Production should not rely on SIGINT.
SIGKILL cannot be caught and indicates the platform exceeded the grace period. Treat any SIGKILL as an incident.
Frameworks that install their own signal handlers (the OpenAI Agents SDK and several LangChain runners do, see openai/openai-agents-js#175 and livekit/agents-js#275) must expose a hook to delegate to the platform-aware handler, or the spec cannot be honored.
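The mandated handlers amount to very little code. A minimal POSIX sketch in Python — `drain_started` is an illustrative flag that the readiness endpoint would consult, not a framework API:

```python
import signal
import threading

# Set when drain begins. The readiness endpoint checks this flag and
# starts failing (HTTP 503 or framework-equivalent) as soon as it is set.
drain_started = threading.Event()

def _enter_drain(signum: int, frame) -> None:
    """Platform-aware handler: refuse new turns, let in-flight turns run.

    The drain loop itself (deadline enforcement, checkpointing) runs
    elsewhere; the handler only flips state, keeping it async-signal-safe
    in spirit and fast in practice.
    """
    drain_started.set()

# SIGTERM for orchestrators; SIGINT mapped identically for dev parity.
signal.signal(signal.SIGTERM, _enter_drain)
signal.signal(signal.SIGINT, _enter_drain)
```

A framework that installs its own handlers would need to call `_enter_drain` (or an equivalent hook) from inside its handler for this contract to hold.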
Drain deadline and in-flight policy
drain_deadline_seconds is a required configuration field. Its value depends on the longest legitimate turn:
- Conversational chat agent: 60-120 seconds.
- Tool-using support agent: 120-300 seconds.
- Deep-research or long-task agent: 1,800+ seconds, but only feasible with a long termination grace period (Kubernetes terminationGracePeriodSeconds, ECS stopTimeout).
When the deadline approaches without a turn completing, the agent must:
- Persist a checkpoint covering trace, partial tool results, pending tool calls, and memory deltas.
- Emit a turn_checkpointed event with the checkpoint URL or ID.
- Notify the user surface with a recoverable error containing a resume token.
- Exit with status 0 once all in-flight turns are checkpointed.
A later instance can resume from the checkpoint. Without checkpointing, mid-flight kills are visible to users as broken sessions — the failure mode reported in the LiveKit Agents and ECS field reports cited above.
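The deadline policy above can be sketched as a drain loop. `Turn`, its `checkpoint` method, and the returned IDs are illustrative stand-ins for the real runtime; the `turn_checkpointed` event and user notification are noted as comments:

```python
import threading
import time
import uuid

DRAIN_DEADLINE_SECONDS = 120  # required config field; tune per workload class

class Turn:
    """Illustrative stand-in for an in-flight agent turn."""

    def __init__(self) -> None:
        self.done = threading.Event()   # set when the turn completes
        self.checkpoint_id = None

    def checkpoint(self) -> str:
        # Real implementation: persist trace, partial tool results,
        # pending tool calls, and memory deltas, then return the ID.
        self.checkpoint_id = str(uuid.uuid4())
        return self.checkpoint_id

def drain(in_flight: list, deadline: float = DRAIN_DEADLINE_SECONDS) -> list:
    """Let turns finish until the deadline, then checkpoint the rest."""
    deadline_at = time.monotonic() + deadline
    checkpointed = []
    for turn in in_flight:
        remaining = deadline_at - time.monotonic()
        if remaining > 0 and turn.done.wait(timeout=remaining):
            continue  # settled inside the deadline
        # Deadline hit: emit turn_checkpointed and notify the user surface
        # with a resume token here.
        checkpointed.append(turn.checkpoint())
    return checkpointed  # exit 0 once this list is durably persisted
```

One shared monotonic deadline, rather than a per-turn timeout, is what keeps total drain time bounded regardless of how many turns are in flight.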
Idempotency on restart
Every side-effectful tool call must be issued with an idempotency key derived from (turn_id, tool_call_id). The spec requires:
- The tool client library writes the key to the request before the call.
- The tool server deduplicates by key for at least the drain deadline window.
- The agent persists the key alongside the pending tool call so a resumed instance reuses the same key rather than generating a new one.
Idempotency removes the worst failure mode of agent restarts — duplicate emails, duplicate payments, duplicate writes — without requiring distributed transactions.
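A sketch of both halves of the contract in Python. The key derivation follows the (turn_id, tool_call_id) rule above; `DedupingToolServer` and its in-memory store are illustrative assumptions, not a specific library's API:

```python
import hashlib

def idempotency_key(turn_id: str, tool_call_id: str) -> str:
    """Deterministic: the same (turn_id, tool_call_id) always yields the
    same key, so a resumed instance reissues an identical request."""
    return hashlib.sha256(f"{turn_id}:{tool_call_id}".encode()).hexdigest()

class DedupingToolServer:
    """Illustrative server-side dedup: replay the stored result for a
    repeated key instead of re-executing the side effect."""

    def __init__(self, execute):
        self._execute = execute
        self._seen = {}  # key -> result; expire after the drain window in practice

    def handle(self, key: str, payload):
        if key in self._seen:
            return self._seen[key]  # duplicate: no double-fire
        result = self._execute(payload)
        self._seen[key] = result
        return result
```

The agent-side obligation — persisting the key alongside the pending call before the request is issued — is what makes the derivation above safe across a restart.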
Readiness contract on Kubernetes
On Kubernetes the spec maps to:
- startupProbe covers the init and warmup phases. Use a generous failureThreshold so cold start does not flap the pod.
- readinessProbe reflects the ready phase only. Once SIGTERM is received the probe must fail within one probe interval, so that the endpoint is removed from service and new traffic stops.
- preStop hook is reserved for last-resort cleanup. It should not be the place drain logic lives — drain belongs in the application's signal handler so it works on every platform, not only Kubernetes (Kubernetes docs, container-lifecycle-hooks).
- terminationGracePeriodSeconds must be greater than drain_deadline_seconds plus a 10-second buffer. If the two are equal, SIGKILL can arrive while the checkpoint is still being written.
On ECS the equivalents are the container's `essential: true` flag, the task definition stopTimeout (max 120 s on Fargate, 7,200 s on EC2), and the Application Load Balancer deregistration delay. The contract is the same; only the names differ.
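The grace-period constraint is simple arithmetic and worth enforcing at deploy time rather than discovering as a SIGKILL incident. A small sketch (function names are illustrative):

```python
GRACE_BUFFER_SECONDS = 10

def min_grace_period(drain_deadline_seconds: int) -> int:
    """Smallest safe terminationGracePeriodSeconds (or ECS stopTimeout)."""
    return drain_deadline_seconds + GRACE_BUFFER_SECONDS

def check_grace_period(grace_period: int, drain_deadline: int) -> None:
    """Fail the deploy rather than let SIGKILL interrupt checkpointing."""
    needed = min_grace_period(drain_deadline)
    if grace_period < needed:
        raise ValueError(
            f"terminationGracePeriodSeconds={grace_period} is below the "
            f"required {needed} (drain_deadline={drain_deadline} + "
            f"{GRACE_BUFFER_SECONDS}s buffer)"
        )
```

Run the check in CI against the rendered manifest or task definition, so the two values cannot drift apart silently.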
Common pitfalls
- Treating warmup and ready as the same phase. Accepting traffic before tool descriptors load returns hallucinated tool errors during the first 10 seconds of every deploy.
- Drain logic in preStop only. Works on Kubernetes, breaks on ECS, breaks in local dev. Put drain in the signal handler.
- No drain deadline. A stuck turn pins the pod indefinitely; eventually the orchestrator SIGKILLs and you lose the checkpoint.
- Default 30-second grace period. Agent turns routinely exceed this. Always raise terminationGracePeriodSeconds to match the workload.
- Framework-installed signal handlers swallowing SIGTERM. Document the override hook in the runtime and verify with an integration test that SIGTERM reaches the spec handler.
- Idempotency only at the agent layer. If the tool server does not deduplicate, retried tool calls double-fire on restart. Make the deduplication contract explicit.
FAQ
Q: How long should terminationGracePeriodSeconds be for an agent?
It must exceed drain_deadline_seconds by at least 10 seconds. For chat agents, 90-150 seconds is typical. For long-running research agents, 30 minutes or more, paired with checkpointing so the platform does not pin nodes.
Q: Do I still need preStop if I handle SIGTERM in the app?
Generally no, with one exception: endpoint removal takes time to propagate, so a small preStop sleep (5-10 seconds) before SIGTERM is delivered lets in-flight routing drain to the pod while it still accepts connections. Everything else about drain belongs in the signal handler.
Q: How does this spec interact with checkpointing?
The drain phase is the trigger that runs the checkpoint. The checkpoint payload and resume contract live in the agent checkpoint and resume spec. This spec only requires that drain completes the checkpoint within the grace period.
Q: What about Spot or preemptible instances with two-minute notice?
Treat the spot interruption notice as an early SIGTERM. Wire the cloud-specific notice (EC2 Spot interruption, GCP preemption signal) to the same drain handler. Pre-deregister from the load balancer before container SIGTERM to use the full two minutes for draining (AWS Containers Blog, Graceful shutdowns with ECS).
Q: How do I test this spec?
Two integration tests, run on every release: (1) start a long-running turn, send SIGTERM, assert the turn completes or checkpoints within the deadline; (2) trigger an unhealthy dependency at boot, assert the readiness probe fails and traffic is not routed.
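Test (1) can be automated as a small drill, assuming a POSIX platform. The inline `FAKE_AGENT` script below is an illustrative stand-in for the real agent binary: it installs the SIGTERM handler, finishes one short in-flight turn, and exits 0 only if drain was signaled first:

```python
import signal
import subprocess
import sys
import textwrap
import time

# Stand-in agent: replace with the real entrypoint in a production drill.
FAKE_AGENT = textwrap.dedent("""
    import signal, sys, time
    draining = False
    def on_term(signum, frame):
        global draining
        draining = True
    signal.signal(signal.SIGTERM, on_term)
    end = time.time() + 1.0          # simulated in-flight turn
    while time.time() < end:
        time.sleep(0.05)
    sys.exit(0 if draining else 1)   # 0 = turn settled during drain
""")

def run_sigterm_drill(deadline: float = 10.0) -> int:
    """Start the agent, deliver SIGTERM mid-turn, return its exit code."""
    proc = subprocess.Popen([sys.executable, "-c", FAKE_AGENT])
    time.sleep(0.3)                  # let the handler install
    proc.send_signal(signal.SIGTERM)
    return proc.wait(timeout=deadline)
```

Asserting on the exit code (0 for a clean drain, anything else for a miss) makes the drill usable as a release gate, and the same harness exposes framework handlers that swallow SIGTERM.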
Q: Does this apply to serverless agents (Lambda, Cloud Run)?
Partially. Init and warmup map to provisioned concurrency or min-instances. Drain is constrained by the platform's max execution time, so long turns must use external orchestration (Step Functions, Workflows) and the agent itself is a stateless turn worker. The idempotency rule still applies.
Related Articles
Agent Circuit Breaker Specification
Specification for circuit breakers protecting AI agent calls to LLM providers and tools, including state transitions, threshold tuning, fallback strategies, and observability hooks.
Agent Graceful Degradation Specification
Specification for graceful degradation when AI agent dependencies fail: model fallback chains, tool-skip policies, cached-response serving, and user-facing failure messaging.
Agent Health Check Specification
Specification for liveness, readiness, and startup probes in production AI agents, including LLM-provider ping patterns, dependency probing, and degraded-mode signaling.