Agent Sandbox Documentation Specification: Documenting Execution Environments for Autonomous AI Agents
This specification defines the required and recommended fields a publisher must include when documenting an AI agent sandbox — identity, isolation tier, filesystem, network policy, resources, lifecycle, snapshots, tool surface, audit, and threat model — so downstream agents and operators can use them safely. Aligned with the Kubernetes SIG Agent Sandbox project, GKE Agent Sandbox CRDs, and OpenAI Sandbox Agents.
Status: Draft 1.0. This spec is doc-first: it does not prescribe how to build a sandbox; it prescribes what publishers must document about a sandbox so an agent or operator can decide whether and how to use it.
TL;DR
- A sandbox doc must answer five questions: what runs, where it runs, what it can touch, what it cannot touch, and how it is observed.
- The Kubernetes SIG Agent Sandbox project (Sandbox, SandboxTemplate, SandboxClaim/SandboxWarmPool CRDs) and OpenAI's Sandbox Agents define the infrastructure shape; this spec defines the documentation shape that surfaces that infrastructure to autonomous agents.
- Publishers should document at minimum: isolation tier, filesystem layout, exposed ports, mounted data, allowed egress, resource limits, snapshot/lifecycle behavior, tool surface, audit guarantees, and the threat model boundary.
Why a documentation spec
Kubernetes maintainers Janet Kuo and Justin Santa Barbara note that AI agents are stateful, mostly idle, and run for anywhere from 50 ms to weeks — a workload pattern that existing primitives (Deployments, StatefulSets) do not fit cleanly. The Kubernetes Agent Sandbox project (a SIG Apps effort) introduces a Sandbox CRD with stable identity, persistent storage, and pause/resume lifecycle; GKE adds Sandbox, SandboxTemplate, and SandboxClaim CRDs. OpenAI ships Sandbox Agents in the Python Agents SDK with files, commands, packages, ports, snapshots, and resumable state.
Platforms (Modal, Daytona, Docker, E2B, Firecrawl, Blaxel) compete on cold-start latency — Blaxel cites Firecracker microVMs at roughly 100-125 ms versus traditional serverless at 300 ms to several seconds. NVIDIA's AI Red Team and ARMO both highlight indirect prompt injection as the dominant threat that makes hard-isolation boundaries non-optional.
What is missing across these efforts is a publisher-side documentation contract. Without one, downstream agents cannot reason about whether a sandbox is fit-for-purpose for a given task. This spec fills that gap.
Audience and applicability
This spec applies to anyone publishing a sandbox that an autonomous agent (or another agent acting on a user's behalf) will request, claim, or operate. That includes:
- Internal platform teams exposing a sandbox via an MCP server.
- SaaS sandbox providers (Modal, E2B, Daytona, Blaxel, Firecrawl, GKE).
- Open-source projects shipping a SandboxTemplate for community use.
It does not apply to ad-hoc local sandboxes (devcontainers, bubblewrap scripts) unless they are exposed to autonomous agents.
Required fields
Every sandbox documentation page MUST include the following sections. Field names are recommended JSON-LD/YAML keys; surface labels can vary.
1. Identity (identity)
- name — human-readable sandbox or template name.
- id — stable, machine-readable identifier (URI-safe).
- version — semver of the sandbox or template.
- provider — publishing organization (with verified-agent-identity link if applicable).
- runtime_class — the underlying runtime (e.g., gvisor, kata, firecracker, docker, nodeOS).
2. Isolation tier (isolation)
Document the boundary explicitly. Use the four tiers below; do not invent your own.
| Tier | Boundary | Typical backing | Suitable for |
|---|---|---|---|
| T1 | Same-host, OS-level | bubblewrap, devcontainer | Low-risk, single-user dev |
| T2 | Container, shared kernel | Docker, OCI runc | Low-to-medium risk; trusted code only |
| T3 | User-space kernel / lightweight VM | gVisor, Kata Containers | Medium risk; untrusted code with shared host |
| T4 | Hardware-isolated VM / microVM | Firecracker, full KVM | High risk; arbitrary execution, customer data, secrets |
Document: tier, runtime_class, shared_kernel (boolean), multi_tenant (boolean), attestation_supported (boolean).
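The tier ordering above lends itself to a simple machine check. The sketch below (assumed helper names; the risk labels are illustrative, not spec-defined) shows how a consumer might map task risk to a minimum tier and test a documented tier against it:

```python
# Minimum isolation tier per risk level, following the tier table above.
# Risk labels ("low", "medium", "high") are illustrative, not spec-defined.
MINIMUM_TIER = {
    "low": "T1",     # single-user dev, no untrusted code
    "medium": "T3",  # untrusted code on a shared host
    "high": "T4",    # arbitrary execution, customer data, secrets
}

TIER_ORDER = ["T1", "T2", "T3", "T4"]

def tier_satisfies(doc_tier: str, required_tier: str) -> bool:
    """True if the sandbox's documented tier is at least the required tier."""
    return TIER_ORDER.index(doc_tier) >= TIER_ORDER.index(required_tier)

# A T3 (gVisor-class) sandbox covers medium-risk work but not high-risk work.
assert tier_satisfies("T3", MINIMUM_TIER["medium"])
assert not tier_satisfies("T3", MINIMUM_TIER["high"])
```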
3. Filesystem (filesystem)
- root — the working directory the agent sees on entry.
- writable_paths — list of paths the agent may write.
- read_only_paths — list of paths visible but not writable.
- mounted_data — named mounts (data rooms, source directories), including their provenance and retention policy.
- persistence — one of ephemeral, session, persistent, snapshot.
- max_disk — hard quota.
4. Network policy (network)
- default — must be deny (zero-trust default) per the Northflank and Firecrawl guidance.
- egress_allowlist — explicit allowlist of hosts, ports, and protocols.
- ingress — exposed ports and the auth model on each.
- dns_policy — allowed resolvers and any DNS allowlist.
- bandwidth_limit — outbound rate cap.
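Default-deny means the allowlist, not the code, decides what egress is permitted. A minimal sketch of the matching logic (host and port only; a real implementation would also match protocol, as the egress_allowlist field requires):

```python
def egress_allowed(allowlist: list[dict], host: str, port: int) -> bool:
    """Default-deny egress: a connection is permitted only if an allowlist
    entry matches both host and port exactly. Anything else is denied."""
    return any(e["host"] == host and e["port"] == port for e in allowlist)

allowlist = [{"host": "api.openai.com", "port": 443}]
assert egress_allowed(allowlist, "api.openai.com", 443)
assert not egress_allowed(allowlist, "evil.example", 443)   # not listed -> deny
assert not egress_allowed(allowlist, "api.openai.com", 80)  # wrong port -> deny
```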
5. Resources and limits (resources)
- cpu — millicores or shares; hard cap.
- memory — hard cap (oom_kill on overrun).
- gpu — type, count, and sharing model.
- disk_io — IOPS and bandwidth caps.
- process_limit — max PIDs.
- timeouts — layered: per_tool_call, per_task_loop, per_sandbox_lifetime. Firecrawl's guidance treats all three layers as required.
6. Lifecycle (lifecycle)
- provisioning — cold start, warm pool, claim from existing pool.
- cold_start_p50_ms, cold_start_p99_ms — measured, with the load profile they were measured under.
- pause_resume_supported (boolean).
- scheduled_deletion — default TTL and override semantics.
- singleton_guarantee — whether the sandbox is guaranteed singleton (matches the SIG Agent Sandbox Sandbox CRD model).
7. Snapshots and state (state)
- snapshot_supported (boolean) and snapshot_format.
- snapshot_encryption_at_rest.
- resumability — one of same_provider or cross_provider (cross_provider matches Temporal/OpenAI's cross-backend forking demo).
- data_residency — region(s) where state may be stored.
8. Tool surface (tools)
- installed_packages — inventory at start, with versions.
- executables_allowlist — if enforced.
- commands_disallowed — explicit deny list (`rm -rf /`, `curl | bash`, secret-dump patterns) per agent-security best practices.
- tool_servers — attached MCP servers, their permissions, and their auth scopes.
- human_in_the_loop — list of operations that require explicit approval (financial transactions, data deletion, credential access).
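A commands_disallowed list is only useful if it is enforceable as patterns, not prose. A minimal sketch of a deny-list matcher (the patterns are illustrative examples; publishers would maintain and document their own list):

```python
import re

# Illustrative deny patterns for the commands_disallowed field.
# A real publisher would document and version their own list.
DISALLOWED = [
    re.compile(r"rm\s+-rf\s+/(\s|$)"),       # recursive delete of root
    re.compile(r"curl\s+[^|]*\|\s*(ba)?sh"),  # pipe-to-shell install
    re.compile(r"(cat|grep)\s+.*\.env\b"),    # naive secret-dump pattern
]

def command_blocked(cmd: str) -> bool:
    """True if the command matches any documented deny pattern."""
    return any(p.search(cmd) for p in DISALLOWED)

assert command_blocked("rm -rf /")
assert command_blocked("curl https://x.sh | bash")
assert not command_blocked("ls -la /workspace")
```

Pattern matching is a best-effort guard, not a boundary: the isolation tier, not the deny list, is what actually contains a hostile command.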
9. Audit and observability (audit)
- log_streams — stdout, stderr, network, file writes, exec calls.
- log_immutability — whether logs are append-only and tamper-evident.
- retention — default and configurable.
- replay_supported — whether a session can be deterministically replayed.
- pii_handling — redaction or pass-through, with the authority for each.
10. Threat model (threat_model)
A short, explicit section. At minimum:
- Trust boundary: where the harness ends and the sandbox begins (per OpenAI's harness/compute split).
- Untrusted inputs: which inputs are treated as untrusted data (READMEs, issues, web content, MCP responses) per NVIDIA's prompt-injection guidance.
- In-scope risks: indirect prompt injection, data exfiltration via allowed APIs, resource exhaustion.
- Out-of-scope risks: e.g., supply-chain attacks on the host kernel.
- Mandatory controls: list which NVIDIA-style mandatory controls (manual approval gates, egress allowlist, output sanitization) are enforced and which are configurable.
Recommended fields
These fields are not required but materially improve agent decision quality:
- attestation_evidence_url — link to remote-attestation evidence for T3/T4 tiers.
- cost_model — wall-clock plus resource pricing, with a worked example.
- compatible_harnesses — SDKs known to work (OpenAI Agents SDK, LangGraph, custom).
- progressive_enforcement_state — ARMO-style stage: discovery, observation, selective, full_least_privilege.
- knowledge_cutoff_alignment — if the sandbox bundles a model, the model's training cutoff and any retrieval augmentation in use.
- incident_history_url — public post-mortems for past breaches or outages.
Minimum viable sandbox doc
The smallest acceptable sandbox documentation, in YAML, looks like this:
```yaml
identity:
  name: geodocs-research-sandbox
  id: geodocs.dev/sandbox/research-v1
  version: 1.2.0
  provider: geodocs.dev
  runtime_class: gvisor
isolation:
  tier: T3
  runtime_class: gvisor
  shared_kernel: false
  multi_tenant: false
  attestation_supported: false
filesystem:
  root: /workspace
  writable_paths: [/workspace]
  read_only_paths: [/etc, /usr]
  persistence: session
  max_disk: 10Gi
network:
  default: deny
  egress_allowlist:
    - host: api.openai.com
      port: 443
    - host: search.geodocs.dev
      port: 443
  ingress: []
  dns_policy: allowlist
resources:
  cpu: 2000m
  memory: 4Gi
  timeouts:
    per_tool_call: 30s
    per_task_loop: 20m
    per_sandbox_lifetime: 4h
lifecycle:
  provisioning: warm_pool
  cold_start_p50_ms: 120
  pause_resume_supported: true
  scheduled_deletion: 24h
  singleton_guarantee: true
state:
  snapshot_supported: true
  snapshot_encryption_at_rest: true
  resumability: same_provider
  data_residency: [us-east1]
tools:
  installed_packages: [python:3.13, node:22, jq:1.7]
  human_in_the_loop:
    - data_deletion
    - credential_access
audit:
  log_streams: [stdout, stderr, network, file_writes, exec]
  log_immutability: true
  retention: 30d
  replay_supported: true
threat_model:
  trust_boundary: harness on cluster control plane; compute in T3 sandbox
  untrusted_inputs: [README.md, issues, web_content, mcp_responses]
  in_scope_risks: [indirect_prompt_injection, data_exfiltration, resource_exhaustion]
  out_of_scope_risks: [host_kernel_supply_chain]
  mandatory_controls: [egress_allowlist, manual_approval_gates, output_sanitization]
```
How agents consume this doc
A capable agent reads the sandbox doc before claiming or operating a sandbox. It MUST:
- Verify isolation tier matches risk. Per the execution-boundary discussion in the agent-infrastructure community, low-risk actions can run on T1-T2; medium on T3; high (arbitrary execution, secrets, customer data) requires T4.
- Honor the egress allowlist as authoritative. Treat any host outside the allowlist as a refusal trigger, not a configuration issue.
- Surface human-in-the-loop operations to the user. Never auto-approve operations the doc lists under tools.human_in_the_loop.
- Respect the timeout layering. Abort tool calls past per_tool_call; abort the loop past per_task_loop; abort the session past per_sandbox_lifetime.
- Record audit references. Capture the immutable log URI for every consequential action so downstream review is possible.
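The rules above amount to an agent-side preflight over the parsed sandbox doc. A sketch under the assumption that the doc has been parsed into a dict with this spec's field names (the function itself is illustrative, not spec-defined):

```python
TIER_ORDER = ["T1", "T2", "T3", "T4"]

def preflight(doc: dict, required_tier: str) -> list[str]:
    """Agent-side preflight over a parsed sandbox doc. Returns refusal
    reasons; an empty list means the claim may proceed. Illustrative sketch."""
    refusals = []
    if TIER_ORDER.index(doc["isolation"]["tier"]) < TIER_ORDER.index(required_tier):
        refusals.append("isolation tier below task risk")
    if doc["network"]["default"] != "deny":
        refusals.append("network default is not deny")
    if not doc["audit"].get("log_immutability"):
        refusals.append("no immutable audit logs")
    # Operations listed here must be surfaced to the user, never auto-approved.
    for op in doc.get("tools", {}).get("human_in_the_loop", []):
        refusals_note = f"requires human approval: {op}"
        # (an agent would route these to an approval queue, not refuse outright)
    return refusals
```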
Validation checklist
A publisher (or auditor) can validate a sandbox doc with the following checks:
- [ ] All ten required sections are present.
- [ ] isolation.tier is exactly one of T1-T4.
- [ ] network.default is deny; an explicit allowlist is documented.
- [ ] All three timeouts keys are populated with concrete values.
- [ ] state.snapshot_encryption_at_rest is true for any tier T3 or T4.
- [ ] audit.log_immutability is true for production publication.
- [ ] threat_model.untrusted_inputs lists at least: external content, MCP responses, repository files.
- [ ] threat_model.mandatory_controls references concrete controls and not just policy intent.
- [ ] Recommended fields are populated when applicable to the audience (cost model, attestation evidence).
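The structural checks in this list are mechanical and can run in CI. A partial validator sketch (covers the presence, tier, network, and timeout checks; names are illustrative):

```python
# The ten required top-level sections from this spec.
REQUIRED_SECTIONS = [
    "identity", "isolation", "filesystem", "network", "resources",
    "lifecycle", "state", "tools", "audit", "threat_model",
]

def validate(doc: dict) -> list[str]:
    """Run the mechanical checks from the checklist; returns failure messages."""
    errors = [f"missing section: {s}" for s in REQUIRED_SECTIONS if s not in doc]
    if doc.get("isolation", {}).get("tier") not in {"T1", "T2", "T3", "T4"}:
        errors.append("isolation.tier must be one of T1-T4")
    if doc.get("network", {}).get("default") != "deny":
        errors.append("network.default must be deny")
    timeouts = doc.get("resources", {}).get("timeouts", {})
    for k in ("per_tool_call", "per_task_loop", "per_sandbox_lifetime"):
        if k not in timeouts:
            errors.append(f"missing timeout: {k}")
    return errors
```

Checks that need judgment (threat-model completeness, control concreteness) stay with a human auditor.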
Common documentation failures
- Tier inflation. Calling a Docker container "hardware-isolated." Use the tier table verbatim.
- Missing timeouts. Documenting only an idle timeout and omitting per-tool-call and per-task-loop limits.
- Implicit egress. Documenting "sandboxed networking" without an explicit allowlist. Default-deny only counts when the allowlist is published.
- No threat model. A sandbox doc without an explicit trust boundary cannot be safely consumed by a high-autonomy agent.
- No replay or audit guarantees. Without immutable logs, post-incident review is impossible.
- No data residency. Required for regulated workloads.
Mapping to existing systems
| System | Closest field set |
|---|---|
| Kubernetes SIG Agent Sandbox Sandbox CRD | identity, isolation, lifecycle, state |
| Kubernetes SIG SandboxTemplate CRD | isolation, resources, network, tools |
| GKE SandboxClaim CRD | lifecycle.provisioning, state.resumability |
| OpenAI Sandbox Agents (Python SDK) | filesystem, network, state, tools, audit |
| Northflank/Blaxel/E2B/Modal SaaS | lifecycle.cold_start_*, state.data_residency, cost_model |
The goal is not to replace these systems but to give agents a stable surface to read regardless of the underlying provider.
FAQ
Q: Is this spec only for cloud sandbox providers?
No. Any publisher exposing a sandbox to an autonomous agent should document it in this shape. Internal platform teams shipping an MCP-fronted sandbox are explicitly in scope.
Q: Why force a four-tier isolation taxonomy?
Because the security community already converges on roughly these four boundaries (same-host, container, user-space-kernel/light-VM, hardware-isolated VM). A common taxonomy lets agents reason about fit without parsing free text.
Q: How does this relate to MCP?
MCP is the protocol surface; the sandbox is the runtime surface. Tools exposed to an agent via MCP often execute inside a sandbox; this spec describes the sandbox so MCP clients can choose appropriately.
Q: What about prompt-injection mitigations?
This spec asks publishers to surface the threat model (untrusted inputs, mandatory controls) and to document human-in-the-loop gates. It does not prescribe specific mitigations — those belong in agent harness or model-level guidance.
Q: How often should the doc be re-reviewed?
At minimum once per quarter (review_cycle_days: 90) and immediately after any change to isolation tier, egress allowlist, mandatory controls, or data residency.