Agent Sandbox Documentation Specification: Documenting Execution Environments for Autonomous AI Agents
This specification defines the required and recommended fields a publisher must include when documenting an AI agent sandbox — identity, isolation tier, filesystem, network policy, resources, lifecycle, snapshots, tool surface, audit, and threat model — so downstream agents and operators can use them safely. Aligned with the Kubernetes SIG Agent Sandbox project, GKE Agent Sandbox CRDs, and OpenAI Sandbox Agents.
Status: Draft 1.0. This spec is doc-first: it does not prescribe how to build a sandbox; it prescribes what publishers must document about a sandbox so an agent or operator can decide whether and how to use it.
TL;DR
- A sandbox doc must answer five questions: what runs, where it runs, what it can touch, what it cannot touch, and how it is observed.
- The Kubernetes SIG Agent Sandbox project (Sandbox, SandboxTemplate, SandboxClaim/SandboxWarmPool CRDs) and OpenAI's Sandbox Agents define the infrastructure shape; this spec defines the documentation shape that surfaces that infrastructure to autonomous agents.
- Publishers should document at minimum: isolation tier, filesystem layout, exposed ports, mounted data, allowed egress, resource limits, snapshot/lifecycle behavior, tool surface, audit guarantees, and the threat model boundary.
Why a documentation spec
Kubernetes maintainers Janet Kuo and Justin Santa Barbara note that AI agents are stateful, mostly idle, and run for anywhere from 50 ms to weeks — a workload pattern that existing primitives (Deployments, StatefulSets) do not fit cleanly. The Kubernetes Agent Sandbox project (a SIG Apps effort) introduces a Sandbox CRD with stable identity, persistent storage, and pause/resume lifecycle; GKE adds Sandbox, SandboxTemplate, and SandboxClaim CRDs. OpenAI ships Sandbox Agents in the Python Agents SDK with files, commands, packages, ports, snapshots, and resumable state.
Platforms (Modal, Daytona, Docker, E2B, Firecrawl, Blaxel) compete on cold-start latency — Blaxel cites Firecracker microVMs at roughly 100-125 ms versus traditional serverless at 300 ms to several seconds. NVIDIA's AI Red Team and ARMO both highlight indirect prompt injection as the dominant threat that makes hard-isolation boundaries non-optional.
What is missing across these efforts is a publisher-side documentation contract. Without one, downstream agents cannot reason about whether a sandbox is fit-for-purpose for a given task. This spec fills that gap.
Audience and applicability
This spec applies to anyone publishing a sandbox that an autonomous agent (or another agent acting on a user's behalf) will request, claim, or operate. That includes:
- Internal platform teams exposing a sandbox via an MCP server.
- SaaS sandbox providers (Modal, E2B, Daytona, Blaxel, Firecrawl, GKE).
- Open-source projects shipping a SandboxTemplate for community use.
It does not apply to ad-hoc local sandboxes (devcontainers, bubblewrap scripts) unless they are exposed to autonomous agents.
Required fields
Every sandbox documentation page MUST include the following sections. Field names are recommended JSON-LD/YAML keys; surface labels can vary.
1. Identity (identity)
- name — human-readable sandbox or template name.
- id — stable, machine-readable identifier (URI-safe).
- version — semver of the sandbox or template.
- provider — publishing organization (with verified-agent-identity link if applicable).
- runtime_class — the underlying runtime (e.g., gvisor, kata, firecracker, docker, nodeOS).
2. Isolation tier (isolation)
Document the boundary explicitly. Use the four tiers below; do not invent your own.
| Tier | Boundary | Typical backing | Suitable for |
|---|---|---|---|
| T1 | Same-host, OS-level | bubblewrap, devcontainer | Low-risk, single-user dev |
| T2 | Container, shared kernel | Docker, OCI runc | Low-to-medium risk; trusted code only |
| T3 | User-space kernel / lightweight VM | gVisor, Kata Containers | Medium risk; untrusted code with shared host |
| T4 | Hardware-isolated VM / microVM | Firecracker, full KVM | High risk; arbitrary execution, customer data, secrets |
Document: tier, runtime_class, shared_kernel (boolean), multi_tenant (boolean), attestation_supported (boolean).
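The tier ordering above lends itself to a simple machine check. The sketch below (assumed helper names; the risk labels are illustrative, not spec-defined) shows how a consumer might map task risk to a minimum tier and test a documented tier against it:

```python
# Minimum isolation tier per risk level, following the tier table above.
# Risk labels ("low", "medium", "high") are illustrative, not spec-defined.
MINIMUM_TIER = {
    "low": "T1",     # single-user dev, no untrusted code
    "medium": "T3",  # untrusted code on a shared host
    "high": "T4",    # arbitrary execution, customer data, secrets
}

TIER_ORDER = ["T1", "T2", "T3", "T4"]

def tier_satisfies(doc_tier: str, required_tier: str) -> bool:
    """True if the sandbox's documented tier is at least the required tier."""
    return TIER_ORDER.index(doc_tier) >= TIER_ORDER.index(required_tier)

# A T3 (gVisor-class) sandbox covers medium-risk work but not high-risk work.
assert tier_satisfies("T3", MINIMUM_TIER["medium"])
assert not tier_satisfies("T3", MINIMUM_TIER["high"])
```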
3. Filesystem (filesystem)
- root — the working directory the agent sees on entry.
- writable_paths — list of paths the agent may write.
- read_only_paths — list of paths visible but not writable.
- mounted_data — named mounts (data rooms, source directories), including their provenance and retention policy.
- persistence — one of ephemeral, session, persistent, snapshot.
- max_disk — hard quota.
4. Network policy (network)
- default — must be deny (zero-trust default) per the Northflank and Firecrawl guidance.
- egress_allowlist — explicit allowlist of hosts, ports, and protocols.
- ingress — exposed ports and the auth model on each.
- dns_policy — allowed resolvers and any DNS allowlist.
- bandwidth_limit — outbound rate cap.
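Default-deny means the allowlist, not the code, decides what egress is permitted. A minimal sketch of the matching logic (host and port only; a real implementation would also match protocol, as the egress_allowlist field requires):

```python
def egress_allowed(allowlist: list[dict], host: str, port: int) -> bool:
    """Default-deny egress: a connection is permitted only if an allowlist
    entry matches both host and port exactly. Anything else is denied."""
    return any(e["host"] == host and e["port"] == port for e in allowlist)

allowlist = [{"host": "api.openai.com", "port": 443}]
assert egress_allowed(allowlist, "api.openai.com", 443)
assert not egress_allowed(allowlist, "evil.example", 443)   # not listed -> deny
assert not egress_allowed(allowlist, "api.openai.com", 80)  # wrong port -> deny
```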
5. Resources and limits (resources)
- cpu — millicores or shares; hard cap.
- memory — hard cap (oom_kill on overrun).
- gpu — type, count, and sharing model.
- disk_io — IOPS and bandwidth caps.
- process_limit — max PIDs.
- timeouts — layered: per_tool_call, per_task_loop, per_sandbox_lifetime. Firecrawl's guidance treats all three layers as required.
6. Lifecycle (lifecycle)
- provisioning — cold start, warm pool, claim from existing pool.
- cold_start_p50_ms, cold_start_p99_ms — measured, with the load profile they were measured under.
- pause_resume_supported (boolean).
- scheduled_deletion — default TTL and override semantics.
- singleton_guarantee — whether the sandbox is guaranteed singleton (matches the SIG Agent Sandbox Sandbox CRD model).
7. Snapshots and state (state)
- snapshot_supported (boolean) and snapshot_format.
- snapshot_encryption_at_rest.
- resumability — one of same_provider or cross_provider (cross_provider matches Temporal/OpenAI's cross-backend forking demo).
- data_residency — region(s) where state may be stored.
8. Tool surface (tools)
- installed_packages — inventory at start, with versions.
- executables_allowlist — if enforced.
- commands_disallowed — explicit deny list (`rm -rf /`, `curl | bash`, secret-dump patterns) per agent-security best practices.
- tool_servers — attached MCP servers, their permissions, and their auth scopes.
- human_in_the_loop — list of operations that require explicit approval (financial transactions, data deletion, credential access).
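A commands_disallowed list is only useful if it is enforceable as patterns, not prose. A minimal sketch of a deny-list matcher (the patterns are illustrative examples; publishers would maintain and document their own list):

```python
import re

# Illustrative deny patterns for the commands_disallowed field.
# A real publisher would document and version their own list.
DISALLOWED = [
    re.compile(r"rm\s+-rf\s+/(\s|$)"),       # recursive delete of root
    re.compile(r"curl\s+[^|]*\|\s*(ba)?sh"),  # pipe-to-shell install
    re.compile(r"(cat|grep)\s+.*\.env\b"),    # naive secret-dump pattern
]

def command_blocked(cmd: str) -> bool:
    """True if the command matches any documented deny pattern."""
    return any(p.search(cmd) for p in DISALLOWED)

assert command_blocked("rm -rf /")
assert command_blocked("curl https://x.sh | bash")
assert not command_blocked("ls -la /workspace")
```

Pattern matching is a best-effort guard, not a boundary: the isolation tier, not the deny list, is what actually contains a hostile command.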
9. Audit and observability (audit)
- log_streams — stdout, stderr, network, file writes, exec calls.
- log_immutability — whether logs are append-only and tamper-evident.
- retention — default and configurable.
- replay_supported — whether a session can be deterministically replayed.
- pii_handling — redaction or pass-through, with the authority for each.
10. Threat model (threat_model)
A short, explicit section. At minimum:
- Trust boundary: where the harness ends and the sandbox begins (per OpenAI's harness/compute split).
- Untrusted inputs: which inputs are treated as untrusted data (READMEs, issues, web content, MCP responses) per NVIDIA's prompt-injection guidance.
- In-scope risks: indirect prompt injection, data exfiltration via allowed APIs, resource exhaustion.
- Out-of-scope risks: e.g., supply-chain attacks on the host kernel.
- Mandatory controls: list which NVIDIA-style mandatory controls (manual approval gates, egress allowlist, output sanitization) are enforced and which are configurable.
Recommended fields
These fields are not required but materially improve agent decision quality:
- attestation_evidence_url — link to remote-attestation evidence for T3/T4 tiers.
- cost_model — wall-clock plus resource pricing, with a worked example.
- compatible_harnesses — SDKs known to work (OpenAI Agents SDK, LangGraph, custom).
- progressive_enforcement_state — ARMO-style stage: discovery, observation, selective, full_least_privilege.
- knowledge_cutoff_alignment — if the sandbox bundles a model, the model's training cutoff and any retrieval augmentation in use.
- incident_history_url — public post-mortems for past breaches or outages.
Minimum viable sandbox doc
The smallest acceptable sandbox documentation, in YAML, looks like this:
```yaml
identity:
  name: geodocs-research-sandbox
  id: geodocs.dev/sandbox/research-v1
  version: 1.2.0
  provider: geodocs.dev
  runtime_class: gvisor
isolation:
  tier: T3
  runtime_class: gvisor
  shared_kernel: false
  multi_tenant: false
  attestation_supported: false
filesystem:
  root: /workspace
  writable_paths: [/workspace]
  read_only_paths: [/etc, /usr]
  persistence: session
  max_disk: 10Gi
network:
  default: deny
  egress_allowlist:
    - host: api.openai.com
      port: 443
    - host: search.geodocs.dev
      port: 443
  ingress: []
  dns_policy: allowlist
resources:
  cpu: 2000m
  memory: 4Gi
  timeouts:
    per_tool_call: 30s
    per_task_loop: 20m
    per_sandbox_lifetime: 4h
lifecycle:
  provisioning: warm_pool
  cold_start_p50_ms: 120
  pause_resume_supported: true
  scheduled_deletion: 24h
  singleton_guarantee: true
state:
  snapshot_supported: true
  snapshot_encryption_at_rest: true
  resumability: same_provider
  data_residency: [us-east1]
tools:
  installed_packages: [python:3.13, node:22, jq:1.7]
  human_in_the_loop:
    - data_deletion
    - credential_access
audit:
  log_streams: [stdout, stderr, network, file_writes, exec]
  log_immutability: true
  retention: 30d
  replay_supported: true
threat_model:
  trust_boundary: harness on cluster control plane; compute in T3 sandbox
  untrusted_inputs: [README.md, issues, web_content, mcp_responses]
  in_scope_risks: [indirect_prompt_injection, data_exfiltration, resource_exhaustion]
  out_of_scope_risks: [host_kernel_supply_chain]
  mandatory_controls: [egress_allowlist, manual_approval_gates, output_sanitization]
```
How agents consume this doc
A capable agent reads the sandbox doc before claiming or operating a sandbox. It MUST:
- Verify isolation tier matches risk. Per the execution-boundary discussion in the agent-infrastructure community, low-risk actions can run on T1-T2; medium on T3; high (arbitrary execution, secrets, customer data) requires T4.
- Honor the egress allowlist as authoritative. Treat any host outside the allowlist as a refusal trigger, not a configuration issue.
- Surface human-in-the-loop operations to the user. Never auto-approve operations the doc lists under tools.human_in_the_loop.
- Respect the timeout layering. Abort tool calls past per_tool_call; abort the loop past per_task_loop; abort the session past per_sandbox_lifetime.
- Record audit references. Capture the immutable log URI for every consequential action so downstream review is possible.
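The rules above amount to an agent-side preflight over the parsed sandbox doc. A sketch under the assumption that the doc has been parsed into a dict with this spec's field names (the function itself is illustrative, not spec-defined):

```python
TIER_ORDER = ["T1", "T2", "T3", "T4"]

def preflight(doc: dict, required_tier: str) -> list[str]:
    """Agent-side preflight over a parsed sandbox doc. Returns refusal
    reasons; an empty list means the claim may proceed. Illustrative sketch."""
    refusals = []
    if TIER_ORDER.index(doc["isolation"]["tier"]) < TIER_ORDER.index(required_tier):
        refusals.append("isolation tier below task risk")
    if doc["network"]["default"] != "deny":
        refusals.append("network default is not deny")
    if not doc["audit"].get("log_immutability"):
        refusals.append("no immutable audit logs")
    # Operations listed here must be surfaced to the user, never auto-approved.
    for op in doc.get("tools", {}).get("human_in_the_loop", []):
        refusals_note = f"requires human approval: {op}"
        # (an agent would route these to an approval queue, not refuse outright)
    return refusals
```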
Validation checklist
A publisher (or auditor) can validate a sandbox doc with the following checks:
- [ ] All ten required sections are present.
- [ ] isolation.tier is exactly one of T1-T4.
- [ ] network.default is deny; an explicit allowlist is documented.
- [ ] All three timeouts keys are populated with concrete values.
- [ ] state.snapshot_encryption_at_rest is true for any tier T3 or T4.
- [ ] audit.log_immutability is true for production publication.
- [ ] threat_model.untrusted_inputs lists at least: external content, MCP responses, repository files.
- [ ] threat_model.mandatory_controls references concrete controls and not just policy intent.
- [ ] Recommended fields are populated when applicable to the audience (cost model, attestation evidence).
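The structural checks in this list are mechanical and can run in CI. A partial validator sketch (covers the presence, tier, network, and timeout checks; names are illustrative):

```python
# The ten required top-level sections from this spec.
REQUIRED_SECTIONS = [
    "identity", "isolation", "filesystem", "network", "resources",
    "lifecycle", "state", "tools", "audit", "threat_model",
]

def validate(doc: dict) -> list[str]:
    """Run the mechanical checks from the checklist; returns failure messages."""
    errors = [f"missing section: {s}" for s in REQUIRED_SECTIONS if s not in doc]
    if doc.get("isolation", {}).get("tier") not in {"T1", "T2", "T3", "T4"}:
        errors.append("isolation.tier must be one of T1-T4")
    if doc.get("network", {}).get("default") != "deny":
        errors.append("network.default must be deny")
    timeouts = doc.get("resources", {}).get("timeouts", {})
    for k in ("per_tool_call", "per_task_loop", "per_sandbox_lifetime"):
        if k not in timeouts:
            errors.append(f"missing timeout: {k}")
    return errors
```

Checks that need judgment (threat-model completeness, control concreteness) stay with a human auditor.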
Common documentation failures
- Tier inflation. Calling a Docker container "hardware-isolated." Use the tier table verbatim.
- Missing timeouts. Documenting only an idle timeout and omitting per-tool-call and per-task-loop limits.
- Implicit egress. Documenting "sandboxed networking" without an explicit allowlist. Default-deny only counts when the allowlist is published.
- No threat model. A sandbox doc without an explicit trust boundary cannot be safely consumed by a high-autonomy agent.
- No replay or audit guarantees. Without immutable logs, post-incident review is impossible.
- No data residency. Required for regulated workloads.
Mapping to existing systems
| System | Closest field set |
|---|---|
| Kubernetes SIG Agent Sandbox Sandbox CRD | identity, isolation, lifecycle, state |
| Kubernetes SIG SandboxTemplate CRD | isolation, resources, network, tools |
| GKE SandboxClaim CRD | lifecycle.provisioning, state.resumability |
| OpenAI Sandbox Agents (Python SDK) | filesystem, network, state, tools, audit |
| Northflank/Blaxel/E2B/Modal SaaS | lifecycle.cold_start_*, state.data_residency, cost_model |
The goal is not to replace these systems but to give agents a stable surface to read regardless of the underlying provider.
FAQ
Q: Is this spec only for cloud sandbox providers?
No. Any publisher exposing a sandbox to an autonomous agent should document it in this shape. Internal platform teams shipping an MCP-fronted sandbox are explicitly in scope.
Q: Why force a four-tier isolation taxonomy?
Because the security community already converges on roughly these four boundaries (same-host, container, user-space-kernel/light-VM, hardware-isolated VM). A common taxonomy lets agents reason about fit without parsing free text.
Q: How does this relate to MCP?
MCP is the protocol surface; the sandbox is the runtime surface. Tools exposed to an agent via MCP often execute inside a sandbox; this spec describes the sandbox so MCP clients can choose appropriately.
Q: What about prompt-injection mitigations?
This spec asks publishers to surface the threat model (untrusted inputs, mandatory controls) and to document human-in-the-loop gates. It does not prescribe specific mitigations — those belong in agent harness or model-level guidance.
Q: How often should the doc be re-reviewed?
At minimum once per quarter (review_cycle_days: 90) and immediately after any change to isolation tier, egress allowlist, mandatory controls, or data residency.