Geodocs.dev

Agent Trajectory Documentation Spec: Designing Replay-Ready Docs for Browser Agents


Agent trajectory documentation is a replay-ready format that pairs step manifests, stable selectors, expected outcomes, and verification steps so browser agents can execute and cite the procedure deterministically.

TL;DR: Browser-agent runbooks fail when they read like blog posts. They succeed when each step is a typed instruction with a stable selector, an expected outcome, a verification probe, and a citation back to the source procedure. This spec defines the minimum format and the QA gates that keep trajectory docs replay-ready and citation-friendly for both human and agent readers.

Why traditional how-to docs fail browser agents

LLM-driven browser agents (ChatGPT Atlas, Comet, Browser Use, browser tooling in Vercel and Firecrawl) work best when they can plan a deterministic trajectory. A how-to written in narrative voice ("first, head over to settings and find the option labeled...") forces the agent to interpret rather than execute, which compounds error rates over multi-step tasks.

Replay-ready trajectory docs solve three problems at once:

  • The agent executes the procedure correctly without inferring.
  • The agent cites the procedure as a source rather than fabricating one.
  • A human can review and update the procedure when the underlying UI changes.

Specification overview

A trajectory doc is a structured document with five sections.

Section 1: Identity

The doc declares its scope:

  • title
  • canonical_concept_id
  • target_application (and version if applicable)
  • prerequisites
  • success criteria (measurable)
  • last_verified_at
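As a sketch, the identity fields above can be represented as a simple record with a completeness check. All field values here are invented for illustration:

```python
# Hypothetical identity section, expressed as a Python dict for illustration;
# the field names follow the spec, the values are made up.
identity = {
    "title": "Enable two-factor authentication",
    "canonical_concept_id": "example-app/security/enable-2fa",
    "target_application": "example.com (v2.4)",
    "prerequisites": ["User is signed in", "User has a verified email"],
    "success_criteria": ["Security settings page shows 2FA status: Enabled"],
    "last_verified_at": "2026-04-28",
}

REQUIRED_IDENTITY_FIELDS = {
    "title", "canonical_concept_id", "target_application",
    "prerequisites", "success_criteria", "last_verified_at",
}

def identity_is_complete(doc: dict) -> bool:
    """True when every required identity field is present and non-empty."""
    return all(doc.get(field) for field in REQUIRED_IDENTITY_FIELDS)
```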

Section 2: Step manifest

The manifest is an ordered list of step objects. Each step has:

  • step_id
  • goal (one sentence, declarative)
  • selectors (semantic and fallback)
  • action (click, type, navigate, read, assert)
  • inputs (typed)
  • expected_outcome (observable change)
  • verification (probe to confirm outcome)
  • on_failure (retry policy or branch)

Selectors should be ordered semantic → role → text → CSS, with CSS as the last resort. Agents prefer role-based selectors because they survive UI restyling.
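A step object with the fields above might look like the following sketch. The selector strings, ids, and on_failure shape are hypothetical; the ordering check enforces the rule that CSS appears only as the final fallback:

```python
# Hypothetical step object; selector strings and ids are invented for illustration.
step = {
    "step_id": "open-billing-settings",
    "goal": "Open the billing settings page.",
    "selectors": [
        "role=link[name='Billing']",   # role-based, preferred
        "text=Billing",                # text fallback
        "css=nav a.billing-link",      # CSS, last resort
    ],
    "action": "click",
    "inputs": {},
    "expected_outcome": "Billing settings page is visible",
    "verification": "header_contains",  # probe id, defined in Section 3
    "on_failure": {"retry": 2, "then": "abort"},
}

def selectors_ordered_semantic_first(step: dict) -> bool:
    """Enforce the ordering rule: a CSS selector may only appear last."""
    kinds = [s.split("=", 1)[0] for s in step["selectors"]]
    return "css" not in kinds[:-1]
```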

Section 3: Verification probes

Each probe is reusable. A probe takes a target and returns pass | fail | unknown. Probes are referenced by id from steps.

probes:
  - id: header_contains
    type: dom
    target: "role=heading"
    expression: "contains_text"

Section 4: Replay envelope

The envelope packages the manifest with metadata for an agent runtime to execute:

{
  "version": "1.0",
  "identity": { "title": "...", "target_application": "example.com" },
  "prerequisites": ["..."],
  "steps": [ ... ],
  "probes": [ ... ],
  "last_verified_at": "2026-04-28"
}

Section 5: Citation block

A citation block at the bottom maps each step to the source document section that authorizes it. This lets an agent emit a citation chain that points back to the trajectory doc when it reports completion.
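A citation block can be sketched as a mapping from step ids to source sections, plus the completeness check the QA gates later require. The section anchors are invented for illustration:

```python
# Hypothetical citation block: each step id maps to the source section that
# authorizes it. Document paths and anchors are made up.
citation_block = {
    "open-billing-settings": "docs/billing.md#opening-settings",
    "enable-invoice-emails": "docs/billing.md#invoice-notifications",
}

def citation_complete(steps: list, citations: dict) -> bool:
    """Every step must link back to a source section."""
    return all(s["step_id"] in citations for s in steps)
```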

Replay-ready guarantees

A trajectory doc that conforms to this spec guarantees:

  • Every step has a verification probe.
  • Every selector has at least one semantic fallback.
  • Every step has an on_failure branch.
  • The manifest is monotonically progressing (no implicit loops).
  • The doc carries a last_verified_at not older than the application's release cadence.

A conformant runtime rejects any doc that fails to satisfy these guarantees and refuses to execute it.
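A partial conformance check can be sketched as follows. It covers four of the five guarantees (the monotonic-progression check is omitted); the freshness threshold is an assumed parameter standing in for the application's release cadence:

```python
from datetime import date

def conformance_errors(doc: dict, max_age_days: int = 90) -> list:
    """Collect guarantee violations; a conformant runtime refuses docs with any.

    Partial sketch: does not check monotonic progression of the manifest.
    """
    errors = []
    probe_ids = {p["id"] for p in doc.get("probes", [])}
    for step in doc.get("steps", []):
        sid = step.get("step_id", "?")
        if step.get("verification") not in probe_ids:
            errors.append(f"{sid}: no verification probe")
        if not any(not s.startswith("css=") for s in step.get("selectors", [])):
            errors.append(f"{sid}: no semantic fallback selector")
        if "on_failure" not in step:
            errors.append(f"{sid}: missing on_failure branch")
    verified = date.fromisoformat(doc.get("last_verified_at", "1970-01-01"))
    if (date.today() - verified).days > max_age_days:
        errors.append("last_verified_at older than freshness threshold")
    return errors
```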

Authoring patterns

  • Write goals in declarative voice. "Open the billing settings page," not "let's head over to the billing settings."
  • Pair every action with a verification. A click without a probe is unobservable.
  • Keep steps small. A step that performs three things makes failure modes ambiguous.
  • Externalize text inputs. Place sample inputs in a separate fixtures section rather than inline.
  • Include a UI screenshot only as supporting context. Screenshots become stale; selectors are the source of truth.

QA gates

Before publish, the trajectory doc passes through:

  • Schema validator on the manifest.
  • Selector reachability check (probes are run against a fresh instance).
  • Failure-branch coverage check (every step has on_failure).
  • Last-verified freshness gate (rejects docs older than threshold).
  • Citation block completeness (every step links back to source).
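The gates above can be wired into a publish pipeline along these lines. The gate functions themselves are assumed to exist elsewhere (their names here are hypothetical); each takes the doc and returns a list of errors:

```python
# Sketch of a publish pipeline. Each gate is a callable returning a list of
# errors; publication is blocked by the first failing gate, in order.
def publish_gates(doc: dict, gates: list) -> tuple:
    """Run gates in order; return (passed, message)."""
    for gate in gates:
        errors = gate(doc)
        if errors:
            return (False, f"{gate.__name__}: {errors[0]}")
    return (True, "ok")
```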

Failure modes

  • Implicit waits. A step that depends on "the page loading" without a probe is non-deterministic.
  • CSS-only selectors. They break on minor restyling; semantic selectors survive.
  • Missing on_failure. The agent has no recovery path and aborts.
  • Over-broad probes. "Page contains the word billing" passes for many states; tighten the probe.
  • No citation block. The agent emits a successful trajectory but cannot cite the procedure as a source.

Pairing with citation attribution

A trajectory doc emitted by an agent feeds into the agent's citation attribution manifest. The manifest's chain includes a step entry per trajectory step, and the citation block in the trajectory doc becomes the source list for those entries. The two specs together produce a replayable, attributable record of the agent's actions.
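The pairing can be sketched as a small transform: one chain entry per trajectory step, with its source pulled from the citation block. The output shape is illustrative, not a fixed schema:

```python
# Sketch: derive a citation attribution chain from trajectory steps and the
# citation block. Field names in the output are assumptions, not a schema.
def citation_chain(steps: list, citation_block: dict) -> list:
    """One chain entry per trajectory step, sourced from the citation block."""
    return [
        {"step_id": s["step_id"],
         "action": s["action"],
         "source": citation_block.get(s["step_id"], "UNSOURCED")}
        for s in steps
    ]
```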

FAQ

Q: Is this only for browser agents?

It is most useful for browser agents because the DOM is the action surface. The same structure adapts to API agents by replacing selectors with endpoint references; the manifest, probes, and citation block are reusable.

Q: Do I need to run the manifest in CI?

Yes. Selector reachability and probe correctness drift fast; running the manifest in CI against a fresh app instance keeps the doc honest.

Q: How does this relate to Playwright traces?

Playwright traces capture a specific execution. Trajectory docs describe the procedure abstractly. A trace is a one-shot artifact; a trajectory doc is the source the trace replays from.

Q: Can I auto-generate trajectory docs?

The selector reachability and probes can be generated from a recording, but goals, on_failure branches, and the citation block must be authored or curated.

Q: What is the highest-leverage section?

The verification probes. They turn the doc from a wishlist into an executable contract.
