Agent Self-Correction Loop: Critique, Revise, and Converge
An agent self-correction loop runs the agent's own output through a critique step, performed either by the agent itself (self-critique) or by a separate critic model, and revises until a convergence test passes or a max-iterations cap is hit. The loop lifts output quality at the cost of additional latency and tokens, both of which must be bounded explicitly.
TL;DR
- A self-correction loop is critique → revise → converge, repeated until the convergence test passes or the iteration cap is reached.
- Self-critique is cheaper but biased toward agreement with the producer; a separate critic model breaks the bias at the cost of one extra model in the system.
- The convergence test is what makes the loop terminate — choose between critic-pass, delta-quality threshold, or max iterations (typical: 1-3).
- Cost and latency expand by the iteration count; bound them explicitly in the runtime budget rather than letting the loop run free.
Definition
A self-correction loop is a runtime pattern in which an agent's output is reviewed by a critique step, revised based on the critique, and re-checked until a convergence condition is met. The pattern formalizes the intuition that LLMs are better editors than first-drafters: a single forward pass produces a draft; a follow-up pass with explicit review instructions catches errors the first pass missed. Reflexion (Shinn et al., 2023) and Self-Refine (Madaan et al., 2023) are two of the canonical research formalizations.
The loop has four moving parts: the producer (the agent generating the draft), the critic (which evaluates the draft against a rubric or task), the reviser (which rewrites the draft using the critique), and the convergence test (which decides whether to stop or iterate again). In practice, the producer and reviser are usually the same model with different prompts; the critic may be the same model (self-critique) or a separate model.
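The four moving parts compose into a short loop. A minimal sketch in Python, assuming `producer`, `critic`, and `reviser` are callables wrapping model calls (all names here are hypothetical, not a specific framework's API):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Critique:
    passed: bool   # critic found no remaining issues
    notes: str     # structured critique text fed to the reviser

def correction_loop(
    task: str,
    producer: Callable[[str], str],
    critic: Callable[[str, str], Critique],
    reviser: Callable[[str, str, str], str],
    max_iterations: int = 3,
) -> str:
    draft = producer(task)                            # draft v1
    for _ in range(max_iterations):
        critique = critic(task, draft)                # review against the rubric
        if critique.passed:                           # convergence test: critic-pass
            break
        draft = reviser(task, draft, critique.notes)  # draft vN
    return draft
```

Note that the producer runs once and the critic/reviser pair runs per iteration, which is why cost scales with the iteration cap.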
Why this matters
Single-pass agent outputs have predictable failure modes — missed requirements, factual sloppiness, weak structure — that humans catch trivially in review but the model rarely catches in its own forward pass. Adding a critique step measurably lifts quality on tasks where the rubric is explicit (code, structured documents, multi-criteria summaries) without requiring a stronger producer model. For some tasks, a smaller-parameter model with self-correction approaches the quality of a much larger single-pass model at lower total cost (Shinn et al., 2023).
The cost is real: every iteration is a fresh model call, sometimes two (critic + reviser). Without bounds, a self-correction loop is the most expensive pattern in agent design — and the easiest to misconfigure into infinite cycles. The whole point of the spec is to express the bounds explicitly.
How it works
The loop is a small state machine:
```mermaid
flowchart LR
  A["Producer: draft v1"] --> B["Critic: review"]
  B --> C{"Pass convergence test?"}
  C -- yes --> D["Output draft"]
  C -- no --> E{"Iteration cap reached?"}
  E -- yes --> D
  E -- no --> F["Reviser: draft vN"]
  F --> B
```
Producer. Generates the initial draft from the task spec. No special prompt — same as a single-pass agent.
Critic. Reads the draft and produces a structured critique against an explicit rubric. The rubric is the most important design decision: a vague "is this good?" rubric produces vague critiques; a specific rubric ("does the code pass these 3 tests?") produces actionable critiques. Anthropic's constitutional AI work formalizes the rubric-based critique pattern at training time, but the runtime version is the same idea (Bai et al., 2022).
Reviser. Reads the original draft + critique and produces a revised draft. Typically the same model as the producer with a "revision" prompt that explicitly references the critique points.
Convergence test. Three common choices: (1) critic-pass — the critic returns "no remaining issues"; (2) delta-quality threshold — the critic's score for v(n) and v(n+1) differ by less than some epsilon; (3) max iterations — hard cap, typically 1-3 for production loops, up to 5 for offline / batch jobs.
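The three stopping rules can be written as small predicates. A sketch in Python; the `epsilon` value and the 0-1 score scale are illustrative assumptions, not prescribed values:

```python
def critic_pass(passed: bool) -> bool:
    # (1) stop when the critic reports no remaining issues
    return passed

def delta_quality(score_prev: float, score_curr: float, epsilon: float = 0.02) -> bool:
    # (2) stop when quality gains have flattened: |s(n+1) - s(n)| < epsilon
    return abs(score_curr - score_prev) < epsilon

def iteration_cap(iteration: int, cap: int = 3) -> bool:
    # (3) hard stop at the cap regardless of quality (the safety net)
    return iteration >= cap

def should_stop(passed: bool, score_prev: float, score_curr: float,
                iteration: int, cap: int = 3) -> bool:
    # production loops typically combine all three
    return (critic_pass(passed)
            or delta_quality(score_prev, score_curr)
            or iteration_cap(iteration, cap))
```

Combining all three means the delta-quality check catches stalled loops even when the critic never returns a clean pass.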
Practical application
A 4-step build:
- Define the rubric in machine-checkable form. "Code compiles", "JSON validates against schema", and "claim has citation" are good rubrics. "Is the answer high quality?" is not. The rubric is what makes the loop converge.
- Pick same-model vs. separate critic. Default to same-model in week 1 — it's the cheapest path to a working loop. Switch to a separate, sometimes weaker critic when you observe the self-critique agreeing too readily with the producer.
- Set the iteration cap up front. 2 iterations (one revision) is a reasonable default for interactive products; 3-5 for batch / offline. Document the cap in the agent's spec.
- Track convergence metrics. Log per-iteration scores, the iteration at which each task converged, and the rate of "max iterations hit" — that rate is your early-warning indicator that the rubric is broken.
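A machine-checkable rubric can be as simple as a list of named yes/no checks. A sketch for a hypothetical agent that must emit a JSON object containing a cited claim (the check names and required keys are illustrative):

```python
import json

def _parse(out: str) -> dict:
    # returns the parsed object, or {} when the output is not a JSON object
    try:
        obj = json.loads(out)
        return obj if isinstance(obj, dict) else {}
    except json.JSONDecodeError:
        return {}

def _parses(out: str) -> bool:
    try:
        json.loads(out)
        return True
    except json.JSONDecodeError:
        return False

# The rubric: each entry is (check_name, yes/no predicate over the raw output).
RUBRIC = [
    ("valid_json",   _parses),
    ("has_claim",    lambda out: "claim" in _parse(out)),
    ("has_citation", lambda out: bool(_parse(out).get("citation"))),
]

def run_rubric(out: str) -> dict:
    # {check_name: bool}; the critique prompt can list exactly which checks failed
    return {name: check(out) for name, check in RUBRIC}
```

Feeding the failed check names into the critic's prompt is what turns a vague "improve this" into an actionable critique.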
OpenAI's evals cookbook shows the pattern applied to coding tasks where unit tests are the rubric — the loop terminates as soon as the tests pass, with a max iteration cap as the safety net (OpenAI, 2024).
Common mistakes
- Infinite loops. A loop without a convergence test or iteration cap will run until the budget is exhausted. Always specify both.
- Critic that echoes the producer. When the critic and producer are the same model with similar prompts, the critic just rewrites the producer's draft in slightly different words and declares victory. Diverge the prompts deliberately or use a separate model.
- Rubric that's too vague. "Is this good?" produces noise. Define the rubric as a list of yes/no checks against the task spec.
- No latency budget. Self-correction expands wall-clock time linearly with iterations; user-facing flows need a hard timeout that aborts the loop and ships the latest draft.
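The latency-budget mistake above is worth a concrete shape: a wall-clock deadline checked between iterations, shipping the latest draft when it expires. A sketch assuming a synchronous loop where `critic` returns a `(passed, notes)` tuple (all callables are hypothetical stand-ins for model calls); note the deadline check runs between iterations, so one slow model call can still overrun it slightly:

```python
import time

def bounded_loop(task, producer, critic, reviser,
                 max_iterations=3, timeout_s=20.0):
    # self-correction with both an iteration cap and a wall-clock budget
    deadline = time.monotonic() + timeout_s
    draft = producer(task)
    for _ in range(max_iterations):
        if time.monotonic() >= deadline:
            break                          # budget exhausted: ship the latest draft
        passed, notes = critic(task, draft)
        if passed:
            break
        draft = reviser(task, draft, notes)
    return draft
```

For a truly hard cutoff you would additionally need per-call timeouts on the model requests themselves, which most model client libraries expose.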
FAQ
Q: How does this relate to Reflexion?
Reflexion (Shinn et al., 2023) is one of the canonical formulations of self-correction: the agent attempts a task, observes the outcome, generates a verbal "reflection" on what went wrong, and stores that reflection as memory for the next attempt. Self-correction loops are the runtime pattern; Reflexion is one specific shape of that pattern, optimized for tasks where the outcome is observable (success/failure on a benchmark). Self-Refine (Madaan et al., 2023) is a closely related variant for open-ended generation.
Q: Self-critique vs. separate critic model — which is better?
Self-critique (same model in critic mode) is cheaper and works well when the producer model is strong enough to find its own errors. Its weakness is bias: the critic agrees too readily with the producer because they share priors. A separate critic model — often a smaller, differently-tuned model — breaks the bias and tends to catch more errors per pass, at the cost of one extra model in the runtime. Production systems often start same-model and switch to separate-critic once they have eval data showing the bias.
Q: How many iterations are typical?
In production, 1-2 revisions (i.e., max 3 drafts) is the sweet spot for interactive agents — beyond that, marginal quality gains drop fast and latency becomes user-visible. Offline / batch jobs (e.g., generating a content corpus) can afford 3-5 iterations because no user is waiting. The single most predictive metric is the per-iteration delta-quality: when v(n+1) is barely better than v(n), the loop has hit diminishing returns and should stop (OpenAI, 2024).
Related Articles
Agent Conversation Summarization: Triggers, Schema, and Retention
Specification for compressing agent conversation history into running summaries: triggers, summary schema, retention rules, and recovery patterns for long-running chats.
Agent Evaluation Harness Documentation: How to Spec an Eval Suite for AI Agents
Specification for documenting an AI agent evaluation harness — eval suites, scorers, datasets, and trajectory grading that humans and docs agents can both consume.
Agent Knowledge Base Integration: RAG, MCP, and Direct API Patterns
Spec for connecting AI agents to internal knowledge bases via RAG vector stores, MCP servers, or direct retrieval APIs with provenance and ACL stamping.