Geodocs.dev

Knowledge Cutoff Disclosure Specification for AI Agent Documentation



This specification (KCDS 1.0) defines the minimum metadata that AI agent documentation must publish to disclose its training cutoff, effective cutoff per topic, last-indexed time, and live-data capabilities. It targets agent builders, documentation platforms, and orchestrators that need machine-readable temporal context to detect and route around stale answers.

TL;DR. AI agents do not have a single "cutoff" — they have a training cutoff, an effective cutoff per topic, and a last-indexed time for any retrieval layer. KCDS 1.0 standardises a small YAML block that agent documentation MUST publish so downstream orchestrators, evaluators, and end users can decide when to trust an answer. Skip ahead to §3 Required fields for the schema, or to §5 Examples for copy-paste templates.

1. Status of this specification

This is version 1.0 of the Knowledge Cutoff Disclosure Specification (KCDS), maintained by the Geodocs Research Team and hosted in the AI agents hub. It is a publisher-side standard — it tells agent owners what to write down. It is not a model-evaluation method; for that, see the work on effective cutoffs by Cheng et al. (2024).

Conformance is voluntary. Tools that adopt KCDS SHOULD advertise the badge kcds-1.0 in their documentation footer and in their model_card.md.

The keywords MUST, MUST NOT, SHOULD, SHOULD NOT, and MAY in this document are to be interpreted as described in RFC 2119.

2. Why agent documentation needs a temporal disclosure

Modern LLM-backed agents are paired with one or more dates that are easy to confuse:

  • A training cutoff is the date after which raw training data was no longer ingested. GPT-4o, for example, reports a training cutoff of October 2023 even though it ships and is widely used in 2026.
  • An effective cutoff is the latest date for which a specific topic or sub-resource is meaningfully represented in the model. Cheng et al. (2024) show that effective cutoffs for individual resources often differ — sometimes by years — from the publisher-reported cutoff.
  • A last-indexed or last-crawled time applies to any retrieval, RAG, or browsing layer the agent depends on at inference time.
  • Live data capabilities (web browsing, tool use, real-time APIs) can override the above on a per-query basis.

Without a machine-readable disclosure, an orchestrator that wants to ask "is this agent aware of last week's policy change?" has to scrape marketing pages and reverse-engineer dates from chat replies. KCDS replaces that with a contract.

For background reading on the underlying model behaviour, see our companion article on effective vs reported knowledge cutoff and the agent freshness monitoring guide.

3. Required fields

Each agent or documentation page MUST publish a knowledge_cutoff block. Fields marked (required) MUST be present; others are RECOMMENDED unless noted.

knowledge_cutoff:
  training_cutoff: "2024-10-31"          # required, ISO-8601 date
  training_cutoff_precision: "month"     # required: day | month | quarter | year
  effective_cutoff:                      # recommended, array of {domain, date}
    - domain: "general"
      date: "2024-10-31"
    - domain: "regulatory"
      date: "2024-06-30"
  last_indexed: "2026-04-28T03:00:00Z"   # required if a retrieval layer exists
  index_scope: "https://geodocs.dev/**"  # required if last_indexed is set
  live_data_capabilities:                # required
    web_browsing: true
    tool_use: true
    real_time_apis: ["stocks", "weather"]
  revalidation_policy:                   # required
    cycle_days: 90
    next_review: "2026-07-28"
  source_of_truth: "https://docs.example.com/agents/foo/cutoff"
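
A minimal sketch of a presence check over this schema, assuming the knowledge_cutoff block has already been parsed into a Python dict (for instance with a YAML library); the function name missing_required_fields is illustrative, not part of KCDS:

```python
# Sketch: presence check for KCDS 1.0 required fields, assuming the
# knowledge_cutoff block is already a parsed Python dict.
# Field names follow the schema above; helper names are illustrative.

REQUIRED = ["training_cutoff", "training_cutoff_precision",
            "live_data_capabilities", "revalidation_policy"]

def missing_required_fields(block):
    """Return the names of required fields absent from the block."""
    missing = [f for f in REQUIRED if f not in block]
    # last_indexed and index_scope are conditionally required:
    # each implies the other once a retrieval layer is disclosed.
    if "last_indexed" in block and "index_scope" not in block:
        missing.append("index_scope")
    if "index_scope" in block and "last_indexed" not in block:
        missing.append("last_indexed")
    return missing

block = {
    "training_cutoff": "2024-10-31",
    "training_cutoff_precision": "month",
    "last_indexed": "2026-04-28T03:00:00Z",
    "live_data_capabilities": {"web_browsing": True},
    "revalidation_policy": {"cycle_days": 90},
}
print(missing_required_fields(block))  # ['index_scope']
```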

3.1 training_cutoff (required)

  • ISO-8601 date.
  • MUST be the publisher's reported cutoff, not an estimate inferred from probing.
  • MUST be paired with training_cutoff_precision so downstream tools do not imply false precision (writing 2024-10-31 when only the month is known is misleading).
3.2 effective_cutoff (recommended)

  • Array of {domain, date} pairs.
  • Use when the agent's coverage of specific domains is known to lag the training cutoff.
  • Domains SHOULD be drawn from a controlled vocabulary published alongside the agent (e.g. general, regulatory, product-docs, news).
  • Each entry MAY be sourced from internal probing or third-party evaluation; cite the source in source_of_truth.

3.3 last_indexed (required if retrieval is used)

  • ISO-8601 datetime in UTC.
  • Refers to the most recent successful crawl/index of the resources named by index_scope.
  • For RAG agents this is usually more important to consumers than the underlying model's training cutoff.

3.4 live_data_capabilities (required)

  • Object describing what the agent can fetch at inference time.
  • An agent with web_browsing: true SHOULD additionally document its allow/blocklists and rate limits in adjacent fields.
  • An agent with no live capabilities MUST set all sub-fields to false or [] rather than omitting the block.

3.5 revalidation_policy (required)

  • The cadence at which the agent owner re-asserts the cutoff fields.
  • KCDS recommends cycle_days <= 90 for production-facing agents.
  • next_review MUST be a future date at the time of publication.
3.6 source_of_truth (recommended)

  • A canonical URL where the latest disclosure lives. Useful when the agent is distributed across multiple platforms (e.g. SDK, hosted UI, marketplace listing).

4. Where to put the disclosure

KCDS supports three placements, in order of preference:

  1. Agent manifest — the YAML/JSON config the agent ships with (e.g. agent.yaml, model_card.md, manifest.json).
  2. /.well-known/ai-cutoff.json at the agent's documentation domain, served as application/json.
  3. Frontmatter in the canonical documentation page (this is how Geodocs itself ships its disclosures).

Producers MUST keep placements in sync. If they diverge, the agent manifest wins and tools SHOULD emit a KCDS-WARN-DRIFT warning.
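
The sync rule above can be sketched as follows (a sketch only: the function name and warning-message wording are illustrative; KCDS mandates only the KCDS-WARN-DRIFT identifier and the manifest-wins rule):

```python
# Sketch: detect drift between the three KCDS placements.
# The manifest copy is authoritative; diverging copies trigger KCDS-WARN-DRIFT.

def check_drift(manifest, well_known, frontmatter):
    """Compare disclosure copies; return (authoritative_copy, warnings)."""
    warnings = []
    for name, copy in [("well-known", well_known), ("frontmatter", frontmatter)]:
        if copy is not None and copy != manifest:
            warnings.append(f"KCDS-WARN-DRIFT: {name} copy diverges from manifest")
    return manifest, warnings  # manifest wins regardless of drift

manifest = {"training_cutoff": "2024-10-31"}
stale_frontmatter = {"training_cutoff": "2024-04-01"}
winner, warns = check_drift(manifest, None, stale_frontmatter)
print(warns)  # ['KCDS-WARN-DRIFT: frontmatter copy diverges from manifest']
```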

5. Examples

5.1 Pure-LLM agent, no retrieval

knowledge_cutoff:
  training_cutoff: "2024-10-01"
  training_cutoff_precision: "month"
  live_data_capabilities:
    web_browsing: false
    tool_use: false
    real_time_apis: []
  revalidation_policy:
    cycle_days: 180
    next_review: "2026-10-01"

5.2 RAG-backed support agent

knowledge_cutoff:
  training_cutoff: "2024-04-01"
  training_cutoff_precision: "month"
  effective_cutoff:
    - domain: "product-docs"
      date: "2026-04-26"
    - domain: "policy-changes"
      date: "2026-03-15"
  last_indexed: "2026-04-28T03:00:00Z"
  index_scope: "https://docs.example.com/**"
  live_data_capabilities:
    web_browsing: false
    tool_use: true
    real_time_apis: ["status-page"]
  revalidation_policy:
    cycle_days: 7
    next_review: "2026-05-05"
  source_of_truth: "https://docs.example.com/agents/support/cutoff"

5.3 Browsing-enabled assistant

knowledge_cutoff:
  training_cutoff: "2024-10-31"
  training_cutoff_precision: "month"
  live_data_capabilities:
    web_browsing: true
    tool_use: true
    real_time_apis: []
    browsing_allowlist: ["*.gov", "*.edu", "docs.example.com"]
  revalidation_policy:
    cycle_days: 30
    next_review: "2026-05-29"

6. Conformance

A documentation page is KCDS-1.0 conformant if and only if all of the following hold:

  1. It exposes a knowledge_cutoff block in at least one of the three placements in §4.
  2. All required fields validate against the JSON Schema published at https://geodocs.dev/schemas/kcds-1.0.json.
  3. The revalidation_policy.next_review date is in the future relative to the validator's clock.
  4. If last_indexed is present, it is no older than revalidation_policy.cycle_days.
  5. No required field uses a placeholder string (TBD, unknown, empty).

Tools MAY emit warnings — not failures — for conformant-but-stale disclosures, e.g. when last_indexed is older than half of cycle_days.
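
The fail-versus-warn split above might be sketched like this (an illustrative sketch: the half-cycle threshold and the hard failure follow §6, but the message strings and function name are assumptions, and the clock is injected for testability):

```python
from datetime import datetime, timedelta, timezone

# Sketch: warn (not fail) when last_indexed is older than half of
# revalidation_policy.cycle_days; fail when it exceeds cycle_days (per §6).

def staleness_warning(last_indexed, cycle_days, now):
    indexed = datetime.fromisoformat(last_indexed.replace("Z", "+00:00"))
    age = now - indexed
    if age > timedelta(days=cycle_days):
        # Conformance failure per §6, condition 4.
        return "KCDS-FAIL: last_indexed older than cycle_days"
    if age > timedelta(days=cycle_days / 2):
        return "KCDS-WARN-STALE: last_indexed older than half of cycle_days"
    return None

now = datetime(2026, 5, 3, tzinfo=timezone.utc)
# Indexed ~4.9 days ago against a 7-day cycle: past the half-cycle mark.
print(staleness_warning("2026-04-28T03:00:00Z", 7, now))
```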

7. Validation rules

Validators SHOULD enforce the following rule IDs so CI pipelines can pin behaviour:

Rule ID     Check
KCDS-001    training_cutoff ≤ today
KCDS-002    training_cutoff_precision is one of day/month/quarter/year
KCDS-003    last_indexed ≤ now (UTC)
KCDS-004    last_indexed requires index_scope
KCDS-005    Each effective_cutoff[].date ≤ training_cutoff OR sourced from RAG
KCDS-006    revalidation_policy.cycle_days is a positive integer
KCDS-007    revalidation_policy.next_review is in the future
KCDS-008    live_data_capabilities block is present and complete
KCDS-009    If web_browsing: true, an allow- or block-list is declared
KCDS-010    source_of_truth (if present) is an absolute HTTPS URL
KCDS-011    No required field is a placeholder string
KCDS-012    Manifest, /.well-known/, and frontmatter copies do not drift
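
A validator enforcing a subset of these rule IDs might look like the sketch below (rule selection, function name, and the injected clock are illustrative choices; a real validator would cover all twelve rules):

```python
from datetime import datetime, timezone, date

# Sketch: enforce a subset of KCDS rule IDs against a parsed block.
# Returns the violated rule IDs; an empty list means the checked rules pass.

def run_rules(block, now):
    violations = []
    # KCDS-001: training_cutoff must not be in the future.
    if date.fromisoformat(block["training_cutoff"]) > now.date():
        violations.append("KCDS-001")
    # KCDS-002: precision must come from the closed vocabulary.
    if block["training_cutoff_precision"] not in {"day", "month", "quarter", "year"}:
        violations.append("KCDS-002")
    # KCDS-004: last_indexed requires index_scope.
    if "last_indexed" in block and "index_scope" not in block:
        violations.append("KCDS-004")
    # KCDS-006: cycle_days must be a positive integer.
    cycle = block.get("revalidation_policy", {}).get("cycle_days")
    if not (isinstance(cycle, int) and cycle > 0):
        violations.append("KCDS-006")
    # KCDS-007: next_review must be in the future.
    review = block.get("revalidation_policy", {}).get("next_review", "1970-01-01")
    if date.fromisoformat(review) <= now.date():
        violations.append("KCDS-007")
    return violations

now = datetime(2026, 5, 1, tzinfo=timezone.utc)
block = {
    "training_cutoff": "2024-10-31",
    "training_cutoff_precision": "month",
    "last_indexed": "2026-04-28T03:00:00Z",  # index_scope missing: KCDS-004
    "revalidation_policy": {"cycle_days": 90, "next_review": "2026-07-28"},
}
print(run_rules(block, now))  # ['KCDS-004']
```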

8. Common mistakes

  • Conflating training cutoff with knowledge freshness. The reported training cutoff is one date; effective coverage varies sharply by topic. Always declare both when you can.
  • Disclosing only the model's cutoff when the agent uses RAG. Users care about what the retrieval layer knows, not just the base model. Add last_indexed and index_scope.
  • Static disclosures that never get re-asserted. Without a revalidation_policy, KCDS treats the disclosure as expired after 180 days.
  • Assuming "browsing enabled" means "always current". Browsing is opportunistic, fails silently, and is constrained by allowlists. Declare its scope.
  • Hard-coding the disclosure into prose only. Prose is for humans; KCDS requires a machine-readable block so orchestrators can route around stale answers automatically.

9. Relationship to other specs

  • Model cards (Mitchell et al., 2019) cover ethical and intended-use disclosure. KCDS sits inside the "limitations" section of a model card and is complementary, not competing.
  • AI BOM / SBOM-AI efforts focus on dependency, dataset, and licence provenance. KCDS focuses narrowly on temporal context.
  • Effective-cutoff probing methods (Cheng et al., 2024) produce evidence about a model's true knowledge horizon. KCDS is the publication channel for that evidence.
  • Schema.org Dataset / temporalCoverage describes datasets; KCDS describes agents and may reference Dataset entries via index_scope.

10. FAQ

Q: Is a training cutoff the same as a knowledge cutoff?

No. The training cutoff is the publisher's reported date for raw training data. The knowledge cutoff is what the model can actually answer correctly, and probing studies show it is usually earlier than the reported cutoff and varies per topic. KCDS asks agents to disclose both: training_cutoff for the publisher-reported value and effective_cutoff[] for measured per-domain values.

Q: Do I need KCDS if my agent only uses retrieval?

Yes. last_indexed, index_scope, and live_data_capabilities are still required, and the training cutoff fields can describe the underlying base model so consumers know what reasoning context the agent inherits. Even a thin RAG wrapper should declare both layers.

Q: What happens when the agent's browsing returns content newer than its training cutoff?

That is expected and allowed. KCDS distinguishes baseline knowledge (training_cutoff) from runtime knowledge (live_data_capabilities). Agents SHOULD cite retrieved sources per answer so users can verify freshness independently of the disclosure block.

Q: How often should I re-assert the disclosure?

KCDS recommends cycle_days <= 90 for production agents. For RAG agents that re-index daily, set cycle_days: 7 so the disclosure tracks operational reality. For frozen research models, cycle_days: 365 is acceptable.

Q: Where do I publish the JSON Schema?

KCDS 1.0 ships the schema at https://geodocs.dev/schemas/kcds-1.0.json. Tools SHOULD pin to a major version (kcds-1.x). Backward-incompatible changes will increment the major version and ship at a new URL.

11. Further reading

  • AI agents hub
  • Effective vs reported knowledge cutoff
  • Agent freshness monitoring
  • Model card essentials
  • Retrieval-augmented generation overview
