Geodocs.dev

AI Search Bot Changelog Reference

This reference tracks dated user-agent, robots.txt, and IP-range changes for the major AI search crawlers — GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot, Claude-User, Claude-SearchBot, PerplexityBot, Perplexity-User, Google-Extended, and Bingbot — so infrastructure and SEO teams can keep robots.txt, WAF, and CDN rules aligned with current bot identities.

TL;DR

  • AI crawler identities change regularly: vendors split bots, bump version numbers, refresh IP allowlists, and quietly revise robots.txt language — usually without dedicated release notes.
  • OpenAI now operates three crawlers (GPTBot, OAI-SearchBot, ChatGPT-User) with overlapping but distinct robots.txt rules; ChatGPT-User's compliance language was narrowed in late 2025.
  • Anthropic clarified its three-bot model (ClaudeBot, Claude-User, Claude-SearchBot) in early 2026 and confirmed all three honor robots.txt, including the non-standard Crawl-delay directive.
  • Perplexity ships PerplexityBot and Perplexity-User, but Cloudflare has documented stealth crawling and rotating ASNs for some Perplexity traffic, so robots.txt alone is not sufficient.
  • Google-Extended (introduced 28 September 2023) is a robots.txt control token only; it has no user-agent string of its own (crawling uses the existing Googlebot user agents) and does not block AI Overviews.

Definition

The AI search bot changelog is a dated, vendor-by-vendor log of changes to the user-agent strings, robots.txt behavior, IP allowlists, and crawler roles of the bots that AI search products (ChatGPT, Claude, Perplexity, Gemini, Bing/Copilot) use to fetch web content. Unlike a traditional product changelog, AI crawler changes are scattered across vendor documentation pages, support articles, and announcement posts, and many changes are made silently — a string is updated, a new bot appears, or a robots.txt clause is rewritten without a public release note.

This reference consolidates those changes into per-bot tables so technical SEO and infrastructure teams can keep robots.txt, WAF rules, and CDN bot-management policies aligned with current bot identities. It is intentionally conservative: every entry is anchored to a primary vendor doc or a verifiable secondary source, and operational impact is described in plain language so the table can be read by both engineering and SEO stakeholders.

Why this matters

AI crawlers now account for a non-trivial share of bot traffic on most public sites, and their behavior diverges from the predictable patterns of search-engine crawlers. A robots.txt rule written when GPTBot launched in August 2023 will not, on its own, cover OAI-SearchBot, ChatGPT-User, or any of Anthropic's three Claude bots, all of which were either added later or had their robots.txt compliance language revised. Teams that do not track these changes typically run into three operational problems:

  1. Stale block lists. Rules target bots that have been renamed, split, or merged, so new traffic is not actually controlled.
  2. Accidental visibility loss. Blocking a bot meant for AI training (e.g., GPTBot) by reusing the same rule for a search-routing bot (e.g., OAI-SearchBot) can quietly remove the site from ChatGPT Search citations.
  3. WAF false positives. IP allowlists drift; verified Bingbot or GPTBot IP ranges expand, and outdated WAF rules either over-block or under-block.

A maintained changelog converts these silent changes into a reviewable artifact that SEO, security, and platform teams can audit at a regular cadence.
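The first two problems can be reproduced locally. The sketch below uses Python's standard-library urllib.robotparser to show that a rule group written for GPTBot does not apply to tokens added later. The robots.txt content and the example.com URL are made up for illustration, and a vendor's own parser may treat edge cases differently from Python's.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt written when GPTBot was the only known token.
LEGACY_RULES = """\
User-agent: GPTBot
Disallow: /
"""


def allowed_tokens(robots_txt: str, tokens: list[str], url: str) -> dict[str, bool]:
    """Return, per robots.txt token, whether `url` may be fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {token: parser.can_fetch(token, url) for token in tokens}


result = allowed_tokens(
    LEGACY_RULES,
    ["GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot"],
    "https://example.com/docs/",  # placeholder URL
)
for token, ok in result.items():
    print(f"{token}: {'allowed' if ok else 'blocked'}")
```

Only GPTBot is reported as blocked; every other token falls through to the default of "allowed" because no group names it.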

How it works

Each entry below uses the same shape: date, change, source, operational impact. Dates reflect when the change was published or first verifiable in vendor documentation, not when individual sites observed it. "Source" links the primary vendor doc whenever available; secondary sources (Search Engine Land, Search Engine Roundtable, Cloudflare, Bing Webmaster Tools) are used when no primary doc exists.

For bots that share a user-agent string across multiple roles — Google-Extended is the canonical example — the user-agent stays unchanged and the change is in the robots.txt token semantics. Those entries note the distinction explicitly so engineers do not waste time looking for a separate UA in their access logs.

The reference is grouped by vendor and then by bot, so you can scan only the rows that affect a single robots.txt block. Where a vendor publishes a verified IP allowlist, the URL of the JSON file is listed in the row's Change column; treat those JSON files as the source of truth for WAF rules and refresh them on a schedule rather than hardcoding ranges.
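A scheduled refresh is easier to review if it reports what changed between two pulls of a range file. The sketch below is one possible approach, assuming the file follows the Googlebot-style layout (a top-level "prefixes" array whose entries carry "ipv4Prefix" or "ipv6Prefix"); confirm the actual layout of each vendor file before relying on it. The function names are hypothetical, and the sample ranges are reserved documentation addresses, not real vendor ranges.

```python
import ipaddress
import json


def parse_prefixes(raw_json: str) -> set:
    """Parse a range file into a set of ip_network objects.

    Assumed layout: {"prefixes": [{"ipv4Prefix": "..."}, {"ipv6Prefix": "..."}]}
    """
    doc = json.loads(raw_json)
    networks = set()
    for entry in doc.get("prefixes", []):
        cidr = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if cidr:
            networks.add(ipaddress.ip_network(cidr))
    return networks


def diff_prefixes(old_json: str, new_json: str):
    """Return (added, removed) networks between two snapshots."""
    old, new = parse_prefixes(old_json), parse_prefixes(new_json)
    # Sort by string form because IPv4 and IPv6 networks are not comparable.
    return sorted(new - old, key=str), sorted(old - new, key=str)


# Illustrative snapshots using documentation-only ranges.
old_snapshot = json.dumps({"prefixes": [
    {"ipv4Prefix": "192.0.2.0/24"},
    {"ipv4Prefix": "198.51.100.0/24"},
]})
new_snapshot = json.dumps({"prefixes": [
    {"ipv4Prefix": "192.0.2.0/24"},
    {"ipv4Prefix": "203.0.113.0/24"},
    {"ipv6Prefix": "2001:db8::/32"},
]})

added, removed = diff_prefixes(old_snapshot, new_snapshot)
print("added:", [str(n) for n in added])
print("removed:", [str(n) for n in removed])
```

In practice the two snapshots would be the previously stored file and a fresh download; the diff can then be attached to the change request that updates the WAF rule.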

Practical application

OpenAI: GPTBot

| Date | Change | Source | Impact |
| --- | --- | --- | --- |
| 2023-08 | GPTBot launched. Initial UA contains GPTBot/1.0 and +https://openai.com/gptbot. Honors robots.txt. | OpenAI launch announcement | First AI training opt-out token site owners can deploy. |
| 2024 (mid) | UA bumped to GPTBot/1.2 (observed in production logs). | OpenAI bots overview | Rules that match GPTBot/1.0 exactly must be widened to match the GPTBot/ prefix. |
| 2025 (late) | Current UA in vendor docs is Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.3; +https://openai.com/gptbot. Verified IP list at openai.com/gptbot.json. | OpenAI bots overview | Update WAF allowlists from the JSON file; do not hardcode IPs. |
| 2025-Q4 | OpenAI clarifies that GPTBot and OAI-SearchBot may share crawl results, so a site allowing both may be visited by either to satisfy both use cases. | Search Engine Roundtable, 2025 | Blocking only one of GPTBot/OAI-SearchBot may not fully isolate training-adjacent crawling. |
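The version bumps above are the reason log filters and WAF patterns should match the GPTBot/ prefix rather than one exact version. A minimal sketch of a version-tolerant match follows; the regular expression and the helper name are illustrative, and only the first sample string is quoted from the table, while the second is a shortened stand-in for an older version.

```python
import re

# Matches the GPTBot product token followed by any dotted version number.
GPTBOT_RE = re.compile(r"\bGPTBot/(\d+(?:\.\d+)*)")


def gptbot_version(user_agent: str):
    """Return the GPTBot version found in a user-agent string, or None."""
    match = GPTBOT_RE.search(user_agent)
    return match.group(1) if match else None


samples = [
    # Current UA as listed in the table.
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; "
    "GPTBot/1.3; +https://openai.com/gptbot",
    # Shortened, illustrative string for an older version.
    "compatible; GPTBot/1.0; +https://openai.com/gptbot",
    # Unrelated crawler: should not match.
    "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
]

versions = [gptbot_version(ua) for ua in samples]
print(versions)
```

Because the pattern captures the version instead of pinning it, the same rule keeps working after the next bump and the captured value can be logged to spot the change.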

OpenAI: OAI-SearchBot

| Date | Change | Source | Impact |
| --- | --- | --- | --- |
| 2024-Q3 | OAI-SearchBot introduced as the bot powering ChatGPT Search. Honors robots.txt. Distinct token from GPTBot. | OpenAI bots overview | Sites wanting ChatGPT Search citations must allow OAI-SearchBot specifically. |
| 2025-Q4 | Documentation clarifies OAI-SearchBot is no longer used to feed navigational links inside ChatGPT answers; blocking it does not strip those citation links. | Search Engine Roundtable, 2025 | Citation strategies must account for the split between crawl source and answer-rendering surface. |

OpenAI: ChatGPT-User

| Date | Change | Source | Impact |
| --- | --- | --- | --- |
| 2023-09 | ChatGPT-User introduced for plugin-initiated browsing on behalf of a user. UA ChatGPT-User/1.0. | OpenAI bots overview | Treated as user-triggered fetches rather than bulk crawl. |
| 2025-Q4 | OpenAI narrows compliance language: robots.txt directives are described as applying to "OAI-SearchBot and GPTBot" rather than all three user agents. ChatGPT-User additionally cited as used for Custom GPTs and GPT Actions. | Search Engine Roundtable, 2025 | Sites that relied on User-agent: ChatGPT-User to block live user fetches should re-test; those rules may now be advisory rather than authoritative. |

Anthropic: ClaudeBot

| Date | Change | Source | Impact |
| --- | --- | --- | --- |
| 2024-Q1 | ClaudeBot identified as Anthropic's training crawler. Honors robots.txt. | Anthropic Help Center | First clear opt-out token for Anthropic training data. |
| 2024-2025 | ClaudeBot traffic volume grew sharply on many sites; community reports describe sustained high request rates. | Practitioner reports (typically observed in webdev community posts, 2025) | Many teams added User-agent: ClaudeBot blocks or rate-limited at the CDN layer. |
| 2026-Q1 | Anthropic refreshes crawler docs to formalize a three-bot model and confirm Crawl-delay is honored alongside standard robots.txt directives. | Search Engine Land, 2026 | Crawl-delay becomes a valid throttling tool for ClaudeBot, useful where outright blocking is undesirable. |

Anthropic: Claude-User

| Date | Change | Source | Impact |
| --- | --- | --- | --- |
| 2026-Q1 | Claude-User documented as the bot that fetches pages when a Claude user explicitly requests them. Honors robots.txt. | Search Engine Land, 2026 | Blocking Claude-User reduces visibility in user-directed Claude answers. |

Anthropic: Claude-SearchBot

| Date | Change | Source | Impact |
| --- | --- | --- | --- |
| 2026-Q1 | Claude-SearchBot documented as the bot that improves Claude search-result quality. Honors robots.txt. | Anthropic Help Center | Blocking Claude-SearchBot reduces Claude search citation visibility but not training exposure. |

Perplexity: PerplexityBot

| Date | Change | Source | Impact |
| --- | --- | --- | --- |
| 2023 | PerplexityBot identified as Perplexity AI's index crawler. UA Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot). | Perplexity crawler docs | Standard robots.txt-honoring index bot for Perplexity citations. |
| 2024-06 | Independent reports document Perplexity ignoring robots.txt disallow on at least some hosts. | Robb Knight, 2024 | Shows that robots.txt alone cannot be relied on for Perplexity. |
| 2025 | Cloudflare publishes evidence of Perplexity using stealth crawlers, rotating ASNs, and obscured user agents to evade no-crawl directives. | Cloudflare, 2025 | Sites that need to block Perplexity should layer WAF/IP and behavioral rules on top of robots.txt. |

Perplexity: Perplexity-User

| Date | Change | Source | Impact |
| --- | --- | --- | --- |
| 2025 | Perplexity documents Perplexity-User as a distinct agent for live user-triggered queries, separate from PerplexityBot. | Perplexity crawler docs | Sites wanting to remain visible in Perplexity answers should keep Perplexity-User allowed even if PerplexityBot is restricted. |

Google: Google-Extended

| Date | Change | Source | Impact |
| --- | --- | --- | --- |
| 2023-09-28 | Google-Extended introduced as a robots.txt token to opt out of training Bard/Gemini and Vertex AI grounding. No standalone HTTP user-agent; crawling continues with existing Googlebot UAs. | Search Engine Land, 2023 | Site owners get an AI-training opt-out without losing Google Search visibility. |
| 2024-2025 | Google clarifies that Google-Extended does not block AI Overviews; AI Overviews use the same content that powers Google Search. | Marie Haynes, 2025 | Sites that disallow Google-Extended will still appear in AI Overviews if they remain in Google's regular index. |
| 2025-Q4 | Google publishes Google-Extended details inside the common-crawlers reference, formalizing scope across Gemini Apps and Vertex AI grounding. | Google for Developers | Single canonical source for current Google-Extended scope. |

Microsoft: Bingbot

| Date | Change | Source | Impact |
| --- | --- | --- | --- |
| Pre-2023 | Bingbot operates as Bing's primary search crawler, UA family Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm). Verified IP list at bing.com/toolbox/bingbot.json. | Bing Webmaster Tools | Standard search crawler that also feeds Bing's AI products. |
| 2023-2024 | Bing's AI features (Bing Chat, then Copilot) draw on the existing Bing index; there is no separate "BingChat-Bot" UA. | Bing Webmaster Tools | Allowing Bingbot is the prerequisite for visibility in Microsoft's AI search surfaces. |
| 2025 | Bing reaffirms that robots.txt Crawl-delay overrides Webmaster Tools settings when both are present. | Bing Blogs | Use one mechanism, not both, to avoid conflicting throttling. |

Common mistakes

  • Blocking GPTBot and assuming OpenAI is fully opted out. OAI-SearchBot and ChatGPT-User are separate tokens with separate robots.txt rules; some traffic types may also bypass robots.txt.
  • Treating Google-Extended like a normal crawler. It has no separate UA, so user-agent-based WAF rules cannot enforce it — the directive lives only in robots.txt.
  • Hardcoding IP allowlists. Vendors update openai.com/gptbot.json, bing.com/toolbox/bingbot.json, and equivalents; static lists rot within months.
  • Relying on robots.txt alone for Perplexity. Cloudflare-documented stealth-crawler behavior means network-layer enforcement is sometimes required.
  • Reusing one Crawl-delay rule across all bots. Only some bots honor Crawl-delay (notably ClaudeBot per Anthropic's 2026 docs, and Bingbot per Bing's guidance); Googlebot ignores it entirely.
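The Crawl-delay point can be checked with Python's standard-library urllib.robotparser, which reports the delay that applies to each token. The sketch below uses a made-up robots.txt in which only the ClaudeBot group sets a delay. It shows what the file says, not whether a given vendor honors the value, which still has to be confirmed in that vendor's documentation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a delay scoped to ClaudeBot, none for other agents.
ROBOTS_TXT = """\
User-agent: ClaudeBot
Crawl-delay: 10
Allow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# crawl_delay() returns the delay from the matching group, or None if unset.
delays = {
    agent: parser.crawl_delay(agent)
    for agent in ["ClaudeBot", "bingbot", "Googlebot"]
}
print(delays)
```

Scoping the directive to a named group keeps the throttle away from bots that ignore it or that are throttled through another mechanism.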

FAQ

Q: How often should I update my AI crawler rules?

At minimum, review robots.txt and WAF rules quarterly, and re-pull verified IP allowlists from each vendor's JSON file at the same cadence. New bots and renamed tokens have shipped roughly every 6-9 months across major vendors since 2023.

Q: Does blocking GPTBot remove my site from ChatGPT Search?

Not by itself. ChatGPT Search uses OAI-SearchBot, which is a separate robots.txt token. However, OpenAI's late-2025 documentation notes that GPTBot and OAI-SearchBot may share crawl results, so blocking only one may not fully isolate the use cases.

Q: Why do some AI bots ignore robots.txt?

Some user-triggered agents (ChatGPT-User, Claude-User, Perplexity-User) are framed as fetching on behalf of a person, similar to a browser, and vendors argue that robots.txt does not strictly apply. Other cases — most prominently the Cloudflare-documented Perplexity stealth crawlers — appear to circumvent directives entirely and require network-layer blocking.

Q: Is there a single user-agent string that catches all AI bots?

No. No substring identifies all of these bots without also matching unrelated traffic: a generic fragment such as "bot" matches ordinary crawlers too, and Google-Extended never appears in a user-agent string at all. Reliable detection requires either (a) maintaining a list of explicit tokens or (b) verifying vendor IP ranges from the published JSON files.
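Option (a) can be as small as a maintained token list and a case-insensitive substring check. The sketch below is illustrative: the token list mirrors the bots covered in this reference, the function name is hypothetical, and Google-Extended is left out on purpose because it never appears in a request header.

```python
# Tokens taken from the bots covered in this reference.
# Google-Extended is omitted: it is a robots.txt token, not a user agent.
AI_BOT_TOKENS = [
    "GPTBot",
    "OAI-SearchBot",
    "ChatGPT-User",
    "ClaudeBot",
    "Claude-User",
    "Claude-SearchBot",
    "PerplexityBot",
    "Perplexity-User",
    "bingbot",
]


def identify_bot(user_agent: str):
    """Return the first known token found in the user agent, or None."""
    ua = user_agent.lower()
    for token in AI_BOT_TOKENS:
        if token.lower() in ua:
            return token
    return None


print(identify_bot(
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
    "PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)"
))
print(identify_bot(
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
))
```

A match only says what the request claims to be. Since the header can be forged, pair this with the IP verification in option (b) before granting any allowlist treatment.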

Q: Does Google-Extended block AI Overviews?

No. Google-Extended controls training and grounding for Gemini Apps and Vertex AI, but AI Overviews continue to use content from the regular Google Search index. Removing your site from AI Overviews requires removing it from Google Search itself, which is rarely the desired outcome.

Q: How do I tell a real GPTBot or Bingbot from a spoofed one?

Verify the source IP against the vendor-published JSON allowlist (openai.com/gptbot.json, bing.com/toolbox/bingbot.json). User-agent strings can be forged trivially, whereas the source IP of an established connection is far harder to fake.
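A minimal sketch of that check follows. It assumes the allowlists have already been loaded into memory; the ranges shown are reserved documentation addresses standing in for real vendor ranges, and the function name and return labels are hypothetical. Behind a CDN or reverse proxy, pass in the original client address rather than the proxy's address.

```python
import ipaddress

# Placeholder allowlists keyed by user-agent token.
# Real deployments would load these from the vendor JSON files.
ALLOWLISTS = {
    "GPTBot": [ipaddress.ip_network("192.0.2.0/24")],
    "bingbot": [ipaddress.ip_network("198.51.100.0/24")],
}


def classify_request(user_agent: str, source_ip: str) -> str:
    """Label a request as 'verified', 'spoofed', or 'unclaimed'."""
    ua = user_agent.lower()
    address = ipaddress.ip_address(source_ip)
    for token, networks in ALLOWLISTS.items():
        if token.lower() in ua:
            # The request claims this bot identity; check the source range.
            if any(address in network for network in networks):
                return "verified"
            return "spoofed"
    # No known bot token in the user agent.
    return "unclaimed"


GPTBOT_UA = (
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; "
    "GPTBot/1.3; +https://openai.com/gptbot"
)
print(classify_request(GPTBOT_UA, "192.0.2.10"))
print(classify_request(GPTBOT_UA, "203.0.113.5"))
print(classify_request("Mozilla/5.0 (X11; Linux x86_64)", "203.0.113.5"))
```

Requests labeled "spoofed" are the ones worth blocking or challenging at the WAF, since they borrow a trusted name from an unlisted address.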

Q: What's the safest default robots.txt posture for a documentation site that wants AI citations?

Allow GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot, Claude-User, Claude-SearchBot, PerplexityBot, Perplexity-User, and Bingbot; leave Google-Extended allowed if you want Gemini grounding and disallow only if you want to opt out of AI training. Layer rate limits at the CDN, not in robots.txt.
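One way to keep that posture reviewable is to generate robots.txt from a short policy list instead of editing it by hand. The sketch below is an assumption-laden example rather than a recommended file: the function name and the opt-out flag are hypothetical, and the output uses one group per token with a blanket Allow.

```python
from urllib.robotparser import RobotFileParser

# Tokens the FAQ answer suggests allowing for a citation-friendly docs site.
ALLOWED_TOKENS = [
    "GPTBot", "OAI-SearchBot", "ChatGPT-User",
    "ClaudeBot", "Claude-User", "Claude-SearchBot",
    "PerplexityBot", "Perplexity-User", "Bingbot",
]


def render_robots_txt(allowed: list[str], opt_out_google_extended: bool = False) -> str:
    """Render one robots.txt group per token, plus a Google-Extended group."""
    groups = [f"User-agent: {token}\nAllow: /\n" for token in allowed]
    rule = "Disallow: /" if opt_out_google_extended else "Allow: /"
    groups.append(f"User-agent: Google-Extended\n{rule}\n")
    # Each group ends with a newline, so joining adds a blank line between groups.
    return "\n".join(groups)


robots_txt = render_robots_txt(ALLOWED_TOKENS, opt_out_google_extended=True)
print(robots_txt)

# Sanity-check the generated file with the standard-library parser.
check = RobotFileParser()
check.parse(robots_txt.splitlines())
print(check.can_fetch("OAI-SearchBot", "https://example.com/docs/"))
print(check.can_fetch("Google-Extended", "https://example.com/docs/"))
```

Keeping the token list in version control gives each changelog entry a matching, reviewable diff in the generated file.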
