Lazy Loading Patterns for AI Crawlers
AI crawlers like GPTBot, ClaudeBot, and PerplexityBot do not reliably trigger viewport-based lazy loading the way human browsers do. Use native loading="lazy" attributes for images and iframes, keep critical text in server-rendered HTML, and reserve IntersectionObserver patterns for non-citable below-the-fold media.
TL;DR
Lazy loading is a performance win for humans and a citation risk for AI crawlers. Native loading="lazy" is safe — the markup stays in the DOM and crawlers see the src. Custom JavaScript lazy loading is risky because not every AI bot fully renders JavaScript or scrolls the viewport. The safe pattern: server-render text content, native-lazy-load images, and only defer truly non-essential media. Always validate with a fetch-only HTML check, not just a DevTools view.
Why this matters for AI search
Google has documented for years that lazy loading "if not implemented correctly… can inadvertently hide content from Google" Google Search Central. The same caveat applies more strongly to AI-search crawlers. AI engines run a wide range of fetch and render pipelines: some (Google AI Overviews, Bing/Copilot) reuse mature rendering infrastructure, while others (GPTBot, ClaudeBot, PerplexityBot) often fetch HTML without a full headless browser, or render with budgets that don't include user-style scroll behaviour.
When lazy-loaded content depends on a scroll event, an IntersectionObserver callback, or an explicit "Load more" click, an AI crawler that doesn't reproduce that interaction will see an empty container. The cited claim is whatever happens to be in the initial HTML — your hero, nav, and CTA, not your detailed answer.
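To make the failure mode concrete, here is a hypothetical sketch of what such a page looks like to a fetch-only crawler: the container exists, but the answer does not.

```html
<!-- What a fetch-only crawler receives: an empty shell with no answer text. -->
<section id="detailed-answer"
         data-endpoint="/api/sections/detailed-answer"></section>
<!-- The real content only arrives after a scroll-triggered fetch runs. -->
```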
How AI crawlers handle deferred content
| Crawler | JS render | Scrolls viewport | Triggers IntersectionObserver |
|---|---|---|---|
| Googlebot (incl. AI Overviews) | Yes (Chromium) | Simulated | Sometimes |
| Bingbot / Copilot | Yes | Simulated | Sometimes |
| GPTBot | Limited | Rarely | Rarely |
| ClaudeBot | Limited | Rarely | Rarely |
| PerplexityBot | Hybrid (fetch + render) | Rarely | Rarely |
This table is directional, drawn from public crawl behaviour observations and bot operator guidance — vendors do not publish a precise capability matrix. Treat "Limited" / "Rarely" as: do not depend on it for any content you want cited.
Safe patterns
1. Native loading="lazy" for images and iframes
Use the native HTML attribute, the WHATWG-specified lazy-loading mechanism understood by every major browser:

```html
<img src="/diagrams/answer-grounding.png"
     loading="lazy"
     width="800" height="450"
     alt="Answer grounding pipeline diagram">
```

The markup is fully present in the HTML. Crawlers see the src and alt regardless of whether they trigger viewport loading. Reserve space with width/height (or aspect-ratio CSS) so layout doesn't shift on render — INSIDEA recommends "Use standard attributes like loading="lazy" whenever possible. This built-in browser feature allows images and iframes to load efficiently without hiding them from crawlers" INSIDEA.
2. Server-render the text
Text that you want cited — definitions, FAQ answers, comparison tables — must exist in the initial HTML response. SSR (Next.js app router server components, Astro, Remix loaders, classic Rails / Django templates) all satisfy this. Hydration with React/Vue is fine as long as the text is already rendered server-side before client JS runs.
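As a minimal sketch of this property (renderFaqPage is a hypothetical helper, not a framework API), the key point is that the citable answer exists in the response string itself, before any client JavaScript runs:

```javascript
// Minimal SSR sketch: the citable answer is embedded in the HTML
// string the server sends, so a fetch-only crawler sees it without
// executing any JavaScript.
function renderFaqPage(question, answer) {
  return `<!doctype html>
<html>
<body>
  <section class="faq-item">
    <h2>${question}</h2>
    <p>${answer}</p>
  </section>
  <script src="/hydrate.js" defer></script>
</body>
</html>`;
}

const html = renderFaqPage(
  "Does native lazy loading hide images from crawlers?",
  "No. The src stays in the markup, so fetch-only crawlers still see it."
);
// The answer text is present in the raw HTML, no rendering required.
console.log(html.includes("fetch-only crawlers still see it")); // true
```

Hydration scripts can still attach afterwards; what matters is that the text is already in the body when the response leaves the server.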
3. content-visibility: auto is acceptable for text
content-visibility: auto skips rendering work for off-screen elements but keeps the content in the DOM, where crawlers can still see it. Per web.dev, applying content-visibility: auto to chunked content can give a 7x rendering boost on initial load web.dev. It does not hide content from crawlers — the markup remains in the DOM.
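A minimal sketch of the pattern; the class name and the contain-intrinsic-size estimate are illustrative, not prescriptive:

```css
/* Skips rendering work for off-screen sections; content stays in the DOM. */
.long-form-section {
  content-visibility: auto;
  /* Reserve an approximate height so scroll position stays stable. */
  contain-intrinsic-size: auto 600px;
}
```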
4. IntersectionObserver for non-citable media
Background hero animations, decorative images, embedded ads — fine to defer with IntersectionObserver because they're not the citable atoms. Provide a fallback (a static poster image or a <noscript> variant) so fetch-only crawlers and no-JS users still get something meaningful.
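A minimal sketch of this pattern, assuming a decorative background video that is not citation-worthy; the selectors and file paths are hypothetical:

```html
<!-- Decorative background video: safe to defer because it is not citable. -->
<div class="hero-bg" data-video="/media/hero-loop.mp4"></div>
<noscript>
  <!-- Fallback for fetch-only crawlers and no-JS users. -->
  <img src="/media/hero-poster.jpg" alt="" width="1280" height="720">
</noscript>
<script>
  // Attach the heavy media only when the container nears the viewport.
  const el = document.querySelector('.hero-bg');
  new IntersectionObserver((entries, obs) => {
    if (entries[0].isIntersecting) {
      const video = document.createElement('video');
      video.src = el.dataset.video;
      video.muted = video.loop = video.autoplay = true;
      el.appendChild(video);
      obs.disconnect(); // load once, then stop observing
    }
  }, { rootMargin: '200px' }).observe(el);
</script>
```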
Anti-patterns
- JS-only image insertion — generating `<img>` tags from a JavaScript array on scroll. AI crawlers that don't render or scroll see nothing.
- Infinite scroll without pagination URLs — a crawler can't trigger "scroll to bottom". Always provide a paginated ?page=N URL with full content per page, even if the human UI uses infinite scroll.
- display: none on "hidden" tabs that hold real content — the content remains in HTML, but Google has historically devalued or ignored content that's not visible by default; the same caution applies to AI crawlers.
- Hash-only routing — deferred content reachable only via #tab=specs won't be fetched by crawlers as a separate URL. Use real path segments.
- Click-to-reveal accordions for the answer itself. The answer should be in the initial HTML; the accordion can hide it visually, but the text must not require interaction to enter the DOM.
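The paginated fallback for infinite scroll can be sketched as a pure render function; renderArticleList and the URL pattern are hypothetical, but the principle holds: every page of content is reachable by a plain HTTP fetch.

```javascript
// Sketch: serve full server-rendered content at ?page=N so crawlers can
// fetch everything the infinite-scroll UI loads incrementally.
function renderArticleList(allArticles, page, pageSize = 10) {
  const start = (page - 1) * pageSize;
  const items = allArticles.slice(start, start + pageSize);
  const links = [];
  if (page > 1) links.push(`<a href="/articles?page=${page - 1}">Previous</a>`);
  if (start + pageSize < allArticles.length)
    links.push(`<a href="/articles?page=${page + 1}">Next</a>`);
  // Each page is complete HTML: items plus crawlable prev/next links.
  return `<ul>${items.map(a => `<li>${a}</li>`).join('')}</ul>
<nav>${links.join(' ')}</nav>`;
}

const articles = Array.from({ length: 25 }, (_, i) => `Article ${i + 1}`);
console.log(renderArticleList(articles, 2).includes('Article 11')); // true
```

The human UI can keep fetching JSON on scroll; the paginated HTML routes exist purely so that crawlers (and sitemaps) have direct entry points.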
How to validate
- Fetch-only test. `curl -A "GPTBot" https://your.site/page | grep "key phrase from your answer"`. If the cited text isn't in the raw HTML, AI crawlers may miss it.
- Google URL Inspection. Use the URL Inspection tool in Search Console; check the rendered HTML and search for your critical text — a quick way to confirm the lazy-loaded content reaches Google's index, as Glenn Gabe and Martin Splitt demonstrate in the Google Search Central "Lazy loading demystified" episode.
- Headless render comparison. Compare a JavaScript-disabled Chromium fetch with a full render. Anything that only appears in the full render is at risk on JS-light AI crawlers.
- Bot impersonation. Tools like Screaming Frog, Sitebulb, and Crawl4AI's wait_for_images=True / scan_full_page modes can confirm what each bot persona sees.
- Server log review. Filter access logs by GPTBot / ClaudeBot / PerplexityBot user agents to confirm they're fetching the URL at all and that response sizes look right (small responses often signal content gaps).
Implementation checklist
- [ ] Citable text rendered server-side, not via client-only fetch.
- [ ] All `<img>` and `<iframe>` elements use native loading="lazy" with explicit width/height.
- [ ] Critical above-the-fold media is not lazy-loaded (would hurt LCP).
- [ ] Infinite scroll has a parallel paginated URL structure.
- [ ] No critical text inside display: none tabs by default.
- [ ] content-visibility: auto applied only to long-form sections, not entire pages.
- [ ] Server logs confirm AI bots fetch the URL and receive a full HTML body.
Common mistakes
- Assuming "Google sees it = AI sees it". Google's renderer is among the most capable. Don't extrapolate from Google success to GPTBot/ClaudeBot success.
- Lazy-loading hero images. Hurts LCP and gives crawlers a blank impression of the page.
- Blocking lazy-loader scripts in robots.txt. If your custom lazy loader is blocked, even capable crawlers can't run it.
- Forgetting `<noscript>` fallbacks. JS-inserted media with no fallback markup is invisible to fetch-only crawlers.
FAQ
Q: Does Google index lazy-loaded images?
Yes, when implemented correctly. Google indexes content it can successfully render, including lazy-loaded images that use native loading="lazy" or correctly-instrumented JavaScript lazy loading Google Search Central. Verify with the URL Inspection tool.
Q: Should I lazy-load my hero image?
No. The hero image is almost always the LCP element; lazy-loading it delays the largest contentful paint and can suppress its appearance in AI Overview / Perplexity image cards. Mark it eagerly loaded (loading="eager" or omit the attribute).
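A minimal sketch of an eagerly loaded hero; the fetchpriority hint is optional but widely supported, and the path and dimensions are illustrative:

```html
<!-- Hero / LCP image: no loading="lazy", optionally prioritized. -->
<img src="/images/hero.jpg"
     width="1600" height="900"
     fetchpriority="high"
     alt="Product hero">
```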
Q: Is content-visibility: hidden safe for SEO?
It's risky. content-visibility: hidden skips rendering and is not visible by default. While the content is in the DOM, Lighthouse audits have historically had issues introspecting subtrees with content-visibility: hidden, and search engines generally devalue content that users don't see by default. Use content-visibility: auto instead.
Q: Do GPTBot and ClaudeBot scroll the page?
They are not documented to fully simulate human scroll. Treat any content gated behind scroll-trigger as invisible to those bots and test with a fetch-only comparison.
Q: How do I lazy-load infinite scroll without losing AI-cited content?
Provide paginated URLs (/articles?page=2) alongside the infinite-scroll UX. Surface those URLs in the sitemap so AI crawlers can fetch each page's full content directly.
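For example, sitemap entries can expose each paginated URL directly (domain and paths are placeholders):

```xml
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- Each paginated page is fetchable without scrolling. -->
  <url><loc>https://example.com/articles?page=1</loc></url>
  <url><loc>https://example.com/articles?page=2</loc></url>
  <url><loc>https://example.com/articles?page=3</loc></url>
</urlset>
```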
Related Articles
AI Crawler Content Negotiation Specification
HTTP content negotiation (Accept, Accept-Language, Vary) for AI crawlers — serve LLM-friendly variants without cloaking penalties or cache poisoning.
AI Crawler Prefetch Hints Specification
How to use Resource Hints, Link headers, and 103 Early Hints to accelerate AI crawler discovery while keeping origin load and crawl budget under control.
How to Create llms.txt: Step-by-Step Tutorial for AI Search
Step-by-step tutorial for creating, deploying, and validating an llms.txt file so AI systems and LLMs can discover your site's most important content.