JavaScript SPA Hydration Patterns for AI Crawlers

Most AI search crawlers (GPTBot, ClaudeBot, PerplexityBot) fetch only the initial HTML and do not execute JavaScript. Single-page applications must server-render or prerender critical content and avoid hydration mismatches; otherwise their content is invisible to AI search.

TL;DR

Server-render or prerender every page you want AI crawlers to cite. Keep the SSR HTML stable and identical across server and client to avoid hydration mismatches that strip content. Wrap browser-only logic in effects, not render. Use ISR/SSG for static-friendly routes, full SSR for dynamic ones, and reserve client-only rendering for authenticated dashboards. Audit each route by curling it with User-Agent: GPTBot.

Why this matters

Vercel's empirical study found that the top AI crawlers — OpenAI's GPTBot, OAI-SearchBot, ChatGPT-User; Anthropic's ClaudeBot; Perplexity's PerplexityBot; ByteDance's Bytespider; and Meta-ExternalAgent — do not execute JavaScript (Vercel, 2024). They issue a single fetch, parse the response, and move on. Bingbot has limited JavaScript support per practitioner reports, and only Googlebot performs full rendering, with its own queue and timing caveats (Vercel + MERJ, 2024).

This means an SPA whose initial HTML is an empty shell (a bare root element plus a script bundle) is invisible to most of the AI ecosystem, regardless of how good your content is.
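
One rough way to spot an empty shell is to strip scripts, styles, and tags from the raw response and measure what text is left. The sketch below is only a heuristic: the helper names, the sample markup, and the 50-character threshold are assumptions chosen for illustration, and regex-based tag stripping is not a real HTML parser.

```typescript
// Hypothetical helper: estimates how much text a non-rendering crawler can read.
function visibleTextLength(html: string): number {
  const text = html
    .replace(/<script\b[^>]*>[\s\S]*?<\/script>/gi, " ") // drop script elements
    .replace(/<style\b[^>]*>[\s\S]*?<\/style>/gi, " ")   // drop style elements
    .replace(/<[^>]+>/g, " ")                            // drop remaining tags
    .replace(/\s+/g, " ")
    .trim();
  return text.length;
}

// Assumed threshold: under 50 characters of text is treated as an empty shell.
function looksLikeEmptyShell(html: string, minChars = 50): boolean {
  return visibleTextLength(html) < minChars;
}

// Illustrative CSR response: only a mount point and a bundle.
const csrShell = `<!doctype html><html><head><title>App</title></head>
<body><div id="root"></div><script src="/bundle.js"></script></body></html>`;

// Illustrative SSR response: the text is already in the markup.
const ssrPage = `<html><body><h1>Hydration patterns</h1>
<p>Server-rendered pages expose their full text in the first response.</p></body></html>`;

console.log(looksLikeEmptyShell(csrShell)); // true
console.log(looksLikeEmptyShell(ssrPage));  // false
```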

Rendering modes at a glance

flowchart LR
    A["Request"] --> B{"Rendering<br/>strategy"}
    B -->|"CSR"| C["Empty shell HTML<br/>+ JS bundle"]
    B -->|"SSR"| D["Full HTML on first response"]
    B -->|"SSG / ISR"| E["Pre-built static HTML"]
    B -->|"Streaming SSR"| F["HTML streamed in chunks"]
    C --> X["AI crawlers see nothing"]
    D --> Y["AI crawlers see content"]
    E --> Y
    F --> Y

CSR (client-side render) — only safe for authenticated apps. Content invisible to non-Google AI crawlers.
SSR (server-side render) — safest for AI; full HTML in the first response.
SSG (static site generation) — cheapest at scale; ideal for docs and marketing.
ISR (incremental static regeneration) — SSG with periodic re-builds; good for content sites.
Streaming SSR — modern React/Remix; safe for AI as long as the critical content is in the early flush.
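
The choice between these modes can be written down as a small decision function. This is a sketch of the rule of thumb from the TL;DR, not a framework API; the `RouteProfile` fields and the function name are hypothetical.

```typescript
type RenderingMode = "CSR" | "SSR" | "SSG" | "ISR";

// Hypothetical description of a route, used only for this illustration.
interface RouteProfile {
  requiresAuth: boolean;      // only reachable after login, so not meant to be cited
  perRequestData: boolean;    // output depends on the individual request
  changesAfterBuild: boolean; // content is updated between deployments
}

function chooseRenderingMode(route: RouteProfile): RenderingMode {
  if (route.requiresAuth) return "CSR";           // dashboards: crawler visibility is not a goal
  if (route.perRequestData) return "SSR";         // dynamic pages: full HTML on each response
  return route.changesAfterBuild ? "ISR" : "SSG"; // static-friendly pages
}

const docsPage: RouteProfile = {
  requiresAuth: false,
  perRequestData: false,
  changesAfterBuild: false,
};
console.log(chooseRenderingMode(docsPage)); // "SSG"
```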

Hydration mismatch failures

A hydration mismatch occurs when the HTML the server emitted differs from what the client renders during React/Vue/Svelte hydration. For AI crawlers that never run JS, the server HTML is the only signal — mismatches do not affect visibility directly. But mismatches often indicate code paths that strip or replace SSR content during hydration, and those replacements are what users and rendering crawlers (Googlebot) end up indexing.

Common causes (Next.js docs, 2024):

Reading window, localStorage, or document during render.
Using Date.now() or Math.random() in render output.
Conditional rendering driven by user-agent sniffing.
Third-party scripts injecting attributes into the DOM (browser extensions, A/B test scripts).
CSS-in-JS libraries not configured for SSR.

Mitigations:

Wrap browser-only logic in useEffect (React) or onMounted (Vue) so it runs only after hydration.
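Use useId (React 18+) for stable IDs across server and client; if you generate IDs with a library such as nanoid, create them once on the server and pass them down as data rather than calling the generator during render.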
Suppress third-party DOM mutations until after first render.
Configure your CSS-in-JS library's SSR adapter (e.g., styled-components ServerStyleSheet, Emotion extractCritical).
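
The causes and mitigations above come down to one rule: render output should depend only on its inputs. The framework-free sketch below uses plain template strings as a stand-in for a component's render function and calls it twice to imitate the server pass and the client pass. All names are hypothetical, and real frameworks compare DOM trees rather than strings.

```typescript
interface ArticleProps {
  title: string;
  publishedAt: string; // ISO date supplied by the server as data
}

// Anti-pattern: the output changes on every call, so server and client disagree.
function renderUnstable(props: ArticleProps): string {
  return `<article data-key="${Math.random()}"><h1>${props.title}</h1></article>`;
}

// Stable: the output depends only on props.
function renderStable(props: ArticleProps): string {
  return `<article data-published="${props.publishedAt}"><h1>${props.title}</h1></article>`;
}

// Imitates hydration: render once "on the server", once "on the client", then compare.
function hydrationMatches(
  render: (props: ArticleProps) => string,
  props: ArticleProps,
): boolean {
  const serverHtml = render(props);
  const clientHtml = render(props);
  return serverHtml === clientHtml;
}

const sampleProps: ArticleProps = { title: "Hello", publishedAt: "2024-01-15" };
console.log(hydrationMatches(renderStable, sampleProps));   // true
console.log(hydrationMatches(renderUnstable, sampleProps)); // false
```

Anything that genuinely needs the browser (window size, localStorage) would run after this first matching pass, which is the role useEffect and onMounted play in the mitigations.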

Framework-specific patterns

Next.js (App Router)

Default to Server Components. Only mark "use client" for genuinely interactive subtrees.
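Wrap browser-only components with dynamic(() => import(...), { ssr: false }), called from a Client Component because the App Router does not allow ssr: false inside Server Components, and provide a server-rendered fallback that contains the indexable text.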
For data fetching, prefer Server Components or generateStaticParams over client-side useEffect.
Use revalidate for ISR; choose the interval according to how fresh each route's content needs to be.
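
To make the effect of a revalidate interval concrete, here is a simplified in-memory model of time-based revalidation. It is not Next.js code and leaves out background work, persistence, and on-demand revalidation; the class and parameter names are hypothetical. What it does show is that a crawler always receives complete HTML, although the first request after the interval expires may still get the stale copy.

```typescript
class RevalidatingCache {
  private html: string | null = null;
  private builtAtMs = 0;
  private readonly build: () => string;
  private readonly revalidateSeconds: number;

  constructor(build: () => string, revalidateSeconds: number) {
    this.build = build;
    this.revalidateSeconds = revalidateSeconds;
  }

  // `nowMs` is passed in so the behaviour can be exercised without a real clock.
  get(nowMs: number): string {
    if (this.html === null) {
      // First request: build the page and serve it.
      this.html = this.build();
      this.builtAtMs = nowMs;
      return this.html;
    }
    const served = this.html;
    if (nowMs - this.builtAtMs >= this.revalidateSeconds * 1000) {
      // Stale: serve the cached copy now, rebuild for the next request.
      this.html = this.build();
      this.builtAtMs = nowMs;
    }
    return served;
  }
}

let buildCount = 0;
const page = new RevalidatingCache(() => `<h1>Build ${++buildCount}</h1>`, 60);
console.log(page.get(0));      // <h1>Build 1</h1>
console.log(page.get(61_000)); // <h1>Build 1</h1> (stale copy served, rebuild triggered)
console.log(page.get(62_000)); // <h1>Build 2</h1>
```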

Nuxt 3 / Vue

Use useAsyncData or useFetch so data is rendered on the server.
Avoid process.client-gated rendering of primary content.
Set ssr: true (default) in nuxt.config. For routes that must be CSR-only, prerender a representative page for AI crawlers.
Reach for <ClientOnly> only for non-content widgets.

SvelteKit

Leave ssr at its default of true (export const ssr = true in +page.ts / +layout.ts).
Use load functions for data; never fetch primary content from onMount.
For static-friendly routes, set export const prerender = true.
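
A minimal sketch of a +page.ts that follows these three points is shown below. The route path, the /api/articles endpoint, and the local LoadEvent type are assumptions made to keep the example self-contained; a real project would normally use the generated PageLoad type from ./$types and SvelteKit's error helper instead.

```typescript
// src/routes/articles/[slug]/+page.ts (assumed route layout)
export const ssr = true;       // SvelteKit's default, stated explicitly here
export const prerender = true; // emit static HTML for this route at build time

// Local stand-in for the event SvelteKit passes to load (assumption for this sketch).
interface LoadEvent {
  params: { slug: string };
  fetch: (url: string) => Promise<{ ok: boolean; json(): Promise<unknown> }>;
}

// Runs on the server during SSR and prerendering, so the article text
// is part of the HTML that crawlers receive.
export async function load({ params, fetch }: LoadEvent) {
  const res = await fetch(`/api/articles/${params.slug}`); // hypothetical endpoint
  if (!res.ok) {
    throw new Error(`Article ${params.slug} could not be loaded`);
  }
  const article = (await res.json()) as { title: string; body: string };
  return { article };
}
```

Because the data arrives through load rather than onMount, the page component can render the title and body during the server pass.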

Astro

Astro produces zero-JS HTML by default, ideal for AI crawlers.
Use island architecture: ship JS only for components that need it.
For dynamic data, use SSR adapters (@astrojs/node, @astrojs/cloudflare).

Patterns that survive AI parsing

HTML-first authoring. Write the page as if no JS will run. Add interactivity on top.
Critical content in the first flush. With streaming SSR, ensure the headline, first paragraph, and key facts sit above any Suspense boundary that could delay them.
Stable text content. Avoid client-only locale formatting on critical text — use server-side Intl.*.
No content behind interaction. Tabs, accordions, modals: render the content in the DOM (visually hidden if needed), do not lazy-mount it.
Avoid empty layouts during build. SSG outputs that depend on useEffect-fetched data emit empty HTML; move the fetch to getStaticProps / load / equivalent.
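
For the stable text content point, one option is to pin both the locale and the time zone when formatting dates, so the server output does not depend on the visitor's browser settings. The sketch below assumes an en-US audience and UTC; the function name is hypothetical.

```typescript
// Formats an ISO timestamp with a fixed locale and time zone so the
// result is the same wherever the code runs.
function formatPublishedDate(iso: string): string {
  return new Intl.DateTimeFormat("en-US", {
    year: "numeric",
    month: "long",
    day: "numeric",
    timeZone: "UTC",
  }).format(new Date(iso));
}

console.log(formatPublishedDate("2024-01-15T23:30:00Z")); // January 15, 2024
```

Without the explicit timeZone, the same timestamp could print as January 16 on a machine set to a time zone ahead of UTC, which is exactly the kind of server/client difference that produces a mismatch.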

Verifying AI visibility

curl -s -A "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)" \
  https://example.com/your-route \
  | grep -c "your headline text"

A non-zero count means the headline is in the SSR HTML. Repeat with ClaudeBot, PerplexityBot, and Googlebot user-agents. For programmatic auditing, integrate this into CI for top routes.
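
For CI, the same check can be scripted. The sketch below loops over several crawler user-agents and reports whether a phrase appears in the raw HTML. The user-agent values are simplified product tokens rather than the vendors' full strings (an assumption; consult each vendor's documentation), and the fetch function is injected so the logic can be tested without network access.

```typescript
// Minimal shape of the fetch function this sketch relies on.
type FetchLike = (
  url: string,
  init: { headers: Record<string, string> },
) => Promise<{ status: number; text(): Promise<string> }>;

// Simplified tokens (assumption), not the full user-agent strings.
const CRAWLER_USER_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Googlebot"];

interface AuditResult {
  userAgent: string;
  status: number;
  found: boolean; // true when the phrase is present in the raw response body
}

async function auditRoute(
  url: string,
  phrase: string,
  fetchImpl: FetchLike,
): Promise<AuditResult[]> {
  const results: AuditResult[] = [];
  for (const userAgent of CRAWLER_USER_AGENTS) {
    const res = await fetchImpl(url, { headers: { "User-Agent": userAgent } });
    const html = await res.text();
    results.push({ userAgent, status: res.status, found: html.includes(phrase) });
  }
  return results;
}
```

On a runtime with a global fetch (for example Node 18 or later), it could be called as auditRoute("https://example.com/your-route", "your headline text", fetch), failing the build when any entry has found set to false.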

Common mistakes

Treating Googlebot's JS rendering as a stand-in for all AI crawlers.
Lazy-mounting article body via IntersectionObserver — invisible to AI.
Returning 200 OK with empty HTML for not-yet-prerendered routes; AI crawlers do not retry.
Blocking AI bots in robots.txt while debugging, then forgetting to unblock.
Relying on client-side analytics scripts for canonical URL injection.

FAQ

Q: Do AI crawlers run JavaScript at all?

The major non-Google AI crawlers (GPTBot, ClaudeBot, PerplexityBot, Bytespider, Meta-ExternalAgent) do not. Bingbot has limited support per practitioner reports. Googlebot does, but with rendering queues that can delay indexing.

Q: Is CSR ever acceptable for public content?

Only when SEO and AI citation are not goals (e.g., authenticated dashboards). For any page you want cited, default to SSR, ISR, or SSG.

Q: What if I cannot SSR (legacy app)?

Use a prerender service (Prerender.io, Rendertron) or an edge worker that returns static HTML for known crawler user-agents. Long-term, migrate to a framework with native SSR.
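
The edge-worker variant can be sketched as a user-agent check plus a lookup of prerendered HTML. The token list reuses the crawler names mentioned in this article, while the function names and the fallback behaviour are assumptions; a real worker would also need to keep its prerendered copies up to date.

```typescript
// Crawler product tokens taken from the names used in this article.
const KNOWN_CRAWLERS = [
  "GPTBot", "OAI-SearchBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot",
  "Bytespider", "Meta-ExternalAgent", "Googlebot", "Bingbot",
];

// Case-insensitive substring match against the request's User-Agent header.
function isKnownCrawler(userAgent: string): boolean {
  const ua = userAgent.toLowerCase();
  return KNOWN_CRAWLERS.some((token) => ua.includes(token.toLowerCase()));
}

// Returns the prerendered page for crawlers when one exists, else the normal app shell.
function selectResponse(
  userAgent: string,
  prerenderedHtml: string | undefined,
  appShellHtml: string,
): string {
  if (isKnownCrawler(userAgent) && prerenderedHtml !== undefined) {
    return prerenderedHtml;
  }
  return appShellHtml;
}
```

The prerendered HTML should contain the same content users see after the app loads; serving crawlers something different is cloaking.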

Q: Do hydration mismatches cause de-indexing?

Not directly for crawlers that never run JS. For Googlebot, which does render pages, they can degrade INP and rendering reliability. They are also a strong code smell: the component branches that cause a mismatch often strip or replace content on the client, which is harmful.

Q: Should I detect AI crawlers and serve different content?

No. Cloaking risks penalties. Serve the same SSR HTML to everyone; differentiate only in performance hints (e.g., skip heavy ad scripts for known bot user-agents).

Q: Does streaming SSR work for AI crawlers?

Yes, when the critical content is in the early flush. Wrap non-critical sections in <Suspense> so they do not block the first response.
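
The ordering idea can be illustrated without any framework. In the sketch below an async generator stands in for the response stream: the critical markup is emitted in the first chunk and deferred sections follow as they resolve. This is a conceptual model with hypothetical names, not how React or Remix implement streaming.

```typescript
// Emits the critical content first, then each deferred section in order.
async function* streamPage(
  critical: string,
  deferred: Array<Promise<string>>,
): AsyncGenerator<string> {
  yield `<main>${critical}`; // first flush: headline and key facts
  for (const section of deferred) {
    yield await section;     // later chunks, sent once each section is ready
  }
  yield `</main>`;
}

// Collects every chunk; a client that reads the full response sees all of them.
async function collect(stream: AsyncGenerator<string>): Promise<string[]> {
  const chunks: string[] = [];
  for await (const chunk of stream) {
    chunks.push(chunk);
  }
  return chunks;
}
```

A crawler that reads the whole response receives every chunk, but keeping the headline and key facts in the first one protects them from slow or failing deferred sections.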