Geodocs.dev

Screaming Frog for GEO Auditing

Screaming Frog SEO Spider audits GEO readiness by crawling a site and using custom extraction (XPath, CSSPath, regex, and JavaScript snippets) to check JSON-LD schema coverage, heading hierarchy, AI summary blocks, FAQ extractability, and internal linking at scale.

TL;DR. Screaming Frog is the most practical desktop crawler for site-wide Generative Engine Optimization audits. By layering custom extractions on top of the default crawl, you can score every URL on the structural signals AI search engines depend on — schema presence, answer-first headings, FAQ blocks, and clean internal links — and then prioritise fixes by traffic-weighted impact.

Why Screaming Frog still matters for GEO

GEO audits do not need a brand-new toolchain. Most of the signals that influence whether a page can be cited by ChatGPT, Perplexity, Google AI Overviews, or Gemini are HTML-level: heading hierarchy, structured data, FAQ blocks, internal links, and metadata. Screaming Frog SEO Spider has crawled those signals for over a decade and now ships features specifically aimed at AI search work, including custom JavaScript snippets and an internal AI search optimisation service.

Use it when you need to:

  • Inventory every URL on a site and score it on GEO readiness.
  • Detect missing or malformed JSON-LD across the entire crawl.
  • Find pages that lack an extractable answer block (AI summary, FAQ, definition).
  • Identify orphan or under-linked hub pages that AI crawlers will miss.
  • Generate or normalise JSON-LD at scale via custom JavaScript snippets.

Set up the crawl

The defaults are tuned for traditional SEO; GEO work needs a few extra toggles.

  1. Enable JavaScript rendering. Configuration → Spider → Rendering → JavaScript. Set the rendering wait time to 3-5 seconds so client-rendered JSON-LD and content load before extraction.
  2. Crawl structured data. Configuration → Spider → Extraction → enable JSON-LD, Microdata, RDFa, and Schema.org Validation.
  3. Add custom extractions (described below).
  4. Respect robots and crawl budget, but raise the connection limit if the site can handle it; a typical GEO audit needs the full URL inventory.
  5. Set a custom user-agent if you want to test how the site responds to AI crawlers (for example GPTBot, PerplexityBot, ClaudeBot).
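Before spoofing a user-agent in the crawler, it can help to confirm what robots.txt already says to those bots. The following is a minimal sketch using Python's standard `urllib.robotparser`; the robots.txt content and the example URL are hypothetical, and the bot names are the plain tokens mentioned above rather than full user-agent strings.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; in practice, paste the site's real file here.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

AI_BOTS = ["GPTBot", "PerplexityBot", "ClaudeBot"]


def ai_bot_access(robots_txt: str, url: str, bots=AI_BOTS) -> dict:
    """Return {bot: allowed} for one URL according to the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in bots}


access = ai_bot_access(ROBOTS_TXT, "https://example.com/guides/geo-audit")
print(access)
```

A bot that is blocked here will never see the page, so structural fixes on that URL cannot help citations from that platform until the rule changes.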

Core GEO checks (no custom code required)

These checks come from the default reports in SEO Spider.

Heading hierarchy

| Check | Where to look |
| --- | --- |
| Multiple H1s | H1 tab → filter "Multiple" |
| Missing H1 | H1 tab → filter "Missing" |
| Long or short H1 | H1 tab → sort by length |
| H2 / H3 inventory | H2 tab → review for skipped levels |

AI extractors read headings as the page's outline. Skipped levels (H1 → H3) and missing H1s reduce the chance of clean passage extraction.
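Skipped levels are easier to catch programmatically than by scanning the H2 tab. The sketch below assumes you already have each page's heading levels in document order (for example from a custom extraction); the function name and input shape are illustrative, not part of Screaming Frog.

```python
def heading_issues(levels: list[int]) -> list[str]:
    """Flag missing or multiple H1s and skipped levels.

    `levels` is the page's heading levels in document order, e.g. [1, 2, 3, 2].
    """
    issues = []
    h1_count = levels.count(1)
    if h1_count == 0:
        issues.append("missing H1")
    elif h1_count > 1:
        issues.append("multiple H1s")

    previous = 0  # 0 means "no heading seen yet"
    for level in levels:
        # Going deeper by more than one level (e.g. H1 -> H3) is a skip.
        if previous != 0 and level > previous + 1:
            issues.append(f"skipped level: H{previous} -> H{level}")
        previous = level
    return issues


print(heading_issues([1, 3, 2]))
```

Moving back up the outline (H3 to H2) is not treated as a problem; only jumps deeper than one level are flagged.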

Metadata

| Check | Where to look |
| --- | --- |
| Meta description length | Meta Description tab → filter on length |
| Missing descriptions | Meta Description tab → filter "Missing" |
| Duplicate descriptions | Meta Description tab → filter "Duplicate" |
| Title length and uniqueness | Page Titles tab |

Keep descriptions in the 120-160 character range so they are usable as og:description, meta description, and llm_summary fallbacks.

Internal linking and orphans

| Check | Where to look |
| --- | --- |
| Orphan pages | Crawl Analysis → Orphan Pages report |
| Internal link count | Inlinks column on the Internal tab |
| Broken internal links | Response Codes → 4xx |
| Hub link presence | Custom search for the hub URL (see below) |

AI systems weight pages that are well-linked from authoritative hub pages. Pages with low inlinks and no hub reference are systematically underrepresented in citations.
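If you prefer to compute link signals outside the tool, the same checks can be approximated from an exported link list. This sketch assumes the link data is available as (source, destination) pairs and that you have the full URL inventory; the URLs shown are made up.

```python
from collections import Counter
from statistics import median


def inlink_report(urls, links):
    """Count unique internal inlinks per URL and flag orphans and thin pages."""
    # Ignore duplicate links and self-links so one page cannot inflate its own count.
    counts = Counter(dst for src, dst in set(links) if src != dst)
    mid = median(counts[url] for url in urls)
    return {
        url: {
            "inlinks": counts[url],
            "orphan": counts[url] == 0,
            "below_median": counts[url] < mid,
        }
        for url in urls
    }


urls = ["/", "/tools/", "/guides/a", "/guides/b"]
links = [
    ("/", "/tools/"),
    ("/", "/guides/a"),
    ("/tools/", "/guides/a"),
    ("/tools/", "/"),
    ("/guides/a", "/tools/"),
    ("/guides/a", "/guides/a"),  # self-link, ignored
]
report = inlink_report(urls, links)
print(report["/guides/b"])
```

Here `/guides/b` receives no links at all, so it is reported as an orphan and as below the median.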

Structured data

The Structured Data tab surfaces JSON-LD detected on each URL plus Schema.org validation warnings. For GEO, prioritise:

  • Article, BlogPosting, TechArticle for editorial content
  • FAQPage and HowTo for instructional pages
  • Product, Offer, Review for commerce
  • Organization, Person, and WebSite for entity signals

Pages without any JSON-LD should be flagged for follow-up; the Schema.org validation column highlights syntax errors and missing required properties.
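To spot-check what the Structured Data tab reports, you can parse a page's JSON-LD yourself. The following sketch uses only the Python standard library; it collects every `@type` (including nested ones) and counts blocks that fail to parse as JSON. It does not perform Schema.org validation, and the sample HTML is invented.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def jsonld_types(html: str):
    """Return (sorted list of @type values, number of unparseable blocks)."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    types, errors = set(), 0
    for raw in collector.blocks:
        try:
            stack = [json.loads(raw)]
        except json.JSONDecodeError:
            errors += 1
            continue
        while stack:  # walk nested objects and arrays
            node = stack.pop()
            if isinstance(node, dict):
                declared = node.get("@type")
                if isinstance(declared, str):
                    types.add(declared)
                elif isinstance(declared, list):
                    types.update(t for t in declared if isinstance(t, str))
                stack.extend(node.values())
            elif isinstance(node, list):
                stack.extend(node)
    return sorted(types), errors


sample_html = """<html><head>
<script type="application/ld+json">{"@context": "https://schema.org",
 "@type": "Article", "author": {"@type": "Person", "name": "A"}}</script>
<script type="application/ld+json">{"@type": "FAQPage",}</script>
</head></html>"""
found_types, broken_blocks = jsonld_types(sample_html)
print(found_types, broken_blocks)
```

The second block has a trailing comma, so it is counted as broken rather than contributing `FAQPage`; that is the kind of page to flag for follow-up.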

Custom extractions for GEO signals

Custom extraction is the highest-leverage feature for GEO audits because it lets you scrape any HTML pattern and report it across the entire crawl. SEO Spider supports XPath, CSSPath, regex, and custom JavaScript snippets in recent versions.

1. Detect AI summary blocks

If your style guide places AI summaries in a <blockquote class="ai-summary"> element, extract them with CSSPath:

  • Name: AI Summary Block
  • Type: CSSPath
  • Expression: blockquote.ai-summary
  • Extractor: Extract Text

Pages that return empty for this extraction are missing the answer-first block AI engines tend to lift first.
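For testing the selector outside the crawler, roughly the same extraction can be sketched with Python's standard `html.parser`. This assumes the `blockquote.ai-summary` convention described above; the class name comes from the expression, and the sample markup is invented.

```python
from html.parser import HTMLParser


class SummaryExtractor(HTMLParser):
    """Collect the text inside <blockquote class="... ai-summary ..."> elements."""

    def __init__(self):
        super().__init__()
        self._depth = 0  # > 0 while inside a matching blockquote
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag != "blockquote":
            return
        if self._depth:
            self._depth += 1  # nested blockquote inside the summary
        elif "ai-summary" in (dict(attrs).get("class") or "").split():
            self._depth = 1

    def handle_endtag(self, tag):
        if tag == "blockquote" and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self._parts.append(data)

    def text(self) -> str:
        # Collapse runs of whitespace, like a plain "extract text" would.
        return " ".join("".join(self._parts).split())


def extract_ai_summary(html: str) -> str:
    extractor = SummaryExtractor()
    extractor.feed(html)
    extractor.close()
    return extractor.text()


page = (
    '<blockquote class="ai-summary lead"><p>Screaming Frog audits '
    "<strong>GEO</strong> readiness.</p></blockquote>"
    "<blockquote>An unrelated quote.</blockquote>"
)
summary = extract_ai_summary(page)
print(summary)
```

An empty string from `extract_ai_summary` corresponds to an empty cell in the custom extraction column.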

2. Detect FAQ blocks

Look for FAQPage schema with regex:

  • Name: FAQ Schema
  • Type: Regex
  • Expression: "@type"\s*:\s*"FAQPage"

3. Extract @type values from JSON-LD

  • Name: JSON-LD Types
  • Type: Regex
  • Expression: "@type"\s*:\s*"([^"]+)"

The result column lists every schema type the page declares, making it easy to spot pages that declare only generic WebPage and nothing more useful.
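It is worth testing the pattern against sample markup before running a full crawl. The sketch below applies the `@type` regex with Python's `re` module and derives the FAQPage check from the same result; the sample string is invented.

```python
import re

# Matches "@type": "Something" with optional whitespace around the colon.
TYPE_PATTERN = re.compile(r'"@type"\s*:\s*"([^"]+)"')


def declared_types(html: str) -> list[str]:
    """Return every string-valued @type found in the page source, in order."""
    return TYPE_PATTERN.findall(html)


sample = (
    '{"@context": "https://schema.org", "@type": "FAQPage", '
    '"mainEntity": [{"@type":"Question", "name": "What is GEO?"}]}'
)
types = declared_types(sample)
has_faq = "FAQPage" in types
print(types, has_faq)
```

One limitation of the regex approach: an array-valued type such as `"@type": ["Article", "NewsArticle"]` does not match, so pages using that form need a real JSON parser.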

4. Detect hub links

Check whether a page links to a given hub with regex:

  • Name: Links to /tools hub
  • Type: Regex
  • Expression: href="(?:https?://[^/]+)?/tools/?"

Run a separate extraction per hub URL and you get a per-page boolean for hub linkage — one of the GEO content checklist requirements.
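When there are many hubs, generating one pattern per hub path avoids hand-editing each expression. This is a sketch in Python; `hub_link_pattern` is a hypothetical helper, and it excludes quotes from the host part so the match cannot run past the end of the attribute.

```python
import re


def hub_link_pattern(hub_path: str) -> re.Pattern:
    """Build a regex matching relative or absolute links to exactly this hub path."""
    path = re.escape(hub_path.rstrip("/"))
    # Optional scheme + host, then the hub path, an optional trailing slash,
    # and the closing quote so deeper URLs under the hub are not counted.
    return re.compile(r'href="(?:https?://[^/"]+)?' + path + r'/?"')


tools_hub = hub_link_pattern("/tools/")


def links_to_hub(html: str, pattern: re.Pattern = tools_hub) -> bool:
    return pattern.search(html) is not None


print(links_to_hub('<a href="https://example.com/tools">Tools</a>'))
```

Because the pattern ends at the closing quote, a link to `/tools/screaming-frog` does not count as a link to the hub itself.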

5. Generate or normalise JSON-LD at scale

The custom JavaScript snippets feature lets you generate or normalise JSON-LD based on extracted page data, then write the markup back into the export for engineering to deploy. This is useful when retrofitting schema across hundreds of legacy pages without involving the CMS.
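Screaming Frog does this with JavaScript snippets inside the crawler. The Python sketch below is not that API; it only illustrates the same idea applied to an exported row, building Article markup from already-extracted fields. The field names (`url`, `h1`, `meta_description`, `author`) are assumptions about your export.

```python
import json

SCRIPT_OPEN = '<script type="application/ld+json">'
SCRIPT_CLOSE = "</script>"


def build_article_jsonld(page: dict) -> str:
    """Return a JSON-LD <script> block for one crawled page."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": page["h1"],
        "description": page["meta_description"],
        "url": page["url"],
    }
    if page.get("author"):
        data["author"] = {"@type": "Person", "name": page["author"]}
    body = json.dumps(data, ensure_ascii=False)
    # Escape "</" so text inside the JSON cannot close the script tag early.
    body = body.replace("</", "<\\/")
    return SCRIPT_OPEN + body + SCRIPT_CLOSE


page = {
    "url": "https://example.com/guides/geo-audit",
    "h1": "GEO Audit Guide",
    "meta_description": "How to audit </script> safely.",
    "author": "Jane Doe",
}
markup = build_article_jsonld(page)
print(markup)
```

Generated markup should still go through Schema.org validation before engineering deploys it, since a template like this knows nothing about required properties for other types.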

Scoring framework

Once the crawl is complete, export everything to a spreadsheet and score each URL on a simple GEO readiness rubric. A workable starting weight set:

| Pillar | Weight | Pass criteria |
| --- | --- | --- |
| Single H1, sane H2/H3 hierarchy | 15% | Default Screaming Frog checks pass |
| JSON-LD present and valid | 20% | At least one relevant @type, no validation errors |
| AI summary block | 15% | CSSPath extraction returns text |
| FAQ block or FAQPage schema | 10% | Either present |
| Description 120-160 chars | 10% | Length in range |
| Hub link present | 10% | Hub regex matches |
| Inlinks ≥ section median | 10% | Not orphaned or thin |
| No 4xx internal links | 10% | None reported |

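Multiply each page's score gap (100 minus its score) by its impressions or sessions to get a prioritised remediation backlog. Weighting by the raw score instead would push pages that already pass to the top of the list.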
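The rubric is simple enough to run in a few lines instead of spreadsheet formulas. This sketch uses the weights from the table and ranks pages by score gap × traffic, as in the audit workflow; the pillar keys, page data and impression counts are illustrative.

```python
# Weights from the rubric, in percentage points (they sum to 100).
WEIGHTS = {
    "headings": 15,
    "jsonld": 20,
    "ai_summary": 15,
    "faq": 10,
    "description": 10,
    "hub_link": 10,
    "inlinks": 10,
    "no_4xx": 10,
}


def geo_score(checks: dict) -> int:
    """Sum the weights of every pillar the page passes (missing keys count as fail)."""
    return sum(weight for pillar, weight in WEIGHTS.items() if checks.get(pillar))


def prioritise(pages: list[dict]) -> list[dict]:
    """Rank pages by (100 - score) x impressions, biggest opportunity first."""
    ranked = []
    for page in pages:
        score = geo_score(page["checks"])
        ranked.append({
            "url": page["url"],
            "score": score,
            "priority": (100 - score) * page["impressions"],
        })
    return sorted(ranked, key=lambda row: row["priority"], reverse=True)


all_pass = dict.fromkeys(WEIGHTS, True)
pages = [
    {"url": "/guides/a", "impressions": 1000, "checks": {**all_pass, "faq": False}},
    {"url": "/guides/b", "impressions": 200,
     "checks": {"headings": True, "description": True}},
    {"url": "/", "impressions": 50000, "checks": all_pass},
]
backlog = prioritise(pages)
for row in backlog:
    print(row)
```

In this example the low-traffic but badly failing `/guides/b` outranks the high-traffic home page, which already passes every pillar and therefore has nothing to fix.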

Audit workflow end-to-end

  1. Configure the crawl (JavaScript rendering on, structured data extraction on, custom extractions added).
  2. Crawl the production site or a representative sample.
  3. Run Crawl Analysis to populate orphan and link score reports.
  4. Export Internal, Structured Data, and Custom Extraction tabs to spreadsheets.
  5. Score each URL with the rubric above.
  6. Prioritise by score gap × traffic value.
  7. Implement fixes in your CMS or templates.
  8. Re-crawl to confirm the fixes resolved the flagged issues.

Common mistakes

  • Crawling without JavaScript rendering. Many sites inject JSON-LD client-side; without rendering, the structured data tab will look empty.
  • Treating Screaming Frog as the only data source. Combine the crawl with Google Search Console (impressions, AI features), GA4 or Plausible (traffic), and a citation tracker for AI platforms before deciding what to fix first.
  • Auditing once. GEO drift is real: schema breaks, summaries get edited out, hub links rot. Re-crawl on the same review cycle as your content refresh.

Frequently asked questions

Q: Can Screaming Frog tell me if my page is being cited by AI engines?

No. Screaming Frog audits the on-page signals that make a page citable. To measure actual citations you need a separate AI visibility tracker or manual prompt testing across ChatGPT, Perplexity, Gemini, and Google AI Overviews. Use Screaming Frog to fix the structural issues and the tracker to confirm impact.

Q: Do I need the paid licence?

The free version crawls a limited number of URLs and disables custom extraction, scheduling, and JavaScript rendering. For any real GEO audit you will need a paid licence. Check the Screaming Frog pricing page for the current annual cost; it is generally cheaper than a single competing SaaS subscription for the same depth of crawl.

Q: How does this differ from a Lighthouse or PageSpeed audit?

Lighthouse audits a single URL for performance, accessibility, SEO basics, and best practices. Screaming Frog audits the entire site for structural and link signals. For GEO you want both: Lighthouse on representative templates and Screaming Frog across the inventory.

Q: Which custom extractions matter most for AI Overviews specifically?

Prioritise AI summary blocks, FAQ blocks, JSON-LD @type coverage, and hub-link presence. Google has stated there are no AI-specific optimisation requirements, but its own documentation still emphasises structured content and clear answer blocks for AI features in Search.

Sources

  • Screaming Frog. "Generate JSON-LD Schema at Scale With JavaScript Snippets." February 2025. https://www.screamingfrog.co.uk/blog/generate-json-ld-schema-at-scale/
  • Screaming Frog. "Web Scraping & Custom Extraction." https://www.screamingfrog.co.uk/seo-spider/tutorials/web-scraping/
  • TruPerformance. "Generating JSON-LD Schema at Scale with Screaming Frog's JavaScript Rendering." February 2025. https://www.truperformance.us/resources/blogs/generating-json-ld-schema/
  • Google Search Central. "AI Features and Your Website." https://developers.google.com/search/docs/appearance/ai-features
  • iPullRank. "The Measurement Chasm: Tracking GEO Performance." 2025. https://ipullrank.com/ai-search-manual/measurement-geo
