Screaming Frog for GEO Auditing
Screaming Frog SEO Spider audits GEO readiness by crawling a site and using custom extraction (XPath, CSSPath, regex, and JavaScript snippets) to check JSON-LD schema coverage, heading hierarchy, AI summary blocks, FAQ extractability, and internal linking at scale.
TL;DR. Screaming Frog is the most practical desktop crawler for site-wide Generative Engine Optimization audits. By layering custom extractions on top of the default crawl, you can score every URL on the structural signals AI search engines depend on — schema presence, answer-first headings, FAQ blocks, and clean internal links — and then prioritise fixes by traffic-weighted impact.
Why Screaming Frog still matters for GEO
GEO audits do not need a brand-new toolchain. Most of the signals that influence whether a page can be cited by ChatGPT, Perplexity, Google AI Overviews, or Gemini are HTML-level: heading hierarchy, structured data, FAQ blocks, internal links, and metadata. Screaming Frog SEO Spider has crawled those signals for over a decade and now ships features specifically aimed at AI search work, including custom JavaScript snippets and an internal AI search optimisation service.
Use it when you need to:
- Inventory every URL on a site and score it on GEO readiness.
- Detect missing or malformed JSON-LD across the entire crawl.
- Find pages that lack an extractable answer block (AI summary, FAQ, definition).
- Identify orphan or under-linked hub pages that AI crawlers will miss.
- Generate or normalise JSON-LD at scale via custom JavaScript snippets.
Set up the crawl
The defaults are tuned for traditional SEO; GEO work needs a few extra toggles.
- Enable JavaScript rendering. Configuration → Spider → Rendering → JavaScript. Set the rendering wait time to 3-5 seconds so client-rendered JSON-LD and content load before extraction.
- Crawl structured data. Configuration → Spider → Extraction → enable JSON-LD, Microdata, RDFa, and Schema.org Validation.
- Add custom extractions (described below).
- Respect robots.txt and the server's capacity, but raise the crawl speed (Configuration → Speed) if the site can handle it; a typical GEO audit needs the full URL inventory.
- Set a custom user-agent if you want to test how the site responds to AI crawlers (for example GPTBot, PerplexityBot, ClaudeBot).
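Before crawling with an AI user-agent, it can help to check what robots.txt already says about those bots. The sketch below is an offline complement to the crawl, not a Screaming Frog feature: it uses Python's standard urllib.robotparser, and the robots.txt content and example URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; in practice, load the audited site's file.
SAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

AI_USER_AGENTS = ["GPTBot", "PerplexityBot", "ClaudeBot"]


def ai_crawler_access(robots_txt: str, url: str) -> dict[str, bool]:
    """Return, per AI user-agent, whether robots.txt permits fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in AI_USER_AGENTS}


print(ai_crawler_access(SAMPLE_ROBOTS, "https://example.com/guides/geo"))
```

This only reads the robots.txt rules. Server-side blocking by user-agent (for example at a CDN or firewall) still needs the custom user-agent crawl described above.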
Core GEO checks (no custom code required)
These checks come from the default reports in SEO Spider.
Heading hierarchy
| Check | Where to look |
|---|---|
| Multiple H1s | H1 tab → filter "Multiple" |
| Missing H1 | H1 tab → filter "Missing" |
| Long or short H1 | H1 tab → sort by length |
| H2 / H3 inventory | H2 tab → review for skipped levels |
AI extractors read headings as the page's outline. Skipped levels (H1 → H3) and missing H1s reduce the chance of clean passage extraction.
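The H2 tab shows headings per level but does not flag skipped levels directly. As a rough illustration of the check, the following sketch walks a page's headings in document order and reports missing or multiple H1s and any jump of more than one level. It uses only the standard library; the function and class names are illustrative.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect heading levels (1-6) in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.levels: list[int] = []

    def handle_starttag(self, tag, attrs):
        # Match h1..h6 only; tags such as <hr> or <header> are ignored.
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def heading_issues(html: str) -> list[str]:
    """Return a list of heading-hierarchy problems found in `html`."""
    collector = HeadingCollector()
    collector.feed(html)
    levels = collector.levels

    issues = []
    h1_count = levels.count(1)
    if h1_count == 0:
        issues.append("missing H1")
    elif h1_count > 1:
        issues.append("multiple H1s")

    # A heading may go deeper by at most one level at a time.
    for prev, cur in zip(levels, levels[1:]):
        if cur > prev + 1:
            issues.append(f"skipped level: H{prev} -> H{cur}")
    return issues
```

Moving back up the outline (for example H3 followed by H2) is not treated as an error, since that is how a new section normally starts.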
Metadata
| Check | Where to look |
|---|---|
| Meta description length | Meta Description tab → filter on length |
| Missing descriptions | Meta Description tab → filter "Missing" |
| Duplicate descriptions | Meta Description tab → filter "Duplicate" |
| Title length and uniqueness | Page Titles tab |
Keep descriptions in the 120-160 character range so they are usable as og:description, meta description, and llm_summary fallbacks.
Internal linking and orphans
| Check | Where to look |
|---|---|
| Orphan pages | Crawl Analysis → Orphan Pages report |
| Internal link count | Inlinks column on the Internal tab |
| Broken internal links | Response Codes → 4xx |
| Hub link presence | Custom search for the hub URL (see below) |
AI systems weight pages that are well-linked from authoritative hub pages. Pages with low inlinks and no hub reference are systematically underrepresented in citations.
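"Low inlinks" is easier to act on when it is relative to comparable pages. One possible definition, sketched here, compares each URL's inlink count with the median for its top-level site section. The input mapping stands in for the Inlinks column of an Internal tab export; the section rule (first path segment) is an assumption.

```python
from collections import defaultdict
from statistics import median
from urllib.parse import urlparse


def section_of(url: str) -> str:
    """Use the first path segment as the section (assumed convention)."""
    path = urlparse(url).path.strip("/")
    return path.split("/")[0] if path else ""


def below_section_median(inlinks: dict[str, int]) -> list[str]:
    """Return URLs whose inlink count is below their section's median."""
    by_section: dict[str, list[int]] = defaultdict(list)
    for url, count in inlinks.items():
        by_section[section_of(url)].append(count)

    medians = {section: median(counts) for section, counts in by_section.items()}
    return sorted(
        url for url, count in inlinks.items() if count < medians[section_of(url)]
    )
```

Sections with a single page never flag that page, because a lone value always equals its own median.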
Structured data
The Structured Data tab surfaces JSON-LD detected on each URL plus Schema.org validation warnings. For GEO, prioritise:
- Article, BlogPosting, TechArticle for editorial content
- FAQPage and HowTo for instructional pages
- Product, Offer, Review for commerce
- Organization, Person, and WebSite for entity signals
Pages without any JSON-LD should be flagged for follow-up; the Schema.org validation column highlights syntax errors and missing required properties.
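For spot checks outside the crawler, the same presence-and-syntax test can be approximated in a few lines. This sketch pulls every application/ld+json script from a page's HTML, counts blocks that fail to parse, and lists the declared @type values. It checks JSON syntax only, not Schema.org required properties, and the names used are illustrative.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json">."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: list[str] = []
        self._in_jsonld = False

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data  # script text may arrive in chunks

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False


def _collect_types(node, found: list[str]) -> None:
    """Walk parsed JSON-LD recursively, recording every @type value."""
    if isinstance(node, dict):
        declared = node.get("@type")
        if isinstance(declared, str):
            found.append(declared)
        elif isinstance(declared, list):
            found.extend(t for t in declared if isinstance(t, str))
        for value in node.values():
            _collect_types(value, found)
    elif isinstance(node, list):
        for item in node:
            _collect_types(item, found)


def audit_jsonld(html: str) -> dict:
    """Summarise JSON-LD presence, declared types, and unparseable blocks."""
    collector = JsonLdCollector()
    collector.feed(html)
    types: list[str] = []
    invalid = 0
    for raw in collector.blocks:
        try:
            _collect_types(json.loads(raw), types)
        except json.JSONDecodeError:
            invalid += 1
    return {
        "has_jsonld": bool(collector.blocks),
        "types": types,
        "invalid_blocks": invalid,
    }
```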
Custom extractions for GEO signals
Custom extraction is the highest-leverage feature for GEO audits because it lets you scrape any HTML pattern and report it across the entire crawl. SEO Spider supports XPath, CSSPath, regex, and custom JavaScript snippets in recent versions.
1. Detect AI summary blocks
If your style guide places AI summaries in a <blockquote class="ai-summary"> element, extract them with CSSPath:
- Name: AI Summary Block
- Type: CSSPath
- Expression: blockquote.ai-summary
- Extractor: Extract Text
Pages that return empty for this extraction are missing the answer-first block AI engines tend to lift first.
2. Detect FAQ blocks
Look for FAQPage schema with regex:
- Name: FAQ Schema
- Type: Regex
- Expression: "@type"\s*:\s*"FAQPage"
3. Extract @type values from JSON-LD
- Name: JSON-LD Types
- Type: Regex
- Expression: "@type"\s*:\s*"([^"]+)"
The result column lists every schema type the page declares, making it easy to spot pages that declare only generic WebPage and nothing more useful.
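It is worth testing extraction patterns against a saved page before running them across a full crawl. The sketch below does that with Python's re module. The two patterns allow optional whitespace around the colon and are an assumed form of the expressions described here; the sample markup is hypothetical.

```python
import re

# Assumed patterns: optional whitespace is allowed around the colon.
FAQ_PATTERN = re.compile(r'"@type"\s*:\s*"FAQPage"')
TYPE_PATTERN = re.compile(r'"@type"\s*:\s*"([^"]+)"')

# Hypothetical page source for illustration.
SAMPLE_HTML = '''
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "FAQPage",
 "mainEntity": [{"@type":"Question", "name": "What is GEO?"}]}
</script>
'''


def has_faq_schema(html: str) -> bool:
    """True if the raw source declares an FAQPage type."""
    return FAQ_PATTERN.search(html) is not None


def declared_types(html: str) -> list[str]:
    """Return every string-valued @type found in the raw source, in order."""
    return TYPE_PATTERN.findall(html)


print(has_faq_schema(SAMPLE_HTML), declared_types(SAMPLE_HTML))
```

A regex of this shape only captures string values. Array-valued declarations such as "@type": ["Article", "NewsArticle"] are missed, so treat an empty result as a prompt to inspect the page rather than proof that no schema exists.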
4. Detect hub links
- Name: Links to /tools hub
- Type: Regex
- Expression: href="(?:https?://[^/"]+)?/tools/?"
Run a separate extraction per hub URL and you get a per-page boolean for hub linkage — one of the GEO content checklist requirements.
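A hub-link pattern is easy to get subtly wrong, so a quick offline test is worthwhile. The sketch below assumes a pattern that accepts a relative or absolute link to /tools with an optional trailing slash; the hub path and sample links are illustrative.

```python
import re

# Assumed pattern: optional scheme and host, then exactly /tools or /tools/.
HUB_PATTERN = re.compile(r'href="(?:https?://[^/"]+)?/tools/?"')


def links_to_hub(html: str) -> bool:
    """True if the page source contains a link to the /tools hub itself."""
    return HUB_PATTERN.search(html) is not None


# Links to the hub match; links to child or unrelated pages do not.
print(links_to_hub('<a href="/tools/">Tools</a>'))
print(links_to_hub('<a href="/tools/crawler/">Crawler</a>'))
```

Two limitations follow from the pattern's shape: it accepts any hostname, so a link to another site's /tools page also matches, and it ignores href values written with single quotes.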
5. Generate or normalise JSON-LD at scale
The custom JavaScript snippets feature lets you generate or normalise JSON-LD based on extracted page data, then write the markup back into the export for engineering to deploy. This is useful when retrofitting schema across hundreds of legacy pages without involving the CMS.
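The same idea can be prototyped offline against exported crawl rows before committing to a snippet inside the crawler. The following Python sketch is not the Screaming Frog snippet API: it builds a minimal Article block from a row of extracted data, and the field names (url, h1, meta_description) are hypothetical export columns.

```python
import json


def build_article_jsonld(page: dict) -> str:
    """Build a minimal Article JSON-LD script tag from one exported row."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": page["h1"],
        "description": page["meta_description"],
        "mainEntityOfPage": page["url"],
    }
    # Escape "<" so page text cannot close the script element early.
    payload = json.dumps(data, ensure_ascii=False).replace("<", "\\u003c")
    return f'<script type="application/ld+json">{payload}</script>'


row = {
    "url": "https://example.com/guides/geo",
    "h1": "What is GEO?",
    "meta_description": "A short definition of Generative Engine Optimization.",
}
print(build_article_jsonld(row))
```

A production version would add properties such as author, datePublished, and image where the data exists, and the generated markup should be validated before it is deployed.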
Scoring framework
Once the crawl is complete, export everything to a spreadsheet and score each URL on a simple GEO readiness rubric. A workable starting weight set:
| Pillar | Weight | Pass criteria |
|---|---|---|
| Single H1, sane H2/H3 hierarchy | 15% | Default Screaming Frog checks pass |
| JSON-LD present and valid | 20% | At least one relevant @type, no validation errors |
| AI summary block | 15% | CSSPath extraction returns text |
| FAQ block or FAQPage schema | 10% | Either present |
| Description 120-160 chars | 10% | Length in range |
| Hub link present | 10% | Hub regex matches |
| Inlinks ≥ section median | 10% | Not orphaned or thin |
| No 4xx internal links | 10% | None reported |
Multiply each page's score gap (100 minus its score) by its impressions or sessions to get a prioritised remediation backlog.
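The rubric translates directly into a weighted sum. In this sketch the weights mirror the starting set above, while the pillar keys are hypothetical names for columns you would derive from the exports; each check is a simple pass or fail.

```python
# Weights follow the starting rubric; the keys are illustrative column names.
WEIGHTS = {
    "headings_ok": 15,
    "jsonld_valid": 20,
    "ai_summary": 15,
    "faq": 10,
    "description_ok": 10,
    "hub_link": 10,
    "inlinks_ok": 10,
    "no_broken_links": 10,
}


def geo_score(checks: dict[str, bool]) -> int:
    """Sum the weights of every pillar the page passes (0-100)."""
    return sum(
        weight for pillar, weight in WEIGHTS.items() if checks.get(pillar, False)
    )


# A page with valid schema and an AI summary but nothing else passing.
print(geo_score({"jsonld_valid": True, "ai_summary": True}))  # → 35
```

Missing keys count as failures, which keeps partially populated exports from inflating a page's score.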
Audit workflow end-to-end
- Configure the crawl (JavaScript rendering on, structured data extraction on, custom extractions added).
- Crawl the production site or a representative sample.
- Run Crawl Analysis to populate orphan and link score reports.
- Export Internal, Structured Data, and Custom Extraction tabs to spreadsheets.
- Score each URL with the rubric above.
- Prioritise by score gap × traffic value.
- Implement fixes in your CMS or templates.
- Re-crawl to confirm the fixes resolved the flagged issues.
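The prioritisation step in this workflow can be sketched as a sort on score gap × traffic value. The example below assumes each page already has a 0-100 readiness score and a sessions figure from analytics; the PageResult type and the sample numbers are illustrative.

```python
from dataclasses import dataclass


@dataclass
class PageResult:
    url: str
    score: int     # GEO readiness score, 0-100
    sessions: int  # traffic value from analytics (assumed metric)


def remediation_backlog(pages: list[PageResult]) -> list[tuple[str, int]]:
    """Rank pages by (100 - score) * sessions, largest opportunity first."""
    ranked = sorted(pages, key=lambda p: (100 - p.score) * p.sessions, reverse=True)
    return [(p.url, (100 - p.score) * p.sessions) for p in ranked]


pages = [
    PageResult("/guides/a", score=90, sessions=1000),
    PageResult("/guides/b", score=40, sessions=300),
    PageResult("/guides/c", score=100, sessions=5000),
]
for url, priority in remediation_backlog(pages):
    print(url, priority)
```

In this sample the lower-traffic page ranks first because its gap is much larger, and a page that already scores 100 drops to the bottom regardless of traffic.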
Common mistakes
- Crawling without JavaScript rendering. Many sites inject JSON-LD client-side; without rendering, the structured data tab will look empty.
- Treating Screaming Frog as the only data source. Combine the crawl with Google Search Console (impressions, AI features), GA4 or Plausible (traffic), and a citation tracker for AI platforms before deciding what to fix first.
- Auditing once. GEO drift is real: schema breaks, summaries get edited out, hub links rot. Re-crawl on the same review cycle as your content refresh.
Frequently asked questions
Q: Can Screaming Frog tell me if my page is being cited by AI engines?
No. Screaming Frog audits the on-page signals that make a page citable. To measure actual citations you need a separate AI visibility tracker or manual prompt testing across ChatGPT, Perplexity, Gemini, and Google AI Overviews. Use Screaming Frog to fix the structural issues and the tracker to confirm impact.
Q: Do I need the paid licence?
The free version crawls a limited number of URLs and disables custom extraction, scheduling, and JavaScript rendering. For any real GEO audit you will need a paid licence. Check the Screaming Frog pricing page for the current annual cost; it is generally cheaper than a single competing SaaS subscription for the same depth of crawl.
Q: How does this differ from a Lighthouse or PageSpeed audit?
Lighthouse audits a single URL for performance, accessibility, SEO basics, and best practices. Screaming Frog audits the entire site for structural and link signals. For GEO you want both: Lighthouse on representative templates and Screaming Frog across the inventory.
Q: Which custom extractions matter most for AI Overviews specifically?
Prioritise AI summary blocks, FAQ blocks, JSON-LD @type coverage, and hub-link presence. Google has stated there are no AI-specific optimisation requirements, but its own documentation still emphasises structured content and clear answer blocks for AI features in Search.
Sources
- Screaming Frog. "Generate JSON-LD Schema at Scale With JavaScript Snippets." February 2025. https://www.screamingfrog.co.uk/blog/generate-json-ld-schema-at-scale/
- Screaming Frog. "Web Scraping & Custom Extraction." https://www.screamingfrog.co.uk/seo-spider/tutorials/web-scraping/
- TruPerformance. "Generating JSON-LD Schema at Scale with Screaming Frog's JavaScript Rendering." February 2025. https://www.truperformance.us/resources/blogs/generating-json-ld-schema/
- Google Search Central. "AI Features and Your Website." https://developers.google.com/search/docs/appearance/ai-features
- iPullRank. "The Measurement Chasm: Tracking GEO Performance." 2025. https://ipullrank.com/ai-search-manual/measurement-geo