Screaming Frog for GEO Auditing

Screaming Frog SEO Spider audits GEO readiness by crawling a site and using custom extraction (XPath, CSSPath, regex, and JavaScript snippets) to check JSON-LD schema coverage, heading hierarchy, AI summary blocks, FAQ extractability, and internal linking at scale.

TL;DR. Screaming Frog is the most practical desktop crawler for site-wide Generative Engine Optimization audits. By layering custom extractions on top of the default crawl, you can score every URL on the structural signals AI search engines depend on — schema presence, answer-first headings, FAQ blocks, and clean internal links — and then prioritise fixes by traffic-weighted impact.

Why Screaming Frog still matters for GEO

GEO audits do not need a brand-new toolchain. Most of the signals that influence whether a page can be cited by ChatGPT, Perplexity, Google AI Overviews, or Gemini are HTML-level: heading hierarchy, structured data, FAQ blocks, internal links, and metadata. Screaming Frog SEO Spider has crawled those signals for over a decade and now ships features specifically aimed at AI search work, including custom JavaScript snippets and an internal AI search optimisation service.

Use it when you need to:

Inventory every URL on a site and score it on GEO readiness.
Detect missing or malformed JSON-LD across the entire crawl.
Find pages that lack an extractable answer block (AI summary, FAQ, definition).
Identify orphan or under-linked hub pages that AI crawlers will miss.
Generate or normalise JSON-LD at scale via custom JavaScript snippets.

Set up the crawl

The defaults are tuned for traditional SEO; GEO work needs a few extra toggles.

Enable JavaScript rendering. Configuration → Spider → Rendering → JavaScript. Set the rendering wait time to 3-5 seconds so client-rendered JSON-LD and content load before extraction.
Crawl structured data. Configuration → Spider → Extraction → enable JSON-LD, Microdata, RDFa, and Schema.org Validation.
Add custom extractions (described below).
Respect robots.txt and crawl budget, but raise the connection limit if the site can handle it; a typical GEO audit needs the full URL inventory.
Set a custom user-agent if you want to test how the site responds to AI crawlers (for example GPTBot, PerplexityBot, ClaudeBot).
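Before crawling with an AI crawler's user-agent, it can help to check what robots.txt already says to those agents. The sketch below is one way to do that outside Screaming Frog with Python's standard library; the robots.txt content, the example URLs, and the function name are assumptions for illustration, not part of any Screaming Frog feature.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; paste in the real file from the site under audit.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /private/

User-agent: *
Disallow:
"""

# User-agent tokens named in the setup step above.
AI_AGENTS = ["GPTBot", "PerplexityBot", "ClaudeBot"]


def ai_crawler_access(robots_txt: str, url: str, agents=AI_AGENTS) -> dict:
    """Return {agent: allowed} for one URL, based only on robots.txt rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    print(ai_crawler_access(ROBOTS_TXT, "https://example.com/private/report"))
    print(ai_crawler_access(ROBOTS_TXT, "https://example.com/tools/"))
```

This only reflects robots.txt directives. Server-side blocking by user-agent or IP still has to be tested with an actual crawl.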

Core GEO checks (no custom code required)

These checks come from the default reports in SEO Spider.

Heading hierarchy

Check	Where to look
Multiple H1s	H1 tab → filter "Multiple"
Missing H1	H1 tab → filter "Missing"
Long or short H1	H1 tab → sort by length
H2 / H3 inventory	H2 tab → review for skipped levels

AI extractors read headings as the page's outline. Skipped levels (H1 → H3) and missing H1s reduce the chance of clean passage extraction.
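Reviewing the H2 and H3 tabs by eye gets slow on large sites. As a rough illustration of the rule being checked, the sketch below flags missing H1s, multiple H1s, and skipped levels in a page's HTML using only the Python standard library. The class and function names are hypothetical, and it approximates the checks rather than reproducing Screaming Frog's own logic.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collects heading levels (1-6) in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        # Matches h1..h6 only; tags such as <hr> or <header> are ignored.
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def heading_issues(html: str) -> list[str]:
    """Return a list of human-readable heading problems for one page."""
    collector = HeadingCollector()
    collector.feed(html)
    levels = collector.levels

    issues = []
    h1_count = levels.count(1)
    if h1_count == 0:
        issues.append("missing H1")
    elif h1_count > 1:
        issues.append("multiple H1s")

    previous = 0
    for level in levels:
        # A jump of more than one level downward (e.g. H1 -> H3) is a skip.
        if previous != 0 and level > previous + 1:
            issues.append(f"skipped level: H{previous} -> H{level}")
        previous = level
    return issues


if __name__ == "__main__":
    print(heading_issues("<h1>Guide</h1><h3>Details</h3>"))
```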

Metadata

Check	Where to look
Meta description length	Meta Description tab → filter on length
Missing descriptions	Meta Description tab → filter "Missing"
Duplicate descriptions	Meta Description tab → filter "Duplicate"
Title length and uniqueness	Page Titles tab

Keep descriptions in the 120-160 character range so they are usable as og:description, meta description, and llm_summary fallbacks.

Internal linking and orphans

Check	Where to look
Orphan pages	Crawl Analysis → Orphan Pages report
Internal link count	Inlinks column on the Internal tab
Broken internal links	Response Codes → 4xx
Hub link presence	Custom search for the hub URL (see below)

AI systems weight pages that are well-linked from authoritative hub pages. Pages with low inlinks and no hub reference are systematically underrepresented in citations.

Structured data

The Structured Data tab surfaces JSON-LD detected on each URL plus Schema.org validation warnings. For GEO, prioritise:

Article, BlogPosting, TechArticle for editorial content
FAQPage and HowTo for instructional pages
Product, Offer, Review for commerce
Organization, Person, and WebSite for entity signals

Pages without any JSON-LD should be flagged for follow-up; the Schema.org validation column highlights syntax errors and missing required properties.

Custom extractions for GEO signals

Custom extraction is the highest-leverage feature for GEO audits because it lets you scrape any HTML pattern and report it across the entire crawl. SEO Spider supports XPath, CSSPath, regex, and custom JavaScript snippets in recent versions.

1. Detect AI summary blocks

If your style guide places AI summaries in a <blockquote class="ai-summary"> element, extract them with CSSPath:

Name: AI Summary Block

Type: CSSPath

Expression: blockquote.ai-summary

Extractor: Extract Text

Pages that return empty for this extraction are missing the answer-first block AI engines tend to lift first.

2. Detect FAQ blocks

Look for FAQPage schema with regex:

Name: FAQ Schema

Type: Regex

Expression: "@type"\s*:\s*"FAQPage"

3. Extract @type values from JSON-LD

Name: JSON-LD Types

Type: Regex

Expression: "@type"\s*:\s*"([^"]+)"

The result column lists every schema type the page declares, making it easy to spot pages that declare only generic WebPage and nothing more useful.
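It is worth testing regex extractions against a sample page before committing them to a full crawl. The sketch below runs whitespace-tolerant versions of the FAQPage and @type patterns through Python's re module. The sample markup and function names are made up for illustration, and the patterns assume JSON-LD written with double-quoted keys and string values.

```python
import re

# Hypothetical page fragment used only to exercise the patterns.
SAMPLE_HTML = """
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "FAQPage",
 "mainEntity": [{"@type": "Question", "name": "Do I need the paid licence?"}]}
</script>
"""

# \s* tolerates optional whitespace around the colon.
FAQ_PATTERN = re.compile(r'"@type"\s*:\s*"FAQPage"')
TYPE_PATTERN = re.compile(r'"@type"\s*:\s*"([^"]+)"')


def has_faq_schema(html: str) -> bool:
    """True when the markup declares an FAQPage type."""
    return FAQ_PATTERN.search(html) is not None


def jsonld_types(html: str) -> list[str]:
    """Every @type string value found in the markup, in order."""
    return TYPE_PATTERN.findall(html)


if __name__ == "__main__":
    print(has_faq_schema(SAMPLE_HTML))
    print(jsonld_types(SAMPLE_HTML))
```

A pattern like this misses array-valued types such as "@type": ["Article", "NewsArticle"], so treat an empty result as a prompt to inspect the page rather than as proof that schema is absent.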

4. Detect hub links

Name: Links to /tools hub

Type: Regex

Expression: href="(?:https?://[^/]+)?/tools/?"

Run a separate extraction per hub URL and you get a per-page boolean for hub linkage — one of the GEO content checklist requirements.
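When there are several hubs, writing one pattern per hub by hand is error-prone. The sketch below builds a hub-link pattern from a path and tests it against sample anchors. The helper names and sample markup are hypothetical, and the host part uses [^/"]+ (slightly stricter than a bare [^/]+) so a match cannot run past the closing quote.

```python
import re


def hub_link_pattern(hub_path: str) -> re.Pattern:
    """Build a pattern matching relative or absolute links to exactly hub_path."""
    path = re.escape(hub_path.rstrip("/"))
    # Optional scheme and host, then the hub path with an optional trailing slash.
    return re.compile(rf'href="(?:https?://[^/"]+)?{path}/?"')


def links_to_hub(html: str, hub_path: str) -> bool:
    """True when the markup contains at least one link to the hub itself."""
    return hub_link_pattern(hub_path).search(html) is not None


if __name__ == "__main__":
    print(hub_link_pattern("/tools").pattern)  # paste into the Regex extraction
    print(links_to_hub('<a href="/tools/">Tools</a>', "/tools"))
    print(links_to_hub('<a href="/tools/screaming-frog">Guide</a>', "/tools"))
```

Links to child pages of the hub do not count as hub links here, which matches the intent of a per-page boolean for the hub URL itself.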

5. Generate or normalise JSON-LD at scale

The custom JavaScript snippets feature lets you generate or normalise JSON-LD based on extracted page data, then write the markup back into the export for engineering to deploy. This is useful when retrofitting schema across hundreds of legacy pages without involving the CMS.
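Screaming Frog snippets themselves are written in JavaScript and run inside the crawler. The Python sketch below is not that feature. It only illustrates the underlying transformation, turning one row of extracted page data into an Article JSON-LD block. The row keys (url, title, description) and the function name are assumptions, not Screaming Frog export column names.

```python
import json


def article_jsonld(row: dict) -> str:
    """Build a <script> block with minimal Article JSON-LD from one page row."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": row.get("title"),
        "description": row.get("description"),
        "url": row.get("url"),
    }
    # Drop properties with no value instead of emitting empty strings or nulls.
    data = {key: value for key, value in data.items() if value}
    payload = json.dumps(data, ensure_ascii=False)
    # Escape "</" so page text can never close the script element early.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">{payload}</script>'


if __name__ == "__main__":
    print(article_jsonld({
        "url": "https://example.com/guides/geo-audit",
        "title": "GEO audit guide",
        "description": "",
    }))
```

Generated markup should still be run through a schema validator before it is handed to engineering.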

Scoring framework

Once the crawl is complete, export everything to a spreadsheet and score each URL on a simple GEO readiness rubric. A workable starting weight set:

Pillar	Weight	Pass criteria
Single H1, sane H2/H3 hierarchy	15%	Default Screaming Frog checks pass
JSON-LD present and valid	20%	At least one relevant @type, no validation errors
AI summary block	15%	CSSPath extraction returns text
FAQ block or FAQPage schema	10%	Either present
Description 120-160 chars	10%	Length in range
Hub link present	10%	Hub regex matches
Inlinks ≥ section median	10%	Not orphaned or thin
No 4xx internal links	10%	None reported

Multiply each page's score gap (100 minus its score) by its impressions or sessions to get a prioritised remediation backlog, so that weak pages with real traffic rise to the top.
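The rubric is simple enough to apply in a spreadsheet, but a short script makes the arithmetic explicit. The sketch below uses the weights from the table and ranks pages by score gap × traffic, as in the workflow's prioritisation step. The pillar keys, sample pages, and traffic figures are invented for illustration.

```python
# Weights copied from the rubric; the pillar keys are hypothetical names.
WEIGHTS = {
    "headings": 15,
    "jsonld": 20,
    "ai_summary": 15,
    "faq": 10,
    "description": 10,
    "hub_link": 10,
    "inlinks": 10,
    "no_4xx": 10,
}


def geo_score(checks: dict) -> int:
    """Sum the weights of every pillar the page passes (0-100)."""
    return sum(weight for pillar, weight in WEIGHTS.items() if checks.get(pillar))


def priority(checks: dict, traffic: int) -> int:
    """Score gap multiplied by traffic; a larger value means fix sooner."""
    return (100 - geo_score(checks)) * traffic


def backlog(pages: list[dict]) -> list[tuple[str, int, int]]:
    """Return (url, score, priority) rows sorted by descending priority."""
    rows = [
        (page["url"], geo_score(page["checks"]), priority(page["checks"], page["traffic"]))
        for page in pages
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Made-up sample data: one strong high-traffic page, one weak low-traffic page.
PAGES = [
    {"url": "/tools/", "traffic": 1200,
     "checks": {**{pillar: True for pillar in WEIGHTS}, "faq": False}},
    {"url": "/blog/old-post", "traffic": 300,
     "checks": {"headings": True, "description": True}},
]

if __name__ == "__main__":
    for url, score, prio in backlog(PAGES):
        print(f"{url}\tscore={score}\tpriority={prio}")
```

In this sample the weaker page outranks the busier one (75 × 300 against 10 × 1200), which is the behaviour the gap-times-traffic weighting is meant to produce.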

Audit workflow end-to-end

Configure the crawl (JavaScript rendering on, structured data extraction on, custom extractions added).

Crawl the production site or a representative sample.

Run Crawl Analysis to populate orphan and link score reports.

Export Internal, Structured Data, and Custom Extraction tabs to spreadsheets.

Score each URL with the rubric above.

Prioritise by score gap × traffic value.

Implement fixes in your CMS or templates.

Re-crawl to confirm the fixes resolved the flagged issues.

Common mistakes

Crawling without JavaScript rendering. Many sites inject JSON-LD client-side; without rendering, the Structured Data tab will look empty even though the live page carries schema.

Treating Screaming Frog as the only data source. Combine the crawl with Google Search Console (impressions, AI features), GA4 or Plausible (traffic), and a citation tracker for AI platforms before deciding what to fix first.

Auditing once. GEO drift is real: schema breaks, summaries get edited out, hub links rot. Re-crawl on the same review cycle as your content refresh.

Frequently asked questions

Q: Can Screaming Frog tell me if my page is being cited by AI engines?

No. Screaming Frog audits the on-page signals that make a page citable. To measure actual citations you need a separate AI visibility tracker or manual prompt testing across ChatGPT, Perplexity, Gemini, and Google AI Overviews. Use Screaming Frog to fix the structural issues and the tracker to confirm impact.

Q: Do I need the paid licence?

The free version crawls a limited number of URLs and disables custom extraction, scheduling, and JavaScript rendering. For any real GEO audit you will need a paid licence. Check the Screaming Frog pricing page for the current annual cost; it is generally cheaper than a single competing SaaS subscription for the same depth of crawl.

Q: How does this differ from a Lighthouse or PageSpeed audit?

Lighthouse audits a single URL for performance, accessibility, SEO basics, and best practices. Screaming Frog audits the entire site for structural and link signals. For GEO you want both: Lighthouse on representative templates and Screaming Frog across the inventory.

Q: Which custom extractions matter most for AI Overviews specifically?

Prioritise AI summary blocks, FAQ blocks, JSON-LD @type coverage, and hub-link presence. Google has stated there are no AI-specific optimisation requirements, but its own documentation still emphasises structured content and clear answer blocks for AI features in Search.

Sources

Screaming Frog. "Generate JSON-LD Schema at Scale With JavaScript Snippets." February 2025. https://www.screamingfrog.co.uk/blog/generate-json-ld-schema-at-scale/

Screaming Frog. "Web Scraping & Custom Extraction." https://www.screamingfrog.co.uk/seo-spider/tutorials/web-scraping/

TruPerformance. "Generating JSON-LD Schema at Scale with Screaming Frog's JavaScript Rendering." February 2025. https://www.truperformance.us/resources/blogs/generating-json-ld-schema/

Google Search Central. "AI Features and Your Website." https://developers.google.com/search/docs/appearance/ai-features

iPullRank. "The Measurement Chasm: Tracking GEO Performance." 2025. https://ipullrank.com/ai-search-manual/measurement-geo
