GEO Glossary: Complete Terminology Reference
This glossary defines the canonical vocabulary of Generative Engine Optimization (GEO), Answer Engine Optimization (AEO), and AI search — covering core acronyms, content patterns, technical standards (llms.txt, ai.txt, schema), retrieval and grounding concepts (RAG, embeddings, chunking, hallucination), and measurement metrics (citation rate, share of AI voice, zero-click visibility).
TL;DR. This glossary is organized A-Z and grouped into five practical categories: core acronyms (GEO, AEO, AIO), content patterns (answer block, definition block, TL;DR), technical standards (llms.txt, ai.txt, schema, JSON-LD), AI/retrieval mechanics (RAG, embeddings, hallucination, grounding), and measurement (citation rate, share of AI voice, zero-click visibility). Use it as the canonical vocabulary for everything else on Geodocs.
How to use this glossary
Terms link out to deeper Geodocs articles where available. Where a term is widely defined across the industry, the definition follows the consensus of primary sources: vendor documentation (Google Cloud, IBM, AWS, Pinecone), peer-reviewed papers, and the original GEO research from Princeton, Georgia Tech, the Allen Institute for AI, and IIT Delhi (arXiv 2311.09735, ACM KDD 2024).
A
AEO (Answer Engine Optimization). The practice of formatting and structuring content so AI systems and search features can extract a single, direct answer to a user query. AEO predates GEO and overlaps significantly with featured-snippet optimization. See What Is AEO?
AI Agent. A software system that uses an LLM to plan, call tools, and complete multi-step tasks autonomously. Documenting content for AI agents is a distinct GEO concern from documenting for human-facing AI search.
AI Citation. A reference to a source inside an AI-generated answer. Citations may be linked, named without a link, or implied through paraphrase. Citation quality (linked + accurate) matters as much as citation count.
AI Crawler. A bot operated by an AI vendor that fetches web content for indexing or grounding. Common identifiers include GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot, Claude-SearchBot, PerplexityBot, and Google-Extended.
AI Crawl Signals. Technical indicators AI systems use to discover and index content, including XML sitemaps, llms.txt, structured data, and HTTP cache headers.
AI Overviews. Google's AI-generated answer summaries displayed at the top of search results. They are generated by a customized Gemini model and grounded in results from Google's web index.
AI Visibility. A program-level metric describing how often and how prominently a brand appears across AI answers and AI search surfaces.
AIO (AI Optimization). An umbrella term sometimes used for optimizing across all AI surfaces — chat assistants, AI Overviews, voice agents, and autonomous agents. Less precise than GEO or AEO.
ai.txt. A machine-readable file declaring a site's AI access policies, attribution preferences, and usage terms. Companion to robots.txt, with AI-specific intent.
Answer Block. A small, self-contained content unit (typically 30-120 words) engineered to be extractable by AI systems as a complete answer.
Answer Extraction. The process by which AI systems identify and pull a specific answer span out of a longer document.
Authoritativeness. A quality signal capturing whether the source is recognized as a credible authority on the topic. Part of E-E-A-T.
B
Backlink. An inbound hyperlink from another domain. Still relevant for both SEO and GEO because backlinks remain a strong proxy for authority during AI source selection.
Branded Search. Queries that include a brand name. Branded search lift is a leading indicator that AI exposure is producing measurable demand.
C
Canonical URL. The single preferred URL for a piece of content, declared with a <link rel="canonical"> element in the page's <head>. Canonicalization prevents duplicate content from competing for the same AI citation slot.
Chunking. Splitting a document into smaller passages for retrieval. Chunking strategies (fixed, sentence-based, semantic, sliding window, hierarchical) materially affect what an AI system retrieves; recent research shows hierarchical and semantic strategies outperform fixed-size on multi-question datasets (arXiv 2507.09935).
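As a rough illustration of the simplest strategy, fixed-size chunking with a sliding-window overlap, here is a minimal Python sketch. It assumes whitespace-separated words as the unit (real pipelines usually count model tokens), and the `chunk_words` name and default sizes are illustrative, not taken from any particular system. Semantic and hierarchical strategies need additional machinery and are not shown.

```python
def chunk_words(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Fixed-size chunking: windows of `size` words, each sharing `overlap` words with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()  # whitespace words, not model tokens
    step = size - overlap  # how far each window slides forward
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):  # this window already reaches the end
            break
    return chunks


doc = " ".join(f"w{i}" for i in range(12))
print(chunk_words(doc, size=5, overlap=2))
```

Consecutive chunks share two words here, so text near a boundary appears in both adjacent chunks instead of being cut off in one of them.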
Citation Frequency. The count of times a brand or URL is cited across a defined prompt set, time window, and AI platform set.
Citation Rate. Cited responses divided by total tested responses for a fixed prompt set. A normalized companion to Citation Frequency.
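The difference between Citation Frequency and Citation Rate can be sketched in Python. The `PromptResult` record shape, the `cites` helper, and the `geodocs.example` URLs are hypothetical. Counting every citation separately for frequency is an assumption, since the definition does not say how repeat citations inside one answer are counted.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class PromptResult:
    """One AI answer collected for one prompt on one platform (hypothetical record shape)."""
    prompt: str
    platform: str
    cited_urls: list[str] = field(default_factory=list)


def cites(url: str, domain: str) -> bool:
    """True if `url` is hosted on `domain` or one of its subdomains."""
    host = urlparse(url).hostname or ""
    return host == domain or host.endswith("." + domain)


def citation_frequency(responses: list[PromptResult], domain: str) -> int:
    # Counts every citation, so one answer citing two pages on the domain counts twice.
    return sum(cites(url, domain) for r in responses for url in r.cited_urls)


def citation_rate(responses: list[PromptResult], domain: str) -> float:
    # Cited responses divided by total tested responses.
    if not responses:
        return 0.0
    cited = sum(any(cites(url, domain) for url in r.cited_urls) for r in responses)
    return cited / len(responses)


responses = [
    PromptResult("what is geo", "chatgpt", ["https://geodocs.example/what-is-geo", "https://other.example/a"]),
    PromptResult("what is aeo", "perplexity", ["https://other.example/b"]),
    PromptResult("llms.txt guide", "chatgpt", ["https://geodocs.example/llms-txt", "https://www.geodocs.example/faq"]),
    PromptResult("rag vs grounding", "gemini"),
]
print(citation_frequency(responses, "geodocs.example"), citation_rate(responses, "geodocs.example"))
```

Frequency grows with every extra citation, while the rate stays between 0 and 1, which is what makes it comparable across prompt sets of different sizes.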
Citation Quality. A 0-5 score per citation describing whether the brand is mentioned, linked, accurately summarized, or quoted. See AI Visibility Measurement.
Citation Readiness. A frontmatter lifecycle field (draft → reviewed → verified) declaring how confident the publisher is that an AI system can quote the page accurately.
ClaudeBot. Anthropic's web crawler for collecting training data. Target its user-agent in robots.txt separately from Claude-SearchBot to control training and search retrieval independently.
Content Cluster. A group of related articles organised around a central pillar topic to build topical authority.
Content Feed. RSS, Atom, or JSON feeds that notify AI systems of new and updated content.
Context Window. The maximum number of tokens an LLM can process in a single request. Influences how much retrieved content can be presented during grounding.
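A limited context window forces a retrieval pipeline to choose which retrieved passages to include. The Python sketch below greedily packs best-first chunks into a token budget. It assumes a whitespace word count as a stand-in for a real tokenizer, and `approx_tokens` and `pack_context` are hypothetical helpers.

```python
def approx_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    # Subword tokenizers usually produce more tokens than words.
    return len(text.split())


def pack_context(ranked_chunks: list[str], budget: int) -> list[str]:
    """Keep the highest-ranked chunks that fit in `budget` tokens.

    `budget` should already exclude tokens reserved for instructions, the user
    query, and the model's answer.
    """
    packed, used = [], 0
    for chunk in ranked_chunks:  # assumed sorted best-first
        cost = approx_tokens(chunk)
        if used + cost <= budget:
            packed.append(chunk)
            used += cost
    return packed


print(pack_context(["a b c d", "e f g h i j", "k l"], budget=7))
```

The loop skips a chunk that does not fit but keeps scanning, so a shorter, lower-ranked chunk can still use the remaining budget.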
Conversational Search. Multi-turn AI search interactions where users refine queries through dialogue.
D
Definition Block. A content pattern (term + category + function) optimised for AI extraction of definitions.
Direct Answer. A specific, actionable answer presented without requiring the user to click a source.
E
E-E-A-T. Experience, Expertise, Authoritativeness, and Trustworthiness — quality signals used by Google and many AI systems when selecting sources.
Embedding. A dense numerical vector that captures the semantic meaning of a piece of text. Embeddings are the backbone of modern retrieval pipelines and vector search (Pinecone, Google Cloud).
Entity. A uniquely identifiable thing AI systems can recognise (person, organisation, product, concept, place, event). Entity optimisation is a discrete GEO discipline.
F
FAQ Schema (FAQPage). A schema.org type that exposes question-and-answer pairs to search engines and AI systems. A dependable AEO pattern.
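A minimal Python sketch of generating FAQPage markup as JSON-LD follows. It uses the schema.org `FAQPage`, `Question`, and `Answer` types. The `faq_jsonld` helper is hypothetical, and the sample pair is adapted from this glossary's FAQ.

```python
import json


def faq_jsonld(pairs: list[tuple[str, str]]) -> dict:
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


markup = faq_jsonld([
    ("Do I need llms.txt if I already have sitemap.xml?",
     "Yes. sitemap.xml lists URLs for classical search crawlers; llms.txt provides a curated index for AI systems."),
])
# Embed the output in the page's HTML inside a JSON-LD script element.
print('<script type="application/ld+json">')
print(json.dumps(markup, indent=2))
print("</script>")
```

Since 2023, Google has shown FAQ rich results only for a narrow set of well-known government and health sites. For most sites, treat this markup as a machine-readable signal, not a guaranteed rich result.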
Featured Snippet. A direct answer rendered above Google's blue links. Predecessor to AI Overviews and still a strong AEO target.
Focus Keyword. The single primary phrase a page is intended to rank or be cited for.
G
GEO (Generative Engine Optimization). The practice of structuring content so AI systems can understand, retrieve, and cite it in generated answers. The term was coined by researchers from Princeton, Georgia Tech, the Allen Institute for AI, and IIT Delhi in a 2023 paper (arXiv 2311.09735) presented at ACM KDD 2024. See What Is GEO?
GEO Content Strategy. The planning and creation of content specifically designed for AI search visibility. See GEO Content Strategy.
GEO ROI. The measurable business return from a GEO program, typically calculated through a multi-metric framework that combines traffic, citation, and pipeline value. See GEO ROI Framework.
Google-Extended. A robots.txt product token (not a separate crawler) that controls whether content Google crawls may be used to train and ground Gemini models. Disallowing it does not block AI Overviews from displaying your content.
GPTBot. OpenAI's web crawler used to gather data for training and product features. Distinct from OAI-SearchBot and ChatGPT-User.
Grounding. Conditioning an LLM's response on retrieved authoritative content so the answer cites verifiable sources rather than relying on parametric memory. Closely related to, but not identical to, fact-checking.
H
Hallucination. A confidently presented but factually wrong or unsupported AI output. RAG and grounding reduce but do not eliminate hallucinations (AWS, K2view, Pinecone).
Hub Page. A high-level landing page that links out to all sub-articles in a topic. Hubs are critical for both SEO topical authority and GEO source selection.
I
Indexing. The process by which a search engine or AI system adds content to its searchable store.
Internal Link. A hyperlink between two pages on the same domain. Internal links carry topical authority signals and help AI systems understand site hierarchy.
J
JSON-LD. JavaScript Object Notation for Linked Data — the preferred structured data format for search engines and AI systems. See JSON-LD for AI Search.
K
Knowledge Graph. A structured representation of entities and relationships. Google's Knowledge Graph and analogous internal AI knowledge graphs influence which entities AI systems treat as canonical.
Knowledge Domain. A frontmatter taxonomy field that places a page inside a discipline (e.g., ai-search-optimization).
L
LLM (Large Language Model). A neural network trained on large text corpora that generates language token-by-token. The generative engine in 'Generative Engine Optimization'.
llm_summary. A short, factual frontmatter summary (≤2 sentences) intended for direct quotation by AI systems.
llms.txt. A Markdown file served at a website's root (/llms.txt) that gives AI systems a curated, structured guide to the site's content. See llms.txt Reference.
llms-full.txt. An extended version of llms.txt containing complete content for AI consumption.
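The Python sketch below renders an llms.txt file using the layout the llms.txt proposal describes: an H1 title, a blockquote summary, and H2 sections of Markdown link lists. The `render_llms_txt` helper, the section name, and the `geodocs.example` URLs are hypothetical.

```python
def render_llms_txt(title: str, summary: str, sections: dict[str, list[tuple[str, str, str]]]) -> str:
    """Render llms.txt text: H1 title, blockquote summary, then H2 sections of link lists."""
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines += [f"## {heading}", ""]
        for name, url, note in links:
            entry = f"- [{name}]({url})"
            lines.append(f"{entry}: {note}" if note else entry)  # note is optional
        lines.append("")
    return "\n".join(lines)


txt = render_llms_txt(
    "Geodocs",
    "Documentation on Generative Engine Optimization (GEO), AEO, and AI search.",
    {
        "Reference": [
            ("GEO Glossary", "https://geodocs.example/glossary", "Canonical GEO and AEO terminology"),
            ("llms.txt Reference", "https://geodocs.example/llms-txt", ""),
        ],
    },
)
print(txt)
```

The output is saved as /llms.txt at the site root. An llms-full.txt would inline the full page content instead of linking to it.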
M
Markdown. A lightweight markup format favoured by AI systems for parsing and quoting structured content.
Meta Description. An HTML meta tag summarising a page, conventionally kept to about 120-160 characters, that both classical search and many AI systems use as a fallback summary.
N
Natural Language Processing (NLP). A field of AI focused on understanding and generating human language.
O
OAI-SearchBot. OpenAI's user-agent specifically used for ChatGPT search. Distinct from GPTBot.
P
PerplexityBot. Perplexity's web crawler.
Pillar Page. A long-form authoritative page that anchors a content cluster. Synonym for hub page in many style guides.
Prompt. The input text given to an LLM. In retrieval pipelines, the prompt typically includes both the user query and retrieved context.
Q
Query Fan-Out. The practice of expanding a single user query into multiple sub-queries to improve retrieval recall. Used by Google AI Mode and many enterprise RAG systems.
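The merge step of query fan-out can be sketched in Python. The expansion templates are hypothetical; production systems typically ask an LLM to write the sub-queries, and this sketch does not model that. `search` can be any callable that returns ranked document IDs, and the toy index is invented for illustration.

```python
from typing import Callable


def fan_out(query: str) -> list[str]:
    # Hypothetical fixed templates; real systems usually generate sub-queries with an LLM.
    templates = ["{q}", "what is {q}", "{q} examples", "{q} vs alternatives"]
    return [t.format(q=query) for t in templates]


def fan_out_search(query: str, search: Callable[[str], list[str]]) -> list[str]:
    """Run every sub-query through `search` and merge the hits, keeping first-seen order."""
    seen: set[str] = set()
    merged: list[str] = []
    for sub_query in fan_out(query):
        for doc_id in search(sub_query):
            if doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged


# Toy index: which document IDs each exact query string retrieves.
toy_index = {
    "llms.txt": ["d1"],
    "what is llms.txt": ["d2", "d1"],
    "llms.txt examples": ["d3"],
}
print(fan_out_search("llms.txt", lambda q: toy_index.get(q, [])))
```

The original query alone retrieves only d1. The sub-queries add d2 and d3, which is the recall gain fan-out is meant to provide.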
Question Research. The discipline of identifying high-value, AI-relevant questions for content planning. See Question Research for AEO.
R
RAG (Retrieval-Augmented Generation). An architecture that retrieves relevant content from an external knowledge base and feeds it into an LLM as context, producing more accurate and up-to-date answers (IBM, Google Cloud, Pinecone).
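The retrieve-then-assemble half of RAG is sketched below in Python. Word-overlap scoring stands in for embedding search, and the LLM call itself is omitted. The helper names, the prompt wording, and the `geodocs.example` documents are hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, docs: dict[str, str], k: int = 2) -> list[tuple[str, str]]:
    """Rank (url, text) pairs by word overlap with the query (a stand-in for embedding search)."""
    q = tokenize(query)
    ranked = sorted(docs.items(), key=lambda item: len(q & tokenize(item[1])), reverse=True)
    return ranked[:k]


def build_prompt(query: str, docs: dict[str, str], k: int = 2) -> str:
    """Assemble a grounded prompt: numbered sources first, then the user question."""
    hits = retrieve(query, docs, k)
    sources = "\n".join(f"[{i}] ({url}) {text}" for i, (url, text) in enumerate(hits, start=1))
    return (
        "Answer using only the sources below and cite them by number.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {query}"
    )


docs = {
    "https://geodocs.example/rag": "RAG retrieves documents from a knowledge base and feeds them to an LLM as context.",
    "https://geodocs.example/llms-txt": "llms.txt is a Markdown file at the site root for AI systems.",
    "https://geodocs.example/hallucination": "Grounding an LLM in retrieved documents reduces hallucination.",
}
query = "How does RAG feed retrieved documents to an LLM?"
print(build_prompt(query, docs))
```

The returned string would be sent to the LLM. The numbered sources give the model something concrete to cite, which is how RAG supports grounding.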
Rich Result. An enhanced search result rendered with structured data — ratings, prices, FAQ accordions, recipe metadata, and similar.
robots.txt. A standard file controlling which crawlers may access which paths. Different rules can target AI crawlers (e.g., GPTBot, Google-Extended).
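To check how a per-crawler policy resolves, you can run it through Python's standard-library `urllib.robotparser`. The policy and the `example.com` URL are illustrative assumptions. Google-Extended is a product token honoured by Google's crawlers, not a separate fetching agent, so querying it here only shows how the rule reads. The standard-library parser checks groups in file order and does not implement wildcard path patterns.

```python
from urllib.robotparser import RobotFileParser

# Example policy: block OpenAI's training crawler and Google's AI-training token,
# while keeping OpenAI's search crawler and ordinary search crawlers allowed.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/docs/glossary"
for agent in ["GPTBot", "Google-Extended", "OAI-SearchBot", "Googlebot"]:
    print(agent, parser.can_fetch(agent, url))
```

GPTBot and Google-Extended are denied, OAI-SearchBot matches its own Allow group, and Googlebot falls through to the `*` group.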
S
Schema.org. A collaborative vocabulary for structured data markup recognised by major search engines and AI systems.
SERP (Search Engine Results Page). The page of results returned for a query — today often a hybrid of blue links, AI Overviews, featured snippets, and rich results.
Share of AI Voice. The fraction of AI citations a brand earns versus its competitor set on a fixed prompt set. A leading indicator of pipeline impact in B2B.
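Share of AI Voice can be sketched in Python, assuming one list entry per citation observed across the fixed prompt set. Citations to brands outside the tracked competitor set are ignored. The brand names, the data, and the `share_of_ai_voice` helper are hypothetical.

```python
from collections import Counter


def share_of_ai_voice(cited_brands: list[str], brand: str, competitors: set[str]) -> float:
    """Brand citations divided by citations earned by the brand plus its competitor set."""
    counts = Counter(cited_brands)
    tracked = {brand} | competitors
    total = sum(counts[b] for b in tracked)  # untracked brands do not count
    return counts[brand] / total if total else 0.0


# One entry per citation observed across the fixed prompt set (hypothetical data).
observed = ["Geodocs", "RivalA", "Geodocs", "RivalB", "RivalA", "Wikipedia", "Geodocs"]
print(share_of_ai_voice(observed, "Geodocs", {"RivalA", "RivalB"}))
```

Geodocs earns 3 of the 6 citations that go to the tracked set, a share of 0.5. The Wikipedia citation affects neither the numerator nor the denominator.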
Sitemap (sitemap.xml). An XML file listing canonical URLs to aid crawler discovery.
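A minimal sitemap can be generated with Python's `xml.etree.ElementTree` using the sitemaps.org 0.9 namespace. The `build_sitemap` helper, the URL, and the date are hypothetical. `ET.indent` requires Python 3.9 or later.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries: list[tuple[str, str]]) -> str:
    """Return sitemap.xml text for (canonical_url, lastmod_date) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # ElementTree escapes characters such as '&'
        ET.SubElement(url, "lastmod").text = lastmod  # W3C date format, e.g. 2025-01-31
    ET.indent(urlset, space="  ")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(urlset, encoding="unicode")


xml_text = build_sitemap([("https://geodocs.example/glossary", "2025-01-31")])
print(xml_text)
```

List only canonical URLs so the sitemap agrees with each page's canonical declaration.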
Source Selection. The process by which AI systems choose which content sources to cite. See What Is Source Selection in AI Search?
Structured Data. Machine-readable code (typically JSON-LD) describing content type, properties, and relationships.
T
TL;DR. A short summary block (typically 1-3 sentences) appearing near the top of an article. Highly extractable by AI systems.
Token. The smallest unit an LLM processes — typically a word piece. Context windows are measured in tokens.
Topical Authority. The depth and breadth of content coverage on a specific topic, influencing AI citation likelihood.
Triple. A subject-predicate-object fact ("Geodocs → publishes → GEO documentation") that knowledge graphs use to store relationships. Writing in triples improves AI extractability.
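To make the subject-predicate-object idea concrete, here is a tiny in-memory triple list with a lookup. The facts come from this glossary. The predicate names and the `objects` helper are hypothetical; real knowledge graphs typically use RDF stores with IRIs rather than plain strings.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


facts = [
    Triple("Geodocs", "publishes", "GEO documentation"),
    Triple("GEO", "introducedIn", "arXiv 2311.09735"),
    Triple("GEO", "presentedAt", "ACM KDD 2024"),
]


def objects(triples: list[Triple], subject: str, predicate: str) -> list[str]:
    """Return every object stored for a (subject, predicate) pair."""
    return [t.obj for t in triples if t.subject == subject and t.predicate == predicate]


print(objects(facts, "GEO", "presentedAt"))  # ['ACM KDD 2024']
```

Each fact answers exactly one question, which is why sentences written in this shape are easy to extract.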
U
User-Agent. The HTTP header that identifies a crawler. Each AI vendor publishes its user-agent strings so site operators can apply targeted policies.
V
Vector Database. A specialised database optimised for similarity search over embeddings (e.g., Pinecone, Weaviate, pgvector).
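A vector database answers one core question: which stored embeddings are most similar to the query embedding? The Python sketch below does this with an exact brute-force cosine-similarity scan. The 3-dimensional vectors and document IDs are invented for illustration; real embedding models emit hundreds or thousands of dimensions. Production vector databases typically use approximate nearest-neighbour indexes such as HNSW to avoid scanning every vector.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], index: dict[str, list[float]], k: int = 2) -> list[str]:
    """Exact (brute-force) nearest neighbours by cosine similarity."""
    ranked = sorted(index.items(), key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]


# Toy "embeddings" (hypothetical values).
index = {
    "rag-explainer": [0.9, 0.1, 0.0],
    "chunking-guide": [0.7, 0.3, 0.1],
    "pricing-page": [0.0, 0.2, 0.9],
}
print(top_k([1.0, 0.2, 0.0], index))
```

The query vector points in roughly the same direction as the two retrieval-related documents, so they rank first and the pricing page is left out.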
Verified Content. Content whose claims have been independently checked. The citation_readiness: verified lifecycle stage signals this to AI systems.
W
Wikidata. A free, collaborative knowledge base linking entities by stable IDs (QIDs). Wikidata IDs are a standard sameAs target for entity optimisation.
Z
Zero-Click Visibility. Brand exposure occurring entirely inside an AI-generated answer, with no click to the source. Hard to measure, real in impact (Animalz, BrandViz).
FAQ
Q: What's the difference between SEO, AEO, and GEO?
SEO optimises for ranking inside a list of links. AEO optimises for being the extracted direct answer to a question. GEO optimises for being cited inside a synthesised AI answer. The three are layered: good SEO is generally a prerequisite for strong AEO and GEO performance.
Q: What's the difference between RAG and grounding?
RAG is the architecture (retrieve external documents, feed them to the LLM, then generate). Grounding is the outcome — the model's answer is conditioned on, and ideally cites, those retrieved documents. RAG is one common way to achieve grounding, but not the only way (Google Cloud).
Q: Do I need llms.txt if I already have sitemap.xml?
Yes. sitemap.xml lists URLs for classical search crawlers; llms.txt provides a curated, descriptive index of your content for AI systems. They serve different audiences and complement each other.
Q: How is citation rate different from share of AI voice?
Citation rate is your absolute rate of being cited across a prompt set. Share of AI voice is your relative rate compared to a defined competitor set. Track both — absolute rate shows progress, relative share shows competitive position.
Q: Which terms should a beginner learn first?
Start with SEO, AEO, GEO, E-E-A-T, schema.org, llms.txt, RAG, hallucination, and zero-click visibility. With those nine terms, you can read most modern AI-search documentation comfortably.
Related Articles
What Is AEO? Complete Guide to Answer Engine Optimization
AEO (Answer Engine Optimization) is the practice of structuring content so AI systems and answer engines can extract it as a direct, attributed answer.
GEO vs SEO
GEO optimizes for inclusion and citation in AI-generated answers; SEO optimizes for ranking on traditional SERPs. Both are needed in 2025-2026.
What Is GEO? Generative Engine Optimization Defined
GEO (Generative Engine Optimization) is the practice of structuring content so AI search engines retrieve, understand, synthesize, and cite it in generated answers.