AI Search Optimization for Glossary Pages: A Specification
An AI-optimized glossary page is built around self-contained, individually anchored term blocks. Each block ships a one-sentence definition, a short elaboration, and Schema.org DefinedTerm markup nested inside a DefinedTermSet. The whole set is reachable by stable per-term URLs so ChatGPT, Perplexity, Claude, and Gemini can extract, link, and cite each definition independently.
TL;DR
- Treat every term as its own micro-page. AI engines extract passages, not pages.
- Each entry: H2 = term, then a one-sentence definition, then a 60-80-word elaboration. Anchor id matches the slug.
- Mark up with Schema.org DefinedTerm, nested inside a single DefinedTermSet for the page.
- Provide a stable per-term anchor URL (/glossary/#term-slug or /glossary/term-slug) and link to it from supporting articles.
- Audit the page against the conformance checklist at the end before shipping.
Scope
This specification defines the structure, content, markup, and linking conventions for glossary pages on a Geodocs-style site that is optimized for citation by AI search engines. It is intentionally narrower than a generic style guide: it specifies what is required, what is recommended, and what is forbidden for AI extractability.
It does not cover:
- Long-form glossary entries that warrant their own dedicated page (use a definition-style page instead).
- Multilingual glossaries; translation handling is a separate spec.
- Visual or interaction design, beyond what is required for extractability.
Terminology
Keywords use RFC 2119 conventions. MUST, SHOULD, and MAY carry their normative meanings.
- Glossary page: a single HTML document containing two or more term entries.
- Term entry: a self-contained block defining one term.
- Anchor URL: a per-term URL formed by combining the glossary URL with the term's anchor id.
- DefinedTerm / DefinedTermSet: Schema.org types representing one term and the set of all terms on the page, respectively.
Page-level requirements
URL
- The glossary page MUST live under a stable path. Recommended: /glossary/ or /{section}/glossary/.
- The URL MUST NOT include query parameters that affect content rendering (e.g., do not paginate by query string).
- If the glossary has more than 60 entries, the page SHOULD be split into per-letter or per-cluster pages with consistent URL design (e.g., /glossary/a/, /glossary/geo/).
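The splitting rule above can be sketched as a small planning function. This is only an illustration: the function name `plan_glossary_pages`, the choice to split by the first character of the slug, and the `/glossary/` base path are assumptions, not part of the spec.

```python
from collections import defaultdict

MAX_ENTRIES_PER_PAGE = 60  # threshold taken from the spec text above


def plan_glossary_pages(slugs, base_path="/glossary/"):
    """Return a mapping of page path -> sorted slugs on that page.

    Up to MAX_ENTRIES_PER_PAGE slugs stay on the single base page;
    larger glossaries are split by the first character of each slug.
    Assumes every slug is a non-empty string.
    """
    unique = sorted(set(slugs))
    if len(unique) <= MAX_ENTRIES_PER_PAGE:
        return {base_path: unique}

    pages = defaultdict(list)
    for slug in unique:
        # Hypothetical per-letter layout, e.g. /glossary/a/.
        pages[f"{base_path}{slug[0]}/"].append(slug)
    return dict(pages)
```

A per-cluster split would work the same way with a slug-to-cluster lookup in place of `slug[0]`.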
Title and meta
- The <title> element MUST clearly identify the page as a glossary (for example, "Geodocs Glossary - AI Search Terms").
- Meta description MUST be 120-160 characters and state the scope of the glossary.
- Open Graph and Twitter card data SHOULD be present for share previews.
Top-of-page content
- The page MUST open with an H1 matching the page title.
- The H1 MUST be followed by a 1-3 sentence introduction stating: what the glossary covers, who it is for, and how often it is updated.
- The page MUST contain a navigable index (alphabetical or thematic) when it has more than 12 entries.
- The page MAY include a one-paragraph "How to cite this glossary" block near the bottom for journalistic and academic users.
Term entry requirements
Every term entry MUST follow the same shape so AI engines can rely on it.
Structure
<section id="defined-term-slug" class="defined-term">
  <h2>Term name</h2>
  <p class="definition">One-sentence definition.</p>
  <p class="elaboration">60-80 words of context, scope, and disambiguation.</p>
  <ul class="see-also">
    <li><a href="/glossary/#related-term-slug">Related term</a></li>
  </ul>
</section>
Anchor and slug
- Each entry MUST have a stable id matching its slug.
- The slug MUST be lowercase, hyphenated, ASCII, and stable across versions of the page.
- The slug MUST NOT be reused across different terms even after a term is deprecated; reuse breaks deep links and citations.
- Deprecated terms SHOULD keep their anchor and add a brief note plus a link to the canonical replacement.
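The slug rules above lend themselves to an automated check. The following is a minimal sketch, not a reference implementation: the helper names (`slugify`, `is_valid_slug`, `find_reused_slugs`) are hypothetical, and allowing digits in slugs is an assumption the spec does not state either way.

```python
import re
import unicodedata

# Lowercase ASCII letters or digits, separated by single hyphens.
SLUG_PATTERN = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def slugify(term):
    """Derive a candidate slug from a term name."""
    # Decompose accented characters, then drop anything non-ASCII.
    ascii_text = unicodedata.normalize("NFKD", term).encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumeric characters into one hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def is_valid_slug(slug):
    return bool(SLUG_PATTERN.match(slug))


def find_reused_slugs(history):
    """history: iterable of (slug, term) pairs across all page versions,
    including deprecated terms. Returns slugs that were assigned to more
    than one distinct term."""
    owners = {}
    reused = set()
    for slug, term in history:
        if owners.setdefault(slug, term) != term:
            reused.add(slug)
    return reused
```

Because slugs must stay stable, `slugify` is best used only when a term is first added; existing slugs should come from the stored history rather than being regenerated.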
Term name
- The H2 text MUST equal the canonical term as users would search it.
- If the term has an acronym, both forms SHOULD appear in the H2 (e.g., "Generative Engine Optimization (GEO)").
- Aliases MUST be encoded as alternateName in the JSON-LD, not duplicated as separate H2 blocks.
Definition
- The first sentence MUST be a complete, standalone definition of the form "X is a Y that Z".
- The first sentence MUST be 18-35 words. This is the length most consistently quoted by ChatGPT and Perplexity in our citation logs.
- The first sentence MUST NOT open with a hedge ("Generally…", "In some cases…") or a self-reference ("In this glossary…").
- The first sentence MUST NOT contain inline citations, footnotes, or markdown links.
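As a rough illustration, the definition rules above could be linted with a few heuristics. Everything here is an assumption layered on the spec: the function name `check_definition`, the list of hedge openers (taken from the examples above, not exhaustive), and the regular expressions, which approximate the rules rather than prove conformance.

```python
import re

HEDGE_OPENERS = ("generally", "in some cases", "in this glossary")
MIN_WORDS, MAX_WORDS = 18, 35


def check_definition(sentence):
    """Return a list of human-readable problems; an empty list means
    none of the heuristics fired."""
    problems = []
    text = sentence.strip()
    word_count = len(text.split())
    if not MIN_WORDS <= word_count <= MAX_WORDS:
        problems.append(f"word count {word_count} outside {MIN_WORDS}-{MAX_WORDS}")
    if text.lower().startswith(HEDGE_OPENERS):
        problems.append("opens with a hedge or self-reference")
    if re.search(r"\[[^\]]+\]\([^)]+\)", text):
        problems.append("contains a markdown link")
    if re.search(r"\[\^?\d+\]", text):
        problems.append("contains a footnote or numeric citation marker")
    # Loose check for the "X is a Y that Z" shape.
    if not re.search(r"\b(is|are)\s+(a|an|the)\b", text, re.IGNORECASE):
        problems.append('does not look like "X is a Y that Z"')
    return problems
```

Word counting by whitespace is deliberately simple; a stricter tokenizer might count hyphenated or slashed terms differently.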
Elaboration
- The elaboration SHOULD be 60-80 words. Shorter blocks lose contextual cues; longer blocks tend to be truncated mid-passage by AI extraction.
- The elaboration SHOULD address: who uses the term, where it differs from a related term, and one canonical example.
- The elaboration MAY include up to two outbound links to related glossary entries or full articles.
- The elaboration MUST NOT include marketing copy or product CTAs.
See-also block
- Each entry MUST link to at least two related terms when they exist on the page.
- Each link MUST point to the per-term anchor URL, not the bare glossary page.
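The see-also rules can be checked against rendered HTML with the standard-library parser. This sketch assumes the entry structure shown earlier (`section.defined-term` containing `ul.see-also`) and the fragment form of per-term URLs (`/glossary/#slug`); the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class TermEntryAuditor(HTMLParser):
    """Collects, per term section, the hrefs found in its see-also list."""

    def __init__(self):
        super().__init__()
        self.entries = {}          # section id -> list of see-also hrefs
        self._current_id = None    # id of the term section being parsed
        self._in_see_also = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "section" and "defined-term" in classes:
            self._current_id = attrs.get("id")
            self.entries[self._current_id] = []
        elif tag == "ul" and "see-also" in classes:
            self._in_see_also = True
        elif tag == "a" and self._in_see_also and self._current_id is not None:
            self.entries[self._current_id].append(attrs.get("href") or "")

    def handle_endtag(self, tag):
        if tag == "ul":
            self._in_see_also = False
        elif tag == "section":
            self._current_id = None


def audit_see_also(html, min_links=2):
    """Return a list of problems with see-also blocks on one glossary page."""
    auditor = TermEntryAuditor()
    auditor.feed(html)
    problems = []
    for term_id, hrefs in auditor.entries.items():
        if len(hrefs) < min_links:
            problems.append(f"{term_id}: only {len(hrefs)} see-also link(s)")
        for href in hrefs:
            # The fragment after "#" must be the id of an entry on this page.
            fragment = href.partition("#")[2]
            if fragment not in auditor.entries:
                problems.append(f"{term_id}: {href} is not a per-term anchor on this page")
    return problems
```

Since the two-link minimum applies only when related terms exist on the page, `min_links` is left configurable for small glossaries.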
Schema markup
Glossary pages MUST ship Schema.org structured data. JSON-LD is the preferred serialization for AI search use cases (see JSON-LD vs Microdata vs RDFa for AI search).
Required types
- The page MUST include exactly one DefinedTermSet.
- Each term entry MUST be a DefinedTerm referenced from the set via hasDefinedTerm.
- Each DefinedTerm MUST also reference the set via inDefinedTermSet.
Reference template
{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "DefinedTermSet",
      "@id": "https://geodocs.dev/glossary/#set",
      "name": "Geodocs Glossary",
      "about": "AI search optimization",
      "inLanguage": "en",
      "hasDefinedTerm": [
        { "@id": "https://geodocs.dev/glossary/#geo" },
        { "@id": "https://geodocs.dev/glossary/#aeo" }
      ]
    },
    {
      "@type": "DefinedTerm",
      "@id": "https://geodocs.dev/glossary/#geo",
      "name": "Generative Engine Optimization",
      "alternateName": ["GEO"],
      "termCode": "geo",
      "description": "Generative Engine Optimization is the practice of preparing content so that generative AI engines such as ChatGPT, Perplexity, Claude, and Gemini cite it as a source in their answers.",
      "url": "https://geodocs.dev/glossary/#geo",
      "inDefinedTermSet": "https://geodocs.dev/glossary/#set"
    }
  ]
}
Markup rules
- The name of each DefinedTerm MUST match the H2 text exactly (excluding acronym parentheticals, which SHOULD go in alternateName).
- The description MUST match the first sentence of the visible definition. Marking up content not visible to users violates Google's structured data guidelines.
- The url MUST be the absolute per-term anchor URL.
- Every DefinedTerm MUST include inDefinedTermSet and the set MUST include the corresponding entry in hasDefinedTerm. Bidirectional references prevent partial parses by some validators.
- The inLanguage field MUST be set on the set (and MAY be set per term for mixed-language glossaries).
- A glossary page MUST NOT wrap its terms in unrelated types such as Article or BlogPosting. Use a single DefinedTermSet plus the page's site-wide WebPage / Organization graph.
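Several of the markup rules above are mechanical enough to check in code. The sketch below assumes the `@graph` layout of the reference template, with `inDefinedTermSet` given as a plain string and each term's `url` equal to its `@id`; both are conventions of that template rather than Schema.org requirements, and the function name `check_term_set` is hypothetical.

```python
import json


def check_term_set(jsonld_text):
    """Return a list of problems found in a DefinedTermSet/DefinedTerm @graph."""
    graph = json.loads(jsonld_text).get("@graph", [])
    sets = [node for node in graph if node.get("@type") == "DefinedTermSet"]
    terms = [node for node in graph if node.get("@type") == "DefinedTerm"]

    if len(sets) != 1:
        return [f"expected exactly one DefinedTermSet, found {len(sets)}"]

    term_set = sets[0]
    problems = []
    if "inLanguage" not in term_set:
        problems.append("DefinedTermSet is missing inLanguage")

    listed = {ref.get("@id") for ref in term_set.get("hasDefinedTerm", [])}
    defined = {term.get("@id") for term in terms}

    for term in terms:
        term_id = term.get("@id")
        # Term -> set direction.
        if term.get("inDefinedTermSet") != term_set.get("@id"):
            problems.append(f"{term_id}: inDefinedTermSet does not point at the set")
        # Set -> term direction.
        if term_id not in listed:
            problems.append(f"{term_id}: not listed in hasDefinedTerm")
        # Assumption: url mirrors @id, as in the reference template.
        if term.get("url") != term_id:
            problems.append(f"{term_id}: url is not the per-term anchor URL")
    for missing in sorted(listed - defined):
        problems.append(f"{missing}: listed in hasDefinedTerm but not defined")
    return problems
```

Run against the reference template exactly as printed, this check would flag the `#aeo` entry, because the template lists it in hasDefinedTerm but omits its DefinedTerm node for brevity.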
Linking conventions
Glossaries earn citation share when other pages link to them. Cross-linking is part of the spec.
From article to glossary
- The first occurrence of a defined term in any article SHOULD link to its per-term anchor URL.
- Subsequent occurrences in the same article MAY be plain text.
- Anchor text MUST equal the term as defined; do not use generic anchors ("click here", "this concept").
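One way to apply the first-occurrence rule automatically is a single regex pass over the article text. This is a sketch under several assumptions: the input is plain text with no existing markup, every term begins and ends with a word character, matching is case-insensitive, and the function name `link_first_mentions` is hypothetical.

```python
import re


def link_first_mentions(text, glossary):
    """glossary: mapping of canonical term -> per-term anchor URL.

    Links only the first whole-word mention of each term and leaves
    later mentions as plain text.
    """
    if not glossary:
        return text

    lookup = {term.lower(): url for term, url in glossary.items()}
    # Longest alternatives first so a longer term wins over a term it contains.
    alternatives = "|".join(
        re.escape(term) for term in sorted(glossary, key=len, reverse=True)
    )
    pattern = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)
    seen = set()

    def replace(match):
        key = match.group(0).lower()
        if key in seen:
            return match.group(0)  # later mention: keep as plain text
        seen.add(key)
        return f'<a href="{lookup[key]}">{match.group(0)}</a>'

    # A single pass means inserted hrefs are never re-scanned for terms.
    return pattern.sub(replace, text)
```

The anchor text keeps the casing found in the article; a stricter reading of the anchor-text rule would substitute the canonical term from the glossary instead.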
Within the glossary
- Each entry MUST link to at least two related terms.
- The page MAY include a small "Browse by topic" navigation that groups terms into clusters.
- The page MUST NOT wrap individual term blocks in navigation (<nav>) elements.
From glossary to articles
- Each entry MAY link to one canonical hub article on that topic.
- The link SHOULD sit in the elaboration paragraph, not in the see-also list, so AI engines treat it as in-context citation rather than navigation.
Updates and freshness
- The page dateModified MUST update whenever any term entry changes substantively.
- New terms MUST be added at the bottom of their alphabet/cluster section, not interleaved retroactively, to preserve cached anchor positions.
- The page SHOULD display its dateModified near the top so AI engines exposing freshness signals can surface them.
Accessibility
- Each <section> for a term entry MUST have an accessible name (the H2).
- Anchor links to per-term URLs MUST be keyboard-focusable and visible on focus.
- Decorative emoji or icons in term names MUST NOT carry semantic meaning.
Conformance checklist
A glossary page conforms to this specification when all of the following are true.
Page-level:
- [ ] The page lives at a stable URL with no rendering query parameters.
- [ ] H1 matches the title and is followed by a 1-3 sentence intro.
- [ ] An index is present when the page has more than 12 entries.
Per term:
- [ ] H2 contains the canonical term.
- [ ] First sentence is a complete "X is a Y that Z" definition, 18-35 words.
- [ ] Elaboration is 60-80 words.
- [ ] Stable, lowercase, hyphenated, ASCII slug used as id.
- [ ] At least two see-also links to other entries.
Markup:
- [ ] One DefinedTermSet on the page, referenced by every DefinedTerm via inDefinedTermSet.
- [ ] DefinedTermSet hasDefinedTerm lists every term entry.
- [ ] Each DefinedTerm description matches the visible first sentence.
- [ ] Each DefinedTerm url is the absolute per-term anchor URL.
- [ ] No marked-up content that is not visible to users.
Linking:
- [ ] First mention of every term in articles links to its per-term anchor URL with the term as anchor text.
- [ ] Each entry has at least two see-also links.
Freshness:
- [ ] dateModified updates on substantive content edits.
- [ ] dateModified is visible on the page.
A page failing any MUST clause does not conform.
Common mistakes
- Wrapping each term in Article or BlogPosting schema rather than DefinedTerm. AI extractors often skip over articles whose body is a single paragraph.
- Reusing slugs across renamed terms. Citations rot silently.
- Putting the formal definition in the second or third sentence. Most AI extractors take only the first sentence as the definition.
- Including marketing CTAs in the elaboration. They survive extraction and dilute trust signals.
- Building a single 200-term page with no anchor index. Long pages with no extraction-friendly anchors lose citation share to competitors with smaller, more navigable glossaries.
- Marking up alternateName as separate DefinedTerm entries. They should be aliases, not standalone entries.
FAQ
Q: Does this spec only apply to AI search, or to traditional SEO too?
It is compatible with both. DefinedTerm and DefinedTermSet are standard Schema.org types and are respected by Google's traditional indexing. The constraints on definition length, anchor stability, and bidirectional schema references are tighter than traditional SEO requires, and meeting them does not harm traditional SEO.
Q: How long should a glossary page be?
Up to about 60 entries on a single page works well. Above that, split by letter or topic cluster. Massive single-page glossaries strain extraction and produce inconsistent anchor citations across AI engines.
Q: Should each term get its own page instead?
For long-form, deeply discussed concepts, yes — build a dedicated definition-style page and link to it from the glossary entry. Keep the glossary entry short and let the dedicated page carry the depth. The glossary entry remains the canonical anchor for short citations.
Q: Do I need both inDefinedTermSet on the term and hasDefinedTerm on the set?
Yes. This spec requires both directions. Several validators and downstream consumers parse only one direction; covering both removes a class of silent failures.
Q: How do I migrate an existing glossary to this spec without breaking citations?
Keep all existing slugs unchanged. Add the schema. Tighten definitions and elaborations to the length targets one term at a time. Update internal article links to point to per-term anchor URLs. Run the conformance checklist before each release.
Q: What about FAQPage schema for glossary pages?
Do not use FAQPage for glossaries. FAQPage implies question-and-answer phrasing; DefinedTerm implies term-and-definition phrasing. Mixing them confuses AI extractors and risks Google manual actions for inappropriate structured data.
Related resources
- JSON-LD vs Microdata vs RDFa for AI search
- Structured data for AI search
- AI citation tracking with server log analysis
- GEO sprint retrospective framework
- What is GEO — hub for the discipline