Geodocs.dev

Video Sitemap Specification for AI Search Citations

A video sitemap is an XML file that uses Google's video sitemap extension (the video: namespace) to declare video URLs and metadata (title, thumbnail, duration, content or player location). Multimodal AI search engines use it as the discovery layer before pairing each video with transcripts and VideoObject JSON-LD for citation.

TL;DR

List every important video in an XML sitemap using the video:video extension. Required tags: video:thumbnail_loc, video:title, video:description, and one of video:content_loc or video:player_loc. Add duration, publication_date, tag, family_friendly, and live for richer signals. Pair each video page with on-page transcripts and VideoObject JSON-LD for AI citation eligibility.

Definition

A video sitemap is a sitemap that uses Google's video extension namespace to expose video metadata that base sitemaps cannot describe. Tags live in the namespace http://www.google.com/schemas/sitemap-video/1.1 (Google Search Central, 2024). Each <video:video> element belongs to one parent <url> element representing the page that hosts the video.

AI Mode and AI Overviews increasingly synthesize video citations alongside text results, and Google AI Mode supports image-based queries that may be answered with video clips (Google, 2025). For AI engines to cite a video, they must first discover it (sitemap), understand it (thumbnail + title + description, plus transcript), and trust it (publication date, license, page authority).

Three concrete impacts:

  1. Discovery for embedded or hosted video. Engines often cannot extract a usable video reference from JavaScript embeds without a sitemap entry.
  2. Snippet candidates. description and tag provide citable text; thumbnail_loc provides the visual snippet.
  3. Constraint-aware results. restriction, platform, requires_subscription, and live let engines avoid showing your video where it cannot play.
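
| Tag | Required | Purpose |
|---|---|---|
| xmlns:video | Yes | Namespace declaration. |
| <loc> | Yes | Page URL hosting the video. |
| <video:video> | Yes | Container; one per video, multiple per page allowed. |
| <video:thumbnail_loc> | Yes | Image URL for the thumbnail. |
| <video:title> | Yes | Plain text or CDATA; HTML-escaped. |
| <video:description> | Yes | Up to 2,048 chars. |
| <video:content_loc> | One of these required | Direct video file URL (mp4, mov, etc.). |
| <video:player_loc> | One of these required | Embed/player URL (e.g., YouTube embed). |
| <video:duration> | Recommended | Seconds (1-28,800). |
| <video:publication_date> | Recommended | ISO-8601 datetime. |
| <video:expiration_date> | Optional | Stop showing in results after this date. |
| <video:family_friendly> | Optional | yes or no. |
| <video:tag> | Optional | Up to 32 tags per video. |
| <video:category> | Optional | Up to 256 chars. |
| <video:restriction> | Optional | ISO 3166 country codes. |
| <video:platform> | Optional | web, mobile, tv. |
| <video:requires_subscription> | Optional | yes or no. |
| <video:live> | Optional | yes or no. |
| <video:rating> | Optional | 0.0-5.0. |
| <video:view_count> | Optional | Integer view count. |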

File constraints (Google guidance): a video sitemap can contain up to 50,000 entries and must be ≤ 50 MB uncompressed. Use a sitemap index for larger sets. Source files must be accessible to Googlebot — not blocked by robots.txt, login, or streaming-only protocols (Google Search Central, 2024).
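
As a rough illustration of the entry limit, the sketch below splits a large list of entries into chunks of at most 50,000 and builds a sitemap index that points at one file per chunk. The function names and the example file URLs are hypothetical, and the sketch only enforces the entry count; the 50 MB size limit would need a separate check on the serialized files.

```python
from xml.etree import ElementTree as ET

MAX_ENTRIES_PER_SITEMAP = 50_000  # per-file entry limit cited above
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def chunk_entries(entries, size=MAX_ENTRIES_PER_SITEMAP):
    """Split a list of entries into consecutive chunks of at most `size`."""
    return [entries[i:i + size] for i in range(0, len(entries), size)]


def build_sitemap_index(sitemap_urls):
    """Return a sitemap index document (as a string) listing child sitemaps."""
    ET.register_namespace("", SITEMAP_NS)  # serialize as the default namespace
    root = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for url in sitemap_urls:
        sitemap = ET.SubElement(root, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(sitemap, f"{{{SITEMAP_NS}}}loc").text = url
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical naming scheme: one child sitemap file per chunk.
    chunks = chunk_entries(list(range(120_001)))
    urls = [f"https://example.com/video-sitemap-{n}.xml"
            for n in range(1, len(chunks) + 1)]
    print(build_sitemap_index(urls))
```

Each chunk would then be written out as its own video sitemap, with only the index submitted or referenced from robots.txt.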

How AI engines use the video sitemap

flowchart LR
    A["Crawler reads sitemap.xml"] --> B["Parse video:video entries"]
    B --> C["Fetch thumbnail + page"]
    C --> D["Read on-page transcript + VideoObject JSON-LD"]
    D --> E["Index visual + textual signals together"]
    E --> F["AI answer cites video with thumbnail + snippet"]

The sitemap is discovery only. Citation quality depends on the on-page artifacts: a high-quality thumbnail, a descriptive title, and — critically — a textual transcript that AI engines can quote. Pair the sitemap with VideoObject JSON-LD that includes a transcript URL for the strongest stack.
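
To make the pairing concrete, here is a minimal sketch of building a VideoObject JSON-LD payload whose values mirror the sitemap entry. The function name and argument names are hypothetical. One assumption to flag: schema.org defines transcript as a text property, so this sketch inlines the transcript text rather than a URL; adapt it if your stack references the transcript differently.

```python
import json


def video_object_jsonld(name, description, thumbnail_url, upload_date,
                        duration_seconds, content_url, transcript_text):
    """Build a schema.org VideoObject dict suitable for JSON-LD embedding."""
    minutes, seconds = divmod(duration_seconds, 60)
    return {
        "@context": "https://schema.org",
        "@type": "VideoObject",
        "name": name,
        "description": description,
        "thumbnailUrl": thumbnail_url,
        "uploadDate": upload_date,
        # ISO 8601 duration; minutes are not rolled over into hours here.
        "duration": f"PT{minutes}M{seconds}S",
        "contentUrl": content_url,
        "transcript": transcript_text,
    }


if __name__ == "__main__":
    doc = video_object_jsonld(
        name="How Google AI Mode answers multimodal queries",
        description="A 6-minute walkthrough of AI Mode.",
        thumbnail_url="https://example.com/img/ai-mode-thumb.jpg",
        upload_date="2026-04-21T10:00:00+00:00",
        duration_seconds=372,
        content_url="https://example.com/video/ai-mode-walkthrough.mp4",
        transcript_text="Placeholder transcript text for illustration.",
    )
    print(json.dumps(doc, indent=2))
```

Deriving both the sitemap entry and the JSON-LD from the same record is one way to keep duration and thumbnail values from drifting apart.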

Canonical XML example

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">
  <url>
    <loc>https://example.com/learn/ai-mode-walkthrough</loc>
    <video:video>
      <video:thumbnail_loc>https://example.com/img/ai-mode-thumb.jpg</video:thumbnail_loc>
      <video:title>How Google AI Mode answers multimodal queries</video:title>
      <video:description>A 6-minute walkthrough showing how AI Mode handles image and text queries, with examples and citation behavior.</video:description>
      <video:content_loc>https://example.com/video/ai-mode-walkthrough.mp4</video:content_loc>
      <video:player_loc>https://example.com/embed/ai-mode-walkthrough</video:player_loc>
      <video:duration>372</video:duration>
      <video:publication_date>2026-04-21T10:00:00+00:00</video:publication_date>
      <video:family_friendly>yes</video:family_friendly>
      <video:tag>AI Mode</video:tag>
      <video:tag>multimodal search</video:tag>
      <video:requires_subscription>no</video:requires_subscription>
      <video:live>no</video:live>
    </video:video>
  </url>
</urlset>
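
An entry like the one above can also be generated rather than hand-written. The following is a sketch only, assuming each video is described by a plain dict with hypothetical keys (page_url, tags, and the video tag names without the prefix). It uses Python's standard ElementTree, which escapes characters such as & and < in titles automatically.

```python
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
VIDEO_NS = "http://www.google.com/schemas/sitemap-video/1.1"

# Single-valued video tags this sketch copies from the input dict, in order.
SIMPLE_TAGS = ("thumbnail_loc", "title", "description", "content_loc",
               "player_loc", "duration", "publication_date")


def build_video_sitemap(videos):
    """Serialize a list of video dicts into a video sitemap string."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("video", VIDEO_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for v in videos:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = v["page_url"]
        video = ET.SubElement(url, f"{{{VIDEO_NS}}}video")
        for tag in SIMPLE_TAGS:
            if tag in v:
                ET.SubElement(video, f"{{{VIDEO_NS}}}{tag}").text = str(v[tag])
        for label in v.get("tags", []):
            ET.SubElement(video, f"{{{VIDEO_NS}}}tag").text = label
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_video_sitemap([{
        "page_url": "https://example.com/learn/ai-mode-walkthrough",
        "thumbnail_loc": "https://example.com/img/ai-mode-thumb.jpg",
        "title": "How Google AI Mode answers multimodal queries",
        "description": "A 6-minute walkthrough of AI Mode.",
        "content_loc": "https://example.com/video/ai-mode-walkthrough.mp4",
        "duration": 372,
        "tags": ["AI Mode", "multimodal search"],
    }]))
```

The output omits the XML declaration; a real generator would likely write the file with an explicit UTF-8 declaration.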

Implementation patterns (5 examples)

1. Self-hosted video on a docs site

Use content_loc with a direct mp4 URL, plus a transcript on the page and VideoObject.transcript JSON-LD. Set family_friendly: yes and requires_subscription: no to maximize eligibility.

2. YouTube embed

Use player_loc pointing to the embed URL. Provide a copy of the transcript on your page (YouTube auto-captions are not sufficient for AI citation grounding because engines tie quotes to the page URL).

3. Vimeo / Wistia private video

If the video requires authentication, set requires_subscription: yes. Engines may still index metadata for context but will not surface the player.

4. Live streams

Set live: yes and update expiration_date after the stream ends. Replace with the on-demand entry post-event.

5. Geographically restricted content

Use <video:restriction relationship="allow">US CA GB</video:restriction> (or relationship="deny") with space-delimited ISO 3166 country codes. Engines respect the directive when surfacing results.
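
A small builder can guard against malformed restriction values before they reach the sitemap. This sketch assumes a hypothetical helper name and only checks that each code looks like a two-letter code; it does not verify membership in the ISO 3166 list.

```python
import re
from xml.etree import ElementTree as ET

VIDEO_NS = "http://www.google.com/schemas/sitemap-video/1.1"


def restriction_element(countries, relationship="allow"):
    """Build a video:restriction element from a list of country codes."""
    if relationship not in ("allow", "deny"):
        raise ValueError("relationship must be 'allow' or 'deny'")
    codes = [code.strip().upper() for code in countries]
    for code in codes:
        # Shape check only: two ASCII letters, not a real ISO 3166 lookup.
        if not re.fullmatch(r"[A-Z]{2}", code):
            raise ValueError(f"not a two-letter country code: {code!r}")
    element = ET.Element(f"{{{VIDEO_NS}}}restriction",
                         {"relationship": relationship})
    element.text = " ".join(codes)  # codes are space-delimited
    return element


if __name__ == "__main__":
    ET.register_namespace("video", VIDEO_NS)
    el = restriction_element(["us", "ca", "gb"])
    print(ET.tostring(el, encoding="unicode"))
```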

Common errors and validator quirks

  • Missing content_loc and player_loc — at least one is required.
  • Thumbnail too small — follow Google's thumbnail size guidance; small or low-quality images suppress eligibility.
  • Duration out of range — must be 1-28,800 seconds.
  • Title with raw HTML — wrap in CDATA or escape entities.
  • Sitemap blocked by robots.txt — verify the sitemap URL and the video URLs are crawlable.
  • Conflicting page schema — if VideoObject JSON-LD on the page disagrees with the sitemap (different duration, thumbnail), engines may distrust both.
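
Several of these errors can be caught before submission with a small pre-flight check. The sketch below is an illustration rather than a replacement for Search Console validation: it covers only the missing required tags, the missing content_loc/player_loc pair, and the duration range, and the function names are hypothetical.

```python
from xml.etree import ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "video": "http://www.google.com/schemas/sitemap-video/1.1",
}
REQUIRED_TAGS = ("thumbnail_loc", "title", "description")


def _duration_ok(text):
    """True when text is an integer number of seconds in 1..28,800."""
    try:
        return 1 <= int(text) <= 28_800
    except ValueError:
        return False


def check_video_sitemap(xml_text):
    """Return a list of (page_url, problem) tuples; empty means no findings."""
    problems = []
    root = ET.fromstring(xml_text)
    for url in root.findall("sm:url", NS):
        page = url.findtext("sm:loc", default="(no loc)", namespaces=NS)
        for video in url.findall("video:video", NS):
            for tag in REQUIRED_TAGS:
                value = video.findtext(f"video:{tag}", default="", namespaces=NS)
                if not value.strip():
                    problems.append((page, f"missing video:{tag}"))
            has_file = video.find("video:content_loc", NS) is not None
            has_player = video.find("video:player_loc", NS) is not None
            if not (has_file or has_player):
                problems.append(
                    (page, "needs video:content_loc or video:player_loc"))
            duration = video.findtext("video:duration", namespaces=NS)
            if duration is not None and not _duration_ok(duration):
                problems.append((page, f"duration out of range: {duration!r}"))
    return problems
```

Thumbnail size, robots.txt blocking, and agreement with on-page JSON-LD need network access or page parsing, so they are left out of this sketch.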

| Signal | Strength for AI discovery | Notes |
|---|---|---|
| | Baseline | Required; sourced via |
| Video sitemap | High | Closes JS-embed gaps |
| VideoObject JSON-LD | High | Pairs with transcript URL |
| On-page transcript | Critical | Citable text for AI answers |
| Open Graph og:video | Medium | Used for social previews |

Common mistakes

  • Mixing video entries into the standard sitemap.xml file. Google accepts it, but a separate video sitemap is easier to debug and resubmit.
  • Skipping the on-page transcript — AI engines need quotable text.
  • Using YouTube auto-captions only — not as accurate or as durable as a self-hosted transcript.
  • Setting expiration_date too aggressively and orphaning citation paths.
  • Forgetting to update sitemap on video re-encodes that change content_loc.
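
The last mistake, a sitemap left behind after a re-encode, can be detected by comparing the sitemap against whatever system knows the current file URLs. The sketch below assumes a hypothetical manifest: a dict mapping each page URL to the current video file URL. It reports pages whose listed content_loc no longer matches.

```python
from xml.etree import ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "video": "http://www.google.com/schemas/sitemap-video/1.1",
}


def stale_content_locs(xml_text, manifest):
    """Return (page_url, listed_urls, current_url) for entries that drifted.

    `manifest` is an assumed dict of page URL -> current video file URL.
    Pages missing from the manifest or without content_loc are skipped.
    """
    stale = []
    root = ET.fromstring(xml_text)
    for url in root.findall("sm:url", NS):
        page = url.findtext("sm:loc", default="", namespaces=NS).strip()
        listed = {
            (el.text or "").strip()
            for el in url.findall("video:video/video:content_loc", NS)
        }
        current = manifest.get(page)
        if current is not None and listed and current not in listed:
            stale.append((page, sorted(listed), current))
    return stale
```

Running a check like this as part of the publish pipeline is one way to notice drift before crawlers do.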

How to validate and deploy

  1. Generate the video sitemap from your video CMS or manifest.
  2. Validate XML structure and submit via Google Search Console.
  3. Reference it from robots.txt via the Sitemap: directive.
  4. Verify each content_loc / player_loc is reachable to Googlebot and AI crawlers.
  5. Re-generate on every video publish, replace, or removal.
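
Steps 3 and 4 can be partly checked offline with Python's standard urllib.robotparser. This sketch parses robots.txt text you supply (it does not fetch anything), lists the declared Sitemap: URLs, and tests whether a given user agent may fetch a video URL. The helper names and the sample robots.txt are hypothetical; site_maps() requires Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser


def _parse(robots_txt):
    """Parse robots.txt content from a string into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def declared_sitemaps(robots_txt):
    """Return the Sitemap: URLs declared in robots.txt (empty list if none)."""
    return _parse(robots_txt).site_maps() or []


def is_crawlable(robots_txt, url, user_agent="Googlebot"):
    """True when robots.txt rules allow `user_agent` to fetch `url`."""
    return _parse(robots_txt).can_fetch(user_agent, url)


if __name__ == "__main__":
    robots = """\
User-agent: *
Disallow: /private/
Sitemap: https://example.com/video-sitemap.xml
"""
    print(declared_sitemaps(robots))
    print(is_crawlable(robots, "https://example.com/video/ai-mode-walkthrough.mp4"))
    print(is_crawlable(robots, "https://example.com/private/clip.mp4", "GPTBot"))
```

This only evaluates robots.txt rules; login walls and streaming-only protocols still need a real fetch test.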

FAQ

Q: Can I include a video sitemap inside my main sitemap.xml?

Yes. Google accepts video tags embedded in an existing sitemap as well as a dedicated video sitemap. A separate file is usually easier to debug and resubmit, and it keeps video entries from pushing your main sitemap toward the 50 MB / 50,000-entry limits.

Q: Do I need both content_loc and player_loc?

At least one is required. If both are present, content_loc is preferred when accessible because engines can probe the file directly.

Q: Is there a video:transcript tag?

No. Google's video sitemap namespace does not define a transcript tag. Provide the transcript on the hosting page (HTML) and reference it via VideoObject.transcript JSON-LD.

Q: Do AI crawlers like GPTBot read video sitemaps?

Major AI crawlers honor robots.txt and sitemap directives. Listing videos in a sitemap improves the chance they fetch the hosting page and pair it with on-page transcript text for citation.

Q: Should I include YouTube videos hosted off my domain?

Yes, when the video is embedded on a page you control. Use player_loc with the embed URL. The hosting page becomes the citation target.

Q: How often should I regenerate the sitemap?

On every publish, replacement, or metadata change. Sitemaps with stale lastmod or duration drift erode discovery quality.
