Geodocs.dev

VideoObject Schema for AI Search


VideoObject schema is the schema.org type that describes a video to search engines and AI answer engines. Implementing it with the four required fields (name, description, thumbnailUrl, uploadDate) plus the high-leverage recommended fields (contentUrl, embedUrl, duration, transcript, hasPart with Clip) is the single highest-ROI step for getting your video content cited by AI Overviews, Gemini, ChatGPT, and Perplexity.

TL;DR

Mark up every embedded or hosted video on your site with VideoObject JSON-LD. Required fields are name, description, thumbnailUrl, and uploadDate. Add contentUrl or embedUrl, duration in ISO 8601, and a transcript field (or link) for AI-citation lift. For long-form videos, use hasPart with Clip to expose key moments, or potentialAction with SeekToAction if you self-host and want Google to discover key moments automatically. YouTube videos can use timestamps in the description as an alternative to Clip.

AI search engines treat video as a multimodal data stream. They process visual frames, audio, and on-page text in parallel. Without VideoObject schema, an AI engine has to infer what a video is about from the surrounding HTML — a noisy process that often results in the video being indexed but not cited. With VideoObject, you hand the engine a structured description, a thumbnail it can display, a duration it can announce, and (most importantly) a transcript it can quote.

Google Search Central states this directly: "While Google tries to automatically understand details about your video, you can influence the information that's shown in video results, such as the description, thumbnail URL, upload date, and duration, by marking up your video with VideoObject." That same markup is consumed by AI Overviews, by Gemini's video panel, and by Perplexity's video citation surface.

Required fields

Google Search Central and the schema.org VideoObject reference both list four required properties for a valid VideoObject:

  • name — the video title.
  • description — a short description (recommended 50-250 characters; do not duplicate the title verbatim).
  • thumbnailUrl — one or more thumbnail image URLs. Google recommends multiple sizes; minimum 60×30 pixels but realistically you want at least one HD thumbnail (1280×720 or larger).
  • uploadDate — ISO 8601 datetime (2025-04-28 or 2025-04-28T10:30:00Z).

If any of these is missing, the schema fails Google's Rich Results Test and will not enable enhanced video features.

The recommended fields are not technically required, but they are the difference between a video being indexed and a video being cited.

  • contentUrl — the direct URL to the video file (for self-hosted) or the YouTube watch URL.
  • embedUrl — the URL of an embeddable player (e.g., https://www.youtube.com/embed/VIDEO_ID).
  • duration — ISO 8601 duration (PT2M30S for 2 minutes 30 seconds). Without it, duration cannot surface in answer cards.
  • transcript — the full transcript as plain text or a URL to a transcript page. This is the single highest-leverage field for AI citation; an answer engine can quote a transcript span verbatim.
  • hasPart — an array of Clip objects describing key moments (see below).
  • interactionStatistic — view counts, like counts, comments. Surfaces engagement signals to AI engines as authority indicators.
  • publisher — the Organization that published the video.
  • inLanguage — BCP 47 language code (en, en-US, vi).
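The ISO 8601 duration format trips people up often enough that it is worth generating rather than hand-writing. A minimal sketch (the helper name is mine, not from any library):

```python
def iso8601_duration(total_seconds: int) -> str:
    """Format a seconds count as an ISO 8601 duration, e.g. 330 -> "PT5M30S"."""
    hours, rem = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rem, 60)
    out = "PT"
    if hours:
        out += f"{hours}H"
    if minutes:
        out += f"{minutes}M"
    if seconds or out == "PT":  # zero-length videos still need "PT0S"
        out += f"{seconds}S"
    return out

print(iso8601_duration(330))   # PT5M30S
print(iso8601_duration(900))   # PT15M
```

Feed the result straight into the duration property; never emit clock-style strings like "5:30".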

Basic VideoObject example

The minimum-viable schema for a self-hosted video:

{
  "@context": "https://schema.org",
  "@type": "VideoObject",
  "name": "How to implement llms.txt in 5 minutes",
  "description": "Walkthrough of creating an llms.txt file for an AI-search-optimized site, including required fields and validation.",
  "thumbnailUrl": [
    "https://example.com/thumbs/llms-txt-1280.jpg",
    "https://example.com/thumbs/llms-txt-1920.jpg"
  ],
  "uploadDate": "2026-05-02",
  "duration": "PT5M30S",
  "contentUrl": "https://example.com/videos/llms-txt-tutorial.mp4",
  "embedUrl": "https://example.com/embed/llms-txt-tutorial",
  "transcript": "In this video we'll walk through creating an llms.txt file step by step. First, you'll create a markdown file at the root of your site...",
  "publisher": {
    "@type": "Organization",
    "name": "Geodocs",
    "logo": {
      "@type": "ImageObject",
      "url": "https://geodocs.dev/logo.png"
    }
  }
}

Clip structured data — explicit key moments

When you know the exact start and end times of important segments, use Clip inside hasPart. Each Clip has name, startOffset, endOffset, and a url that deep-links to the moment.

{
  "@context": "https://schema.org",
  "@type": "VideoObject",
  "name": "Complete VideoObject Schema Tutorial",
  "description": "Full walkthrough of VideoObject including Clip and SeekToAction patterns.",
  "thumbnailUrl": "https://example.com/thumbs/videoobject.jpg",
  "uploadDate": "2026-05-02",
  "duration": "PT15M00S",
  "contentUrl": "https://example.com/videos/videoobject-tutorial.mp4",
  "hasPart": [
    {
      "@type": "Clip",
      "name": "Required fields",
      "startOffset": 30,
      "endOffset": 180,
      "url": "https://example.com/videos/videoobject-tutorial.mp4#t=30"
    },
    {
      "@type": "Clip",
      "name": "SeekToAction pattern",
      "startOffset": 540,
      "endOffset": 720,
      "url": "https://example.com/videos/videoobject-tutorial.mp4#t=540"
    }
  ]
}

Google Search Central confirms Clip is supported in all languages where Google Search is available, making it the safer cross-language choice over SeekToAction.

SeekToAction — automatic key moments for self-hosted video

If your video URLs already encode timestamps (e.g., ?t=120 or #t=120s), SeekToAction lets Google discover key moments automatically rather than requiring you to enumerate every Clip.

{
  "@context": "https://schema.org",
  "@type": "VideoObject",
  "name": "Complete VideoObject Schema Tutorial",
  "description": "Full walkthrough of VideoObject including Clip and SeekToAction patterns.",
  "thumbnailUrl": "https://example.com/thumbs/videoobject.jpg",
  "uploadDate": "2026-05-02",
  "duration": "PT15M00S",
  "contentUrl": "https://example.com/videos/videoobject-tutorial.mp4",
  "potentialAction": {
    "@type": "SeekToAction",
    "target": "https://example.com/videos/videoobject-tutorial.mp4?t={seek_to_second_number}",
    "startOffset-input": "required name=seek_to_second_number"
  }
}

Google Search Central documents SeekToAction as supported in 12 languages: English, Spanish, Portuguese, Italian, Chinese, French, Japanese, German, Turkish, Korean, Dutch, and Russian. If your audience is outside this list, use Clip instead.

Clip vs SeekToAction — which to use

  • Use Clip when you want explicit per-segment titles, or when you serve languages outside the SeekToAction-supported set. (For YouTube-hosted videos, prefer the description timestamp pattern over Clip schema on your embed page.)
  • Use SeekToAction when you self-host the video, your URL pattern already supports timestamp deep-links, and you want Google to discover key moments automatically without listing each one.
  • Do not use both for the same video; pick one.

YouTube videos — timestamps in the description

For YouTube-hosted videos, Google Search Central recommends marking timestamps directly in the YouTube video description rather than using Clip schema on your embed page. The rules:

  • Format each timestamp as [hour]:[minute]:[second] (omit hour if zero).
  • Place each timestamp on a new line.
  • Each line includes a label of at least one word.
  • List timestamps in chronological order.
  • The label should be on the same line as the timestamp.

This is the canonical pattern for YouTube Chapters and is read by both YouTube's player and Google Search.
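The rules above can be sketched as a small formatter (the function name and the (seconds, label) input shape are illustrative, not a YouTube API):

```python
def format_chapters(chapters):
    """Render (seconds, label) pairs as YouTube-description chapter lines.

    Rules: [hour]:[minute]:[second] with the hour omitted when zero,
    one timestamp per line, label on the same line, chronological order.
    """
    lines = []
    for seconds, label in sorted(chapters):  # enforce chronological order
        h, rem = divmod(seconds, 3600)
        m, s = divmod(rem, 60)
        stamp = f"{h}:{m:02d}:{s:02d}" if h else f"{m}:{s:02d}"
        lines.append(f"{stamp} {label}")
    return "\n".join(lines)

print(format_chapters([(0, "Intro"), (30, "Required fields"), (540, "SeekToAction")]))
```

Note that YouTube Chapters additionally expects the first timestamp to be 0:00.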

Transcript embedding strategies

The transcript field is the highest-leverage addition for AI citation. An AI engine can quote a transcript span verbatim, with the video as the source citation. Two strategies:

  • Inline transcript — the full transcript as a string value of the transcript property in the JSON-LD. Best for short videos (<10 minutes) and when you want the transcript embedded in the schema itself.
  • Separate transcript URL — the transcript property points to a dedicated /transcripts/ page. Best for long videos, multilingual transcripts, or when you want the transcript to be its own indexable URL.

In both cases, also render the transcript visibly on the page (collapsed by default is fine). Visible transcript text is processed by retrievers regardless of the schema; the schema's role is to make the relationship explicit.
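The two strategies differ only in what the transcript property holds. A sketch of emitting either variant (the helper name, base fields, and URLs are illustrative):

```python
import json

def video_schema(base, transcript_text=None, transcript_url=None):
    """Build VideoObject JSON-LD with the transcript inline or as a URL.

    Pass exactly one of transcript_text / transcript_url.
    """
    schema = {"@context": "https://schema.org", "@type": "VideoObject", **base}
    if transcript_text is not None:
        schema["transcript"] = transcript_text   # inline: best for short videos
    elif transcript_url is not None:
        schema["transcript"] = transcript_url    # separate, independently indexable page
    return json.dumps(schema, indent=2)

base = {
    "name": "How to implement llms.txt in 5 minutes",
    "description": "Step-by-step llms.txt walkthrough.",
    "thumbnailUrl": "https://example.com/thumbs/llms-txt-1280.jpg",
    "uploadDate": "2026-05-02",
}
print(video_schema(base, transcript_url="https://example.com/transcripts/llms-txt-tutorial"))
```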

How AI engines surface video citations

Observed patterns, based on practitioner reporting and Google's own video search documentation:

  • Google AI Overviews — surfaces video citations when the video has a transcript, schema-marked key moments (Clip or SeekToAction), and the surrounding page provides context. Most likely to embed video thumbnail + deep-link into a key moment.
  • Gemini — video panel surfaces VideoObject-marked content first; transcript content is treated as quotable text. Gemini shows video duration prominently when present.
  • ChatGPT Search — cites videos primarily via their transcript content; treats the video page as a text source. Without a transcript, citation rates drop noticeably.
  • Perplexity — surfaces video sources less frequently than text sources, but when it does, it preferentially cites pages with both a transcript and a clear publisher entity.

The consistent pattern across all four: a video without a transcript is a video without an AI citation. Schema markup is necessary but not sufficient — the transcript is what enables the actual quote.

Validation rules and Rich Results Test gotchas

Before shipping, validate at https://search.google.com/test/rich-results. Common failures:

  • Missing uploadDate — the most common failure; treat as a hard requirement.
  • Thumbnail too small — below the minimum size is an error in Rich Results Test; below 1280×720 is a soft warning that often correlates with the video not surfacing.
  • duration not in ISO 8601 — "5:30" is invalid; use "PT5M30S".
  • contentUrl not crawlable — if Google can't fetch the URL (gated by login, blocked by robots.txt, returning 403), the schema is treated as orphaned.
  • embedUrl pointing to your own page rather than the embed iframe — a common confusion; embedUrl should be the player's iframe src, not the page that contains the iframe.
  • Clip without url — each Clip must have a deep-link URL to the moment.
  • YouTube content with both Clip schema and description timestamps — redundant; pick one.
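A cheap local pre-flight check catches several of these before you paste into the Rich Results Test. A sketch, not a substitute for the real validator (the helper name and checks are mine):

```python
import re

REQUIRED = ("name", "description", "thumbnailUrl", "uploadDate")
ISO_DURATION = re.compile(r"^PT(?:\d+H)?(?:\d+M)?(?:\d+S)?$")

def preflight(schema: dict) -> list:
    """Return a list of local validation problems; empty means no obvious errors."""
    errors = [f"missing required field: {f}" for f in REQUIRED if not schema.get(f)]
    if "duration" in schema and not ISO_DURATION.match(schema["duration"]):
        errors.append(f'duration must be ISO 8601 (e.g. "PT5M30S"), got "{schema["duration"]}"')
    if not (schema.get("contentUrl") or schema.get("embedUrl")):
        errors.append("add contentUrl or embedUrl")
    return errors

print(preflight({"name": "Demo", "duration": "5:30"}))
```

Crawlability of contentUrl and thumbnail dimensions still need a live check; this only covers the static-field failures.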

Common mistakes

The failure modes that quietly kill VideoObject AI citation:

  • No transcript — the single biggest cause of low AI citation rates. Even an auto-generated transcript is better than none.
  • Schema on a page where the video is not actually embedded — Google requires the marked video to be visibly present on the page that emits the schema.
  • Generic descriptions — "Watch this video to learn more" provides nothing extractable. Treat the description as a 2-sentence answer to what the video is about.
  • Stale uploadDate — keep it accurate; never set it to today on a year-old video.
  • Marking the same video twice — e.g., once as VideoObject and once as a Movie or Article. Pick the most specific applicable type.

FAQ

Q: Do I need both contentUrl and embedUrl?

You need at least one. contentUrl is the direct file URL (best for self-hosted MP4s); embedUrl is the iframe player URL (best for YouTube/Vimeo embeds). If both are valid for your video, include both.

Q: Can I use VideoObject for a YouTube embed on my page?

Yes. Mark the page with VideoObject using the YouTube watch URL as contentUrl, the YouTube embed URL as embedUrl, and either Clip schema or YouTube-description timestamps for key moments. Google indexes the schema on your page even when the video itself is hosted on YouTube.
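A minimal sketch of that pattern, generated as JSON-LD (VIDEO_ID is a placeholder, and the i.ytimg.com thumbnail URL pattern is an assumption — confirm it resolves for your video):

```python
import json

video_id = "VIDEO_ID"  # placeholder for the real YouTube ID
schema = {
    "@context": "https://schema.org",
    "@type": "VideoObject",
    "name": "How to implement llms.txt in 5 minutes",
    "description": "Walkthrough of creating an llms.txt file.",
    "thumbnailUrl": f"https://i.ytimg.com/vi/{video_id}/maxresdefault.jpg",
    "uploadDate": "2026-05-02",
    "contentUrl": f"https://www.youtube.com/watch?v={video_id}",   # watch URL
    "embedUrl": f"https://www.youtube.com/embed/{video_id}",       # iframe src
}
print(json.dumps(schema, indent=2))
```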

Q: Do auto-generated transcripts count?

For schema validation, yes — any non-empty transcript string is accepted. For AI citation quality, an edited transcript outperforms an auto-generated one because punctuation, speaker labels, and proper nouns are correctly cased. Treat auto-transcripts as a starting point, not a finished artifact.

Q: What's the minimum thumbnail resolution for AI search visibility?

Google's hard minimum is 60×30 pixels, but practical surfacing in AI Overviews and Gemini's video panel correlates with at least 1280×720. Provide multiple thumbnailUrl values at different resolutions when feasible.

Q: Does video schema help if my video isn't on the same domain (e.g., YouTube channel)?

It helps the page that embeds the video, not the video itself. If the video lives on YouTube and is embedded on your page, your VideoObject schema on the embed page makes that page a citable source. The YouTube watch URL has its own indexing path independently.
