Speakable Schema Specification for AI Voice Search
Speakable is a schema.org property that marks specific sections of an article or web page as best suited for text-to-speech playback by voice assistants and AI answer engines. It is officially supported in beta by Google for Article and WebPage types, with section targeting via cssSelector, xpath, or URL id references.
TL;DR
Speakable schema flags the parts of a page that voice assistants should read aloud. Implement it with JSON-LD on Article or WebPage types, target the right blocks with cssSelector (preferred) or xpath, and keep each speakable block under roughly 30 seconds of spoken audio. Google Assistant remains the largest production consumer (still in beta, U.S. English news content), while AI engines increasingly use the same markup as a hint for which paragraphs to extract as voice or summary citations.
Definition
The speakable property is a schema.org vocabulary term that identifies sections of an Article or WebPage that are particularly suited for audio rendering through text-to-speech (TTS). It is canonically defined on the schema.org property page (https://schema.org/speakable) and operationalized by Google as the Speakable structured data feature, currently in beta. The property accepts a SpeakableSpecification value, which exposes three content-locator strategies: cssSelector, xpath, and id-value URL references that point at fragments within the same document. The property can be repeated, so multiple disjoint regions of a page can be flagged independently.
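The three locator strategies can be sketched side by side in a single annotation. The selectors and the fragment URL below are illustrative placeholders, and mixing all three strategies on one page is shown here only for comparison; in practice a page typically uses one:

```json
{
  "@context": "https://schema.org",
  "@type": "WebPage",
  "url": "https://example.com/page",
  "speakable": [
    { "@type": "SpeakableSpecification", "cssSelector": ".article-summary" },
    { "@type": "SpeakableSpecification", "xpath": "/html/head/title" },
    "https://example.com/page#key-points"
  ]
}
```

Note that the id-value strategy is expressed as a plain URL value of speakable itself, while cssSelector and xpath live inside a SpeakableSpecification object.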
Speakable does not change ranking in classic search results. Its function is to declare an extraction contract: when a voice or AI engine asks the page "which parts of you are appropriate to read aloud," the markup gives a machine-readable answer instead of forcing the engine to guess from headings or paragraph order.
Why this matters
Voice and AI answer surfaces are still growing their share of informational queries. Google Assistant uses Speakable markup to select up to three news articles and read marked sections back to users on smart speakers and Android devices. AI Overviews, Perplexity, and ChatGPT Search do not require Speakable, but their extractors behave more reliably on pages that already mark their answer-shaped content. Marking a section as speakable concentrates extractor attention on the highest-quality summary text instead of on boilerplate or navigation.
For publishers, the practical upside is twofold: a small but real share of voice impressions on Google Assistant, and cleaner snippet selection across AI engines that read the same vocabulary. The trade-off is asymmetric: Speakable adds no risk to traditional rankings, costs only a few lines of JSON-LD per page, and creates a stable answer-shaped surface that newer engines can reuse without changing their crawlers.
How it works
A Speakable specification is a child object on an Article or WebPage JSON-LD entity. Its core fields are:
| Field | Type | Required | Purpose |
|---|---|---|---|
| @type | Text | Yes | Always SpeakableSpecification. |
| cssSelector | Text | One of selector/xpath/url | CSS selector targeting one or more elements within the page DOM. |
| xpath | Text | One of selector/xpath/url | XPath 1.0 expression targeting elements within the page DOM. |
| url | URL | One of selector/xpath/url | URL with a fragment (#id) that resolves to an element in the same document. Per schema.org, this is supplied as a direct URL value of speakable rather than as a field inside SpeakableSpecification. |
At least one of cssSelector, xpath, or url is required. The schema.org definition allows the property to be repeated, so a page can declare multiple speakable regions—for example a headline summary plus a key-points list—each with its own selector.
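The repeated-property pattern described above might look like the following sketch. The class names are illustrative and must match the page's actual markup:

```json
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Example headline",
  "speakable": [
    {
      "@type": "SpeakableSpecification",
      "cssSelector": ".article-summary"
    },
    {
      "@type": "SpeakableSpecification",
      "cssSelector": ".key-points li"
    }
  ]
}
```

Each specification resolves independently, so an engine can read the summary, the key-points list, or both.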
Voice and AI engines that consume Speakable typically perform four steps:
- Parse the page's JSON-LD and locate any speakable property on the top-level Article or WebPage node.
- Resolve each SpeakableSpecification to one or more DOM nodes using the selector strategy.
- Extract the text content of those nodes, normalize whitespace, and trim away inline navigation or interactive elements.
- Render the extracted text through TTS (Google Assistant) or hand it to a downstream summarizer (AI answer engines).
Google's beta guidance recommends keeping speakable text concise—under roughly 20 to 30 seconds of spoken audio per section—and pointing selectors at content that stands alone without the surrounding article. Sections that depend on a chart, table, or earlier paragraph for meaning will produce confusing audio output and should not be flagged.
Practical application
A minimal implementation on a news article looks like this when serialized as JSON-LD inside a script element with type="application/ld+json" in the page head.
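A possible minimal version, with placeholder values for the headline, URL, and selectors:

```json
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Example: City Council Approves New Transit Plan",
  "url": "https://example.com/news/transit-plan",
  "speakable": {
    "@type": "SpeakableSpecification",
    "cssSelector": [".article-headline", ".article-summary"]
  }
}
```

Here cssSelector takes an array, so a single specification can flag both the headline and the summary block without repeating the speakable property.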