RAG chunking strategies compared: fixed, semantic, and hybrid chunking

Fixed-size chunking is fast and simple but can break sentences mid-thought. Semantic chunking splits at meaning boundaries and improves retrieval coherence at the cost of latency and embedding spend. Hybrid chunking layers structural rules with semantic checks and is typically the best default for mixed corpora.

TL;DR: Start with fixed-size chunking (e.g., 256-512 tokens, 10-20% overlap) to get a working baseline. Move to semantic or hybrid chunking when you measure retrieval quality and find your fixed chunks splitting answers across boundaries.

Why chunking matters

In a Retrieval-Augmented Generation (RAG) pipeline, chunking happens during indexing: documents are split into smaller passages, embedded, and stored in a vector database. Retrieval pulls the top-k chunks for a query, and those chunks become the context for the LLM. If chunks are too small, the model lacks context. If they are too large, retrieval becomes coarse and the model wastes tokens. The chunking strategy directly shapes both retrieval precision and answer quality.
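
To make that concrete, here is a minimal index-then-retrieve sketch in Python. The embed function is an assumption standing in for any embedding model, and the in-memory arrays stand in for a real vector database.

    import numpy as np

    def build_index(chunks, embed):
        # Indexing: embed each chunk once and keep the vectors beside the text.
        # `embed` is assumed to map a list of strings to an (n, d) array;
        # it is a placeholder, not a specific library's API.
        vecs = np.asarray(embed(chunks), dtype=float)
        vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)  # unit-normalize
        return chunks, vecs

    def retrieve_top_k(query, index, embed, k=5):
        # Retrieval: rank chunks by cosine similarity to the query embedding.
        chunks, vecs = index
        q = np.asarray(embed([query]), dtype=float)[0]
        q /= np.linalg.norm(q)
        order = np.argsort(vecs @ q)[::-1]  # highest similarity first
        return [chunks[i] for i in order[:k]]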

The three strategies at a glance

  • Fixed-size. Splits on equal token, character, or word counts. Pros: trivial to implement, fast, predictable cost. Cons: can cut sentences, no respect for document structure. Typical chunk size: 256-512 tokens.
  • Semantic. Splits at sentence or topic boundaries via embeddings. Pros: coherent passages, better recall on dense answers. Cons: slower, embedding cost at index time. Typical chunk size: variable, ~200-800 tokens.
  • Hybrid. Structural splits, then semantic refinement. Pros: robust across heterogeneous documents. Cons: more moving parts, needs tuning. Typical chunk size: variable, ~256-768 tokens.

Fixed-size chunking

Fixed-size chunking divides text into equally sized pieces measured in tokens, characters, or words, often with an overlap (commonly 10-20%) so context isn't lost at boundaries.
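
A minimal sketch, using whitespace-separated words as a stand-in for tokens; a production version would count real tokens with the embedding model's tokenizer.

    def fixed_size_chunks(text, chunk_size=384, overlap_ratio=0.15):
        # Slide a fixed-size window over the text; adjacent windows share
        # overlap_ratio of their length so boundary context is not lost.
        words = text.split()  # word count as a crude token proxy
        stride = max(1, int(chunk_size * (1 - overlap_ratio)))
        chunks = []
        for start in range(0, len(words), stride):
            chunks.append(" ".join(words[start:start + chunk_size]))
            if start + chunk_size >= len(words):  # final window reached the end
                break
        return chunks

The stride (chunk size minus overlap) is the only real knob here; everything else is bookkeeping.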

Use when:

  • You're building a baseline.
  • Documents are uniform (chat logs, support tickets, transcripts).
  • You need predictable indexing cost and latency.

Watch out for: mid-sentence cuts and split lists. The Weaviate and Pinecone reference posts both note this as the most common quality problem with fixed-size chunking.

Semantic chunking

Semantic chunking embeds short windows (often individual sentences) and creates a new chunk wherever the embedding distance between adjacent sentences exceeds a threshold; a minimal sketch of this base approach follows the variant list. Variants include:

  • Percentile-based — split where similarity drops below a percentile of the sentence-pair distribution.
  • Double-pass — first pass clusters sentences; second pass merges small or off-topic clusters.
  • Proposition-based — an LLM rewrites text into atomic propositions, which are then grouped.
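
A minimal sketch of the base percentile-based approach, assuming the text is already split into a sentences list and that embed maps a list of strings to a NumPy array of embeddings (both are assumptions, not a specific library's API).

    import numpy as np

    def semantic_chunks(sentences, embed, percentile=25):
        # Start a new chunk wherever adjacent-sentence similarity drops
        # below the given percentile of all adjacent-pair similarities.
        if len(sentences) < 2:
            return [" ".join(sentences)] if sentences else []
        vecs = np.asarray(embed(sentences), dtype=float)
        vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)  # unit-normalize
        sims = (vecs[:-1] * vecs[1:]).sum(axis=1)  # cosine of adjacent pairs
        threshold = np.percentile(sims, percentile)
        chunks, current = [], [sentences[0]]
        for sent, sim in zip(sentences[1:], sims):
            if sim < threshold:  # topic boundary: close the current chunk
                chunks.append(" ".join(current))
                current = [sent]
            else:
                current.append(sent)
        chunks.append(" ".join(current))
        return chunks

Lowering the percentile produces fewer, larger chunks; raising it produces more, smaller ones, which is where the tiny-chunk failure mode below comes from.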

Use when:

  • Documents mix narrative and reference material.
  • Answers tend to span multiple sentences and need to stay together.
  • Indexing budget allows the extra embedding calls.

Watch out for: thresholds that produce too many tiny chunks, and high indexing cost at scale.

Hybrid chunking

Hybrid chunking layers two passes:

  1. Structural pass — split on document structure (headings, paragraphs, list boundaries, page breaks).
  2. Semantic pass — within each structural chunk, merge or split based on embedding similarity.

Many teams add a third pass: size guardrails that re-split any chunk above a maximum token count and merge any chunk below a minimum.
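
A minimal sketch of the structural pass plus size guardrails, splitting on blank lines as a crude stand-in for real structure detection (headings, lists, page breaks). It reuses fixed_size_chunks from the earlier sketch; a semantic pass could additionally run semantic_chunks inside each oversized block.

    def hybrid_chunks(text, min_tokens=64, max_tokens=512):
        # Pass 1: structural split on blank lines (paragraph boundaries).
        paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
        chunks = []
        for para in paragraphs:
            n = len(para.split())  # word count as a token proxy
            if n > max_tokens:  # guardrail: re-split oversized blocks
                chunks.extend(fixed_size_chunks(para, chunk_size=max_tokens))
            elif chunks and n < min_tokens:  # guardrail: merge undersized blocks
                chunks[-1] = chunks[-1] + "\n\n" + para
            else:
                chunks.append(para)
        return chunks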

Use when:

  • Corpus mixes long docs (PDFs, articles) and short docs (FAQs, knowledge base articles).
  • You can afford a small upfront tuning effort.
  • Retrieval quality matters more than indexing speed.

NVIDIA's evaluation of chunking strategies for RAG found that respecting natural document boundaries (e.g., page-level chunking) was a strong default; hybrid strategies generalize this idea by combining boundaries with semantic checks.

Chunk overlap

Regardless of strategy, overlap copies the last N tokens of one chunk into the start of the next so context isn't lost at the boundary. A 10-20% overlap is a common starting point. Independent retrieval-quality tests (including community benchmarks shared on r/Rag) report meaningful recall lifts from adding modest overlap, especially for dense retrievers.
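
As a worked example: with 512-token chunks and 15% overlap, each chunk repeats the last ~77 tokens (512 × 0.15) of its predecessor, so the effective stride between chunk starts is roughly 435 tokens.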

How to evaluate which strategy wins

Build a small test harness (a minimal scoring sketch follows the list):

  1. Hold-out QA pairs — 50-200 questions with known correct answers from your corpus.
  2. Run each chunking strategy through the same retriever and reranker.
  3. Measure precision@k, recall@k, and end-to-end answer accuracy.
  4. Track indexing time and embedding cost.
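
A minimal scorer for steps 1-3, assuming retrieve(question) returns ranked chunk IDs from the index built with the strategy under test, and each QA pair carries the set of chunk IDs known to contain its answer (both are assumptions about your harness, not a specific framework's API).

    def precision_recall_at_k(retrieved, relevant, k=5):
        # Score one query: fraction of the top-k that are relevant (precision)
        # and fraction of relevant chunks that made the top-k (recall).
        hits = sum(1 for cid in retrieved[:k] if cid in relevant)
        return hits / k, (hits / len(relevant) if relevant else 0.0)

    def evaluate_strategy(name, qa_pairs, retrieve, k=5):
        # Average precision@k / recall@k over the held-out QA pairs.
        scores = [precision_recall_at_k(retrieve(q), rel, k) for q, rel in qa_pairs]
        p = sum(s[0] for s in scores) / len(scores)
        r = sum(s[1] for s in scores) / len(scores)
        print(f"{name}: precision@{k}={p:.3f}  recall@{k}={r:.3f}")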

The right strategy is the one that maximizes answer accuracy at acceptable cost — not the one with the most sophisticated splitter.

Common mistakes

  • Tuning chunk size without overlap — overlap usually moves the needle more than chunk size.
  • Going semantic too early — a tuned fixed-size baseline often beats a poorly tuned semantic chunker.
  • Ignoring document structure — even simple HTML heading splits beat blind token windows.
  • Skipping the eval harness — without QA pairs, you're tuning blind.

FAQ

Q: What's a good default chunk size?

256-512 tokens with 10-20% overlap. Tune from there based on retrieval evaluation.

Q: Should I always use semantic chunking?

No. Semantic chunking shines on heterogeneous corpora and dense answers, but it costs more to build and is overkill for short, uniform documents like FAQ entries.

Q: How does chunk overlap interact with chunk size?

Larger chunks need less overlap because context is already inside the chunk. Smaller chunks need more overlap to keep cross-boundary meaning. Treat them as a pair of knobs to tune together.

Q: Does the embedding model affect chunking?

Yes. Higher-context embedding models tolerate longer chunks; smaller models perform best on chunks well below their max input length.

Q: Is 'page-level chunking' a fourth strategy?

It can be considered a structural variant of hybrid chunking — split on page or major section boundaries first, then optionally refine. NVIDIA's published evaluation noted it as a strong default for document-style corpora.
