RAG Chunking Strategies — How You Split Documents Changes Everything
By Sanjeev Sharma (@webcoderspeed1)
Introduction
How you chunk documents directly determines retrieval quality. Split too small and you lose context; too large and you waste tokens with irrelevant information. More critically, different chunking strategies optimize for different retrieval patterns.
This post covers the full spectrum of chunking approaches used in production RAG systems.
- Fixed-Size Chunking with Overlap
- Recursive Character Splitting
- Semantic Chunking
- Document Structure Chunking
- Sentence-Window Retrieval
- Parent-Child Chunking
- Late Chunking
- Chunk Size Experiments
- Checklist
- Conclusion
Fixed-Size Chunking with Overlap
The simplest approach: split by character count with overlap to preserve context:
interface ChunkingConfig {
  chunkSize: number;   // characters per chunk
  overlapSize: number; // characters to overlap between chunks
}

function fixedSizeChunking(
  text: string,
  config: ChunkingConfig
): Array<{ id: string; text: string; startIdx: number; endIdx: number }> {
  const chunks: Array<{
    id: string;
    text: string;
    startIdx: number;
    endIdx: number;
  }> = [];
  const stride = config.chunkSize - config.overlapSize;
  if (stride <= 0) {
    // Guard against an infinite loop when overlap >= chunk size
    throw new Error('overlapSize must be smaller than chunkSize');
  }
  for (let i = 0; i < text.length; i += stride) {
    const endIdx = Math.min(i + config.chunkSize, text.length);
    const chunkText = text.substring(i, endIdx);
    if (chunkText.length > 50) { // Skip tiny fragments
      chunks.push({
        id: `chunk_${chunks.length}`,
        text: chunkText,
        startIdx: i,
        endIdx: endIdx,
      });
    }
    if (endIdx === text.length) break;
  }
  return chunks;
}

// Usage: 1024 chars per chunk, 128 chars overlap
const chunks = fixedSizeChunking(largeText, {
  chunkSize: 1024,
  overlapSize: 128,
});
Pros: Simple, fast, predictable token usage.
Cons: No understanding of content structure; loses semantic boundaries.
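The "predictable token usage" claim can be made concrete: with stride = chunkSize - overlapSize, you can estimate the chunk count before embedding anything. A small sketch (the helper name is mine, not from any library):

```typescript
// Estimate how many chunks fixedSizeChunking will emit for a text of a given
// length. Matches the loop above when the final fragment exceeds the 50-char
// minimum; a tiny trailing fragment gets skipped and reduces the count by one.
function estimateChunkCount(
  textLength: number,
  chunkSize: number,
  overlapSize: number
): number {
  const stride = chunkSize - overlapSize;
  if (stride <= 0) throw new Error('overlapSize must be smaller than chunkSize');
  return Math.max(1, Math.ceil((textLength - overlapSize) / stride));
}

// A 10,000-char document at 1024/128 yields 12 chunks
estimateChunkCount(10000, 1024, 128); // 12
```

This makes cost planning straightforward: multiply the estimate by your embedding price per chunk before committing to a configuration.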
Recursive Character Splitting
LangChain's recursive splitter: split on progressively smaller delimiters to preserve structure:
function recursiveCharacterSplit(
  text: string,
  targetChunkSize: number = 1024
): string[] {
  const separators = [
    '\n\n', // paragraph breaks
    '\n',   // line breaks
    '. ',   // sentence breaks
    ' ',    // words
    '',     // characters (last resort)
  ];

  function splitText(textToSplit: string, separators: string[]): string[] {
    // Try separators from largest (paragraph) to smallest (character)
    for (const separator of separators) {
      if (separator === '') {
        // Last resort: split into individual characters
        return textToSplit.split('');
      }
      if (textToSplit.includes(separator)) {
        const splits = textToSplit.split(separator);
        const goodSplits: string[] = [];
        let mergedText = '';
        // Greedily merge adjacent pieces until they reach the target size
        for (const s of splits) {
          if ((mergedText + s).length < targetChunkSize) {
            mergedText += s + separator;
          } else {
            if (mergedText) goodSplits.push(mergedText.trim());
            mergedText = s + separator;
          }
        }
        if (mergedText) goodSplits.push(mergedText.trim());
        return goodSplits.filter(s => s.length > 0);
      }
    }
    return [textToSplit];
  }

  return splitText(text, separators);
}
Pros: Preserves logical structure; respects paragraph/sentence boundaries.
Cons: Still structural, not semantic.
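The separator priority is the heart of the approach: a paragraph break is a better cut point than a line break, which beats a sentence break, and character-level splitting is the last resort. The fallback chain in isolation (helper name is illustrative, not from LangChain):

```typescript
// Return the first (largest) separator that actually occurs in the text;
// the empty string is the character-level last resort.
function firstUsableSeparator(text: string, separators: string[]): string {
  for (const sep of separators) {
    if (sep === '' || text.includes(sep)) return sep;
  }
  return '';
}

const seps = ['\n\n', '\n', '. ', ' ', ''];
firstUsableSeparator('First para.\n\nSecond para.', seps); // '\n\n'
firstUsableSeparator('One sentence. Another.', seps);      // '. '
```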
Semantic Chunking
Split based on semantic similarity between sentences:
import { embed } from './embeddings';

async function semanticChunking(
  text: string,
  similarityThreshold: number = 0.5
): Promise<string[]> {
  // Step 1: Split into sentences
  const sentences = text.match(/[^.!?]+[.!?]+/g) || [];
  if (sentences.length < 2) return sentences;

  // Step 2: Embed each sentence
  const embeddings = await Promise.all(
    sentences.map(s => embed(s.trim()))
  );

  // Step 3: Compute similarity between consecutive sentences
  function cosineSimilarity(a: number[], b: number[]): number {
    const dotProduct = a.reduce((sum, x, i) => sum + x * b[i], 0);
    const normA = Math.sqrt(a.reduce((sum, x) => sum + x * x, 0));
    const normB = Math.sqrt(b.reduce((sum, x) => sum + x * x, 0));
    return dotProduct / (normA * normB);
  }

  // Step 4: Identify chunk boundaries at low-similarity transitions
  const chunks: string[] = [];
  let currentChunk = sentences[0];
  for (let i = 1; i < sentences.length; i++) {
    const similarity = cosineSimilarity(embeddings[i - 1], embeddings[i]);
    if (similarity < similarityThreshold) {
      // Start new chunk at semantic boundary
      chunks.push(currentChunk.trim());
      currentChunk = sentences[i];
    } else {
      // Continue building current chunk
      currentChunk += ' ' + sentences[i];
    }
  }
  if (currentChunk) chunks.push(currentChunk.trim());
  return chunks;
}
Pros: Respects semantic boundaries; improves retrieval relevance.
Cons: Slow (requires embedding every sentence); less predictable sizes.
Document Structure Chunking
Parse document structure (headers, sections) and chunk accordingly:
interface StructuredChunk {
  text: string;
  metadata: {
    heading: string;
    section: string;
    hierarchy: string[];
    pageNumber?: number;
  };
}

async function structureAwareChunking(markdown: string): Promise<StructuredChunk[]> {
  const lines = markdown.split('\n');
  const chunks: StructuredChunk[] = [];
  let currentHeading = 'Root';
  let currentSection = '';
  let currentHierarchy: string[] = [];
  let buffer = '';

  for (const line of lines) {
    // Detect headers
    const headerMatch = line.match(/^(#{1,6})\s+(.+)$/);
    if (headerMatch) {
      // Save previous chunk
      if (buffer.trim()) {
        chunks.push({
          text: buffer.trim(),
          metadata: {
            heading: currentHeading,
            section: currentSection,
            hierarchy: [...currentHierarchy],
          },
        });
        buffer = '';
      }
      // Update hierarchy
      const level = headerMatch[1].length;
      currentHierarchy = currentHierarchy.slice(0, level - 1);
      currentHeading = headerMatch[2];
      currentHierarchy.push(currentHeading);
      currentSection = currentHierarchy.join(' > ');
    } else if (line.trim()) {
      buffer += line + '\n';
      // Create chunk when buffer reaches ~1000 chars
      if (buffer.length > 1000) {
        chunks.push({
          text: buffer.trim(),
          metadata: {
            heading: currentHeading,
            section: currentSection,
            hierarchy: [...currentHierarchy],
          },
        });
        buffer = '';
      }
    }
  }

  // Save final chunk
  if (buffer.trim()) {
    chunks.push({
      text: buffer.trim(),
      metadata: {
        heading: currentHeading,
        section: currentSection,
        hierarchy: [...currentHierarchy],
      },
    });
  }
  return chunks;
}
Pros: Preserves document structure in metadata; enables structure-aware retrieval.
Cons: Requires parsing; format-dependent.
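The trickiest part above is the hierarchy bookkeeping: a header of level N truncates the breadcrumb to N - 1 entries before appending itself, so a sibling header replaces its predecessor while deeper sections nest. That step isolated for clarity (helper name is mine):

```typescript
// Same logic as the slice(0, level - 1) step in structureAwareChunking
function updateHierarchy(
  hierarchy: string[],
  level: number,
  heading: string
): string[] {
  const next = hierarchy.slice(0, level - 1);
  next.push(heading);
  return next;
}

let crumb: string[] = [];
crumb = updateHierarchy(crumb, 1, 'Guide');   // ['Guide']
crumb = updateHierarchy(crumb, 2, 'Setup');   // ['Guide', 'Setup']
crumb = updateHierarchy(crumb, 3, 'Install'); // ['Guide', 'Setup', 'Install']
crumb = updateHierarchy(crumb, 2, 'Usage');   // ['Guide', 'Usage'] (sibling replaces Setup)
```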
Sentence-Window Retrieval
Embed individual sentences for precise matching, but return each hit with a window of neighboring sentences for context:
interface SentenceWindowChunk {
  sentenceId: string;
  sentence: string;
  sentenceEmbedding: number[];
  window: {
    before?: string;
    after?: string;
  };
  metadata: Record<string, unknown>;
}

async function sentenceWindowRetrieval(
  text: string,
  windowSize: number = 2
): Promise<SentenceWindowChunk[]> {
  // Step 1: Split into sentences
  const sentences = text.match(/[^.!?]+[.!?]+/g) || [];

  // Step 2: Create sentence-level chunks with window context
  const chunks: SentenceWindowChunk[] = [];
  for (let i = 0; i < sentences.length; i++) {
    const sentence = sentences[i].trim();
    const sentenceEmbedding = await embed(sentence);
    // Collect surrounding sentences (window)
    const before = sentences.slice(Math.max(0, i - windowSize), i).join(' ');
    const after = sentences.slice(i + 1, Math.min(sentences.length, i + 1 + windowSize)).join(' ');
    chunks.push({
      sentenceId: `sent_${i}`,
      sentence,
      sentenceEmbedding,
      window: {
        before: before || undefined,
        after: after || undefined,
      },
      metadata: {
        sentenceIndex: i,
        docOffset: text.indexOf(sentence),
      },
    });
  }
  return chunks;
}

// At retrieval time: search by sentence, return window
async function retrieveWithWindow(
  query: string,
  chunks: SentenceWindowChunk[],
  topK: number = 5
): Promise<Array<{ text: string; metadata: Record<string, unknown> }>> {
  const queryEmbedding = await embed(query);
  // Compute similarity for each sentence
  const similarities = chunks.map(chunk => {
    const similarity = cosineSimilarity(queryEmbedding, chunk.sentenceEmbedding);
    return { chunk, similarity };
  });
  // Get top-k sentences
  const topSentences = similarities
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, topK);
  // Return sentences with their windows for context
  return topSentences.map(({ chunk }) => ({
    text: [
      chunk.window.before ? chunk.window.before + ' ' : '',
      chunk.sentence,
      chunk.window.after ? ' ' + chunk.window.after : '',
    ]
      .filter(Boolean)
      .join(''),
    metadata: chunk.metadata,
  }));
}

function cosineSimilarity(a: number[], b: number[]): number {
  const dotProduct = a.reduce((sum, x, i) => sum + x * b[i], 0);
  const normA = Math.sqrt(a.reduce((sum, x) => sum + x * x, 0));
  const normB = Math.sqrt(b.reduce((sum, x) => sum + x * x, 0));
  return dotProduct / (normA * normB);
}
Pros: Balances search granularity with context preservation.
Cons: More complex implementation; double storage.
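The window assembly reduces to one slicing rule: take up to windowSize sentences on each side of the hit, clamped at the document edges. A compact sketch of just that step:

```typescript
// Rebuild the retrieval-time text for sentence i with w sentences of context
// on each side, clamped at the start and end of the document.
function windowText(sentences: string[], i: number, w: number): string {
  const start = Math.max(0, i - w);
  const end = Math.min(sentences.length, i + 1 + w);
  return sentences.slice(start, end).join(' ');
}

const s = ['A.', 'B.', 'C.', 'D.', 'E.'];
windowText(s, 2, 1); // 'B. C. D.'
windowText(s, 0, 2); // 'A. B. C.' (clamped at the start)
```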
Parent-Child Chunking
Embed fine-grained chunks, but retrieve parent (coarse) chunks:
interface ParentChildChunks {
  parentId: string;
  parentText: string;
  children: Array<{
    childId: string;
    text: string;
    embedding: number[];
  }>;
}

function parentChildChunking(
  text: string,
  parentSize: number = 2048,
  childSize: number = 512,
  overlap: number = 100
): ParentChildChunks[] {
  const parents: ParentChildChunks[] = [];
  const stride = parentSize - overlap;
  for (let i = 0; i < text.length; i += stride) {
    const endIdx = Math.min(i + parentSize, text.length);
    const parentText = text.substring(i, endIdx);
    if (parentText.length < 50) continue;
    const parentId = `parent_${parents.length}`;
    const children = [];
    const childStride = childSize - overlap / 2;
    for (let j = 0; j < parentText.length; j += childStride) {
      const childEndIdx = Math.min(j + childSize, parentText.length);
      const childText = parentText.substring(j, childEndIdx);
      if (childText.length > 20) {
        children.push({
          childId: `${parentId}_child_${children.length}`,
          text: childText,
          embedding: [], // Populate via embedModel
        });
      }
      if (childEndIdx === parentText.length) break;
    }
    parents.push({
      parentId,
      parentText,
      children,
    });
    if (endIdx === text.length) break;
  }
  return parents;
}

// Retrieval: search children, return parents
async function retrieveWithParentChild(
  query: string,
  parentChildStructure: ParentChildChunks[],
  topK: number = 3
): Promise<string[]> {
  const queryEmbedding = await embed(query);
  const similarities: Array<{
    parentId: string;
    childId: string;
    similarity: number;
  }> = [];
  for (const parent of parentChildStructure) {
    for (const child of parent.children) {
      const similarity = cosineSimilarity(queryEmbedding, child.embedding);
      similarities.push({ parentId: parent.parentId, childId: child.childId, similarity });
    }
  }
  // Get top-k by child similarity, but return parent text
  const topChildren = similarities
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, topK);
  const parentIds = new Set(topChildren.map(c => c.parentId));
  const results: string[] = [];
  for (const parentId of parentIds) {
    const parent = parentChildStructure.find(p => p.parentId === parentId);
    if (parent) results.push(parent.parentText);
  }
  return results;
}
Pros: Fine-grained search with coarse context.
Cons: Higher storage and complexity.
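One subtlety in the retrieval step above: deduplicating by parent means two high-scoring children from the same parent collapse into a single result, so fewer than topK parents can come back. That mapping in isolation (helper name is mine):

```typescript
// Collapse child hits (already sorted by similarity) into unique parent ids,
// preserving the order of first appearance; a Set keeps insertion order in JS.
function uniqueParents(hits: Array<{ parentId: string }>): string[] {
  return [...new Set(hits.map(h => h.parentId))];
}

uniqueParents([
  { parentId: 'parent_0' },
  { parentId: 'parent_1' },
  { parentId: 'parent_0' },
]); // ['parent_0', 'parent_1']
```

If you need exactly topK parents in the response, over-fetch children (e.g. 3x topK) before deduplicating.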
Late Chunking
Embed the entire document first, then split it and assign embeddings to the resulting chunks:
async function lateChunking(
  documents: Array<{ id: string; text: string }>
): Promise<
  Array<{
    chunkId: string;
    docId: string;
    text: string;
    embedding: number[];
  }>
> {
  const chunks = [];
  let chunkCounter = 0;
  for (const doc of documents) {
    // Step 1: Get single embedding for entire document
    const docEmbedding = await embed(doc.text);
    // Step 2: Split document into logical chunks
    const docChunks = doc.text
      .split(/\n\n+/) // Split by paragraph
      .filter(c => c.length > 50);
    // Step 3: Assign shared embedding to all chunks
    // In practice, you'd use a weighted combination based on chunk positions
    for (const chunkText of docChunks) {
      chunks.push({
        chunkId: `chunk_${chunkCounter++}`,
        docId: doc.id,
        text: chunkText,
        embedding: docEmbedding, // Shared!
      });
    }
  }
  return chunks;
}
Pros: Lower embedding costs; document-level coherence.
Cons: Less granular search; all chunks share the same embedding.
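The shared-embedding shortcut above is a simplification. In late chunking as described by Jina AI, a long-context model first produces one vector per token for the whole document, and each chunk's embedding is then pooled from the token vectors inside its span, so chunks stay distinct while still reflecting document-wide context. A mean-pooling sketch over precomputed token vectors (the pooling helper is mine; obtaining token-level vectors depends on your embedding model):

```typescript
// Average the token vectors in [start, end) to form one chunk embedding.
function meanPool(tokenVectors: number[][], start: number, end: number): number[] {
  const dim = tokenVectors[0].length;
  const out = new Array(dim).fill(0);
  for (let i = start; i < end; i++) {
    for (let d = 0; d < dim; d++) out[d] += tokenVectors[i][d];
  }
  return out.map(x => x / (end - start));
}

// Two chunks over a 4-token document get different embeddings,
// even though every token was encoded in one forward pass.
const tokenVecs = [[1, 0], [1, 0], [0, 1], [0, 1]];
meanPool(tokenVecs, 0, 2); // [1, 0]
meanPool(tokenVecs, 2, 4); // [0, 1]
```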
Chunk Size Experiments
Track these metrics to optimize chunk size for your domain:
interface ChunkingMetrics {
  avgChunkSize: number;
  minChunkSize: number;
  maxChunkSize: number;
  totalChunks: number;
  avgTokensPerChunk: number;
  avgRetrievalRank: number; // How high the relevant chunk is ranked
  retrievalHitRate: number; // % of queries where relevant chunk is in top-5
}

async function evaluateChunkingStrategy(
  chunks: string[],
  testQueries: Array<{ query: string; relevantChunkIds: string[] }>,
  embedding: (text: string) => Promise<number[]>,
  tokenize: (text: string) => string[]
): Promise<ChunkingMetrics> {
  const tokenCounts = chunks.map(c => tokenize(c).length);
  // Embed every chunk once up front, not once per query
  const chunkEmbeddings = await Promise.all(chunks.map(c => embedding(c)));
  const retrievalRanks: number[] = [];
  const hits: boolean[] = [];

  for (const test of testQueries) {
    const queryEmbedding = await embedding(test.query);
    const similarities = chunks.map((_, idx) => ({
      idx,
      similarity: cosineSimilarity(queryEmbedding, chunkEmbeddings[idx]),
    }));
    similarities.sort((a, b) => b.similarity - a.similarity);
    for (const relevantId of test.relevantChunkIds) {
      // Assumes relevantChunkIds hold numeric chunk indices
      const rank = similarities.findIndex(s => s.idx === parseInt(relevantId));
      if (rank !== -1) {
        retrievalRanks.push(rank + 1);
        hits.push(rank < 5);
      }
    }
  }

  return {
    avgChunkSize: chunks.reduce((sum, c) => sum + c.length, 0) / chunks.length,
    minChunkSize: Math.min(...chunks.map(c => c.length)),
    maxChunkSize: Math.max(...chunks.map(c => c.length)),
    totalChunks: chunks.length,
    avgTokensPerChunk: tokenCounts.reduce((a, b) => a + b, 0) / tokenCounts.length,
    avgRetrievalRank: retrievalRanks.reduce((a, b) => a + b, 0) / retrievalRanks.length,
    retrievalHitRate: hits.filter(Boolean).length / hits.length,
  };
}
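Hit rate and rank summarize retrieval quality differently: hit rate only cares whether the relevant chunk landed in the top k, while mean reciprocal rank also rewards it for landing higher. Both can be derived from the same list of per-query ranks (helper names are mine):

```typescript
// ranks are 1-based positions of the relevant chunk, one per query
function hitRateAtK(ranks: number[], k: number): number {
  return ranks.filter(r => r <= k).length / ranks.length;
}

function meanReciprocalRank(ranks: number[]): number {
  return ranks.reduce((s, r) => s + 1 / r, 0) / ranks.length;
}

hitRateAtK([1, 3, 7, 2], 5);      // 0.75 (three of four queries hit the top 5)
meanReciprocalRank([1, 2, 4, 4]); // 0.5
```

Tracking both catches regressions that a single number hides: a chunking change can keep hit rate flat while pushing relevant chunks from rank 1 to rank 5.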
Checklist
- Start with recursive character splitting for general documents
- Measure retrieval hit rate for your domain
- Consider semantic chunking for highly technical content
- Implement sentence-window retrieval for balanced context
- Use structure-aware chunking for markdown/PDFs
- Track chunk size distribution (target 512-1024 tokens)
- Test multiple overlap sizes (10-20% recommended)
- Evaluate reranking quality by chunk size
- Monitor embedding costs relative to retrieval quality gains
- A/B test chunking strategies on production queries
Conclusion
There's no universal best chunking strategy. The optimal approach depends on your document type, embedding model, and downstream task. Start with recursive splitting, measure hit rates, then progressively add semantic or structure-aware chunking. The key metric: does your retrieval system find relevant chunks in the top-k results? Everything else is implementation detail.