Hybrid Retrieval for RAG — Combining Dense and Sparse Search
By Sanjeev Sharma (@webcoderspeed1)
Introduction
Vector search has become the default retrieval method for RAG. But semantic embeddings aren't perfect: they miss exact keyword matches, ignore precise terminology, and struggle with rare entities. Sparse retrieval methods like BM25 have the opposite profile: excellent exact term matching, but no grasp of semantic relationships.
The answer? Hybrid retrieval that combines both.
- Dense vs Sparse Retrieval Tradeoffs
- Dense Retrieval (Embeddings)
- Sparse Retrieval (BM25)
- Hybrid Search Combining Dense + Sparse
- Reciprocal Rank Fusion (RRF)
- Weighted Combination
- Query-Type Aware Routing
- Evaluation Metrics
- Tuning Dense/Sparse Alpha
- Checklist
- Conclusion
Dense vs Sparse Retrieval Tradeoffs
Dense Retrieval (Embeddings)
Dense retrieval embeds queries and documents, then finds nearest neighbors:
```typescript
interface DenseRetriever {
  embeddings: Record<string, number[]>; // chunkId -> embedding vector
  docTexts: Record<string, string>; // chunkId -> text
}

async function denseSearch(
  query: string,
  retriever: DenseRetriever,
  topK: number = 5
): Promise<Array<{ docId: string; score: number; text: string }>> {
  // Embed the query (embedModel is an external embedding client, not defined here)
  const queryEmbedding = await embedModel.embed(query);

  // Compute similarity to all documents
  const similarities = Object.entries(retriever.embeddings).map(([docId, embedding]) => ({
    docId,
    score: cosineSimilarity(queryEmbedding, embedding),
    text: retriever.docTexts[docId],
  }));

  // Sort and return top-k
  return similarities.sort((a, b) => b.score - a.score).slice(0, topK);
}

function cosineSimilarity(a: number[], b: number[]): number {
  const dotProduct = a.reduce((sum, x, i) => sum + x * b[i], 0);
  const normA = Math.sqrt(a.reduce((sum, x) => sum + x * x, 0));
  const normB = Math.sqrt(b.reduce((sum, x) => sum + x * x, 0));
  return dotProduct / (normA * normB);
}
```
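A quick sanity check of cosine similarity is worth running before trusting retrieval scores. The snippet below (a self-contained copy of the function, so it runs on its own) shows the two properties that matter here: it ignores vector magnitude and only measures direction.

```typescript
// Self-contained copy of cosineSimilarity for demonstration purposes.
function cosineSimilarity(a: number[], b: number[]): number {
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
  const normA = Math.sqrt(a.reduce((sum, x) => sum + x * x, 0));
  const normB = Math.sqrt(b.reduce((sum, x) => sum + x * x, 0));
  return dot / (normA * normB);
}

console.log(cosineSimilarity([1, 0], [1, 0])); // identical direction -> 1
console.log(cosineSimilarity([1, 0], [0, 1])); // orthogonal -> 0
console.log(cosineSimilarity([1, 2], [2, 4])); // same direction, larger magnitude -> still 1
```

Magnitude invariance is why embedding scores from different documents are directly comparable, which the hybrid fusion below relies on.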
Strengths:
- Captures semantic relationships
- Works across languages
- Generalizes to unseen terms
Weaknesses:
- Misses exact keyword matches
- Struggles with proper nouns and exact identifiers (names, IDs, dates)
- Requires recomputation for new docs
Sparse Retrieval (BM25)
BM25 scores documents by term frequency and inverse document frequency:
```typescript
interface BM25Config {
  k1: number; // controls term frequency saturation (typically 1.5)
  b: number; // controls length normalization (typically 0.75)
  idf: Record<string, number>; // pre-computed IDF values
}

interface Document {
  id: string;
  text: string;
  tokens: string[];
}

class BM25Retriever {
  private documents: Document[];
  private config: BM25Config;
  private avgDocLength: number;

  constructor(documents: Document[], config: Partial<BM25Config> = {}) {
    this.documents = documents;
    this.config = {
      k1: config.k1 ?? 1.5,
      b: config.b ?? 0.75,
      idf: config.idf ?? {},
    };

    // Compute average document length
    this.avgDocLength =
      documents.reduce((sum, doc) => sum + doc.tokens.length, 0) / documents.length;

    // Compute IDF if not provided
    if (Object.keys(this.config.idf).length === 0) {
      this.computeIDF();
    }
  }

  private computeIDF(): void {
    const docFreq: Record<string, number> = {};

    // Count documents containing each term
    for (const doc of this.documents) {
      const uniqueTokens = new Set(doc.tokens);
      for (const token of uniqueTokens) {
        docFreq[token] = (docFreq[token] || 0) + 1;
      }
    }

    // Compute IDF (the +1 inside the log keeps scores non-negative)
    for (const token in docFreq) {
      const df = docFreq[token];
      this.config.idf[token] = Math.log(
        (this.documents.length - df + 0.5) / (df + 0.5) + 1
      );
    }
  }

  search(query: string, topK: number = 5): Array<{ docId: string; score: number; text: string }> {
    const queryTokens = this.tokenize(query);
    const scores: Record<string, number> = {};

    // Score each document
    for (const doc of this.documents) {
      let score = 0;
      for (const token of queryTokens) {
        const idf = this.config.idf[token] ?? 0;
        const termFreq = doc.tokens.filter(t => t === token).length;
        const docLen = doc.tokens.length;

        // BM25 formula
        const numerator = termFreq * (this.config.k1 + 1);
        const denominator =
          termFreq +
          this.config.k1 * (1 - this.config.b + this.config.b * (docLen / this.avgDocLength));
        score += idf * (numerator / denominator);
      }
      scores[doc.id] = score;
    }

    // Return top-k
    return Object.entries(scores)
      .sort(([, a], [, b]) => b - a)
      .slice(0, topK)
      .map(([docId, score]) => ({
        docId,
        score,
        text: this.documents.find(d => d.id === docId)!.text,
      }));
  }

  private tokenize(text: string): string[] {
    return text
      .toLowerCase()
      .split(/\s+/)
      .map(token => token.replace(/[^\w]/g, ''))
      .filter(token => token.length > 0);
  }
}
```
Strengths:
- Excellent for keyword matching
- Fast, no embedding costs
- Handles proper nouns well
- Deterministic
Weaknesses:
- Misses synonyms
- Language-dependent
- Sensitive to query phrasing
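One BM25 property worth internalizing is term-frequency saturation: the k1 parameter caps how much repeated terms can help. The standalone sketch below (bm25TermScore is a name introduced here for illustration) isolates the per-term score from the formula above, fixing IDF at 1 and assuming a document at average length so the length normalizer drops out.

```typescript
// Per-term BM25 contribution for a document at average length
// (length normalizer = 1), with IDF fixed at 1 to isolate saturation.
function bm25TermScore(termFreq: number, k1: number = 1.5): number {
  return (termFreq * (k1 + 1)) / (termFreq + k1);
}

for (const tf of [1, 2, 5, 10, 100]) {
  console.log(`tf=${tf} -> ${bm25TermScore(tf).toFixed(3)}`);
}
// Scores climb toward the asymptote k1 + 1 = 2.5 rather than growing linearly,
// so a term repeated 100 times scores only modestly above one repeated 10 times.
```

This is why keyword stuffing barely moves BM25 rankings, while a single occurrence of a rare, high-IDF term can dominate the score.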
Hybrid Search Combining Dense + Sparse
Reciprocal Rank Fusion (RRF)
Combine rankings from dense and sparse retrievers using RRF:
```typescript
interface RetrievalResult {
  docId: string;
  score: number;
  text: string;
}

function reciprocalRankFusion(
  denseResults: RetrievalResult[],
  sparseResults: RetrievalResult[],
  k: number = 60 // RRF smoothing constant
): RetrievalResult[] {
  const rrfScores: Record<string, number> = {};

  // Score dense results (rank is 0-indexed; +1 converts to 1-based rank)
  for (let rank = 0; rank < denseResults.length; rank++) {
    const docId = denseResults[rank].docId;
    rrfScores[docId] = (rrfScores[docId] || 0) + 1 / (k + rank + 1);
  }

  // Score sparse results
  for (let rank = 0; rank < sparseResults.length; rank++) {
    const docId = sparseResults[rank].docId;
    rrfScores[docId] = (rrfScores[docId] || 0) + 1 / (k + rank + 1);
  }

  // Map docId -> text so fused results keep their original text
  const textMap = new Map([
    ...denseResults.map(r => [r.docId, r.text] as const),
    ...sparseResults.map(r => [r.docId, r.text] as const),
  ]);

  // Combine, sort by fused score, and return
  return Object.entries(rrfScores)
    .sort(([, a], [, b]) => b - a)
    .map(([docId, score]) => ({ docId, score, text: textMap.get(docId)! }));
}

// Usage
async function hybridSearch(
  query: string,
  denseRetriever: DenseRetriever,
  sparseRetriever: BM25Retriever,
  topK: number = 5
): Promise<RetrievalResult[]> {
  const [denseResults, sparseResults] = await Promise.all([
    denseSearch(query, denseRetriever, topK * 2), // over-fetch so fusion has more candidates
    Promise.resolve(sparseRetriever.search(query, topK * 2)),
  ]);
  return reciprocalRankFusion(denseResults, sparseResults).slice(0, topK);
}
```
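A miniature worked example makes the fusion behavior concrete. The rrf helper below is a self-contained simplification over bare doc-ID lists (introduced here for illustration): with k = 60, a document that both retrievers rank, even imperfectly, outscores a document that only one retriever surfaces.

```typescript
// Minimal RRF over ranked lists of doc IDs (rank 0 = best), k = 60.
function rrf(lists: string[][], k: number = 60): Array<[string, number]> {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((docId, rank) => {
      scores.set(docId, (scores.get(docId) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()].sort((a, b) => b[1] - a[1]);
}

const dense = ['A', 'B', 'C'];  // dense ranking, best first
const sparse = ['B', 'D', 'A']; // sparse ranking, best first
console.log(rrf([dense, sparse]));
// 'B' (ranks 2 and 1) and 'A' (ranks 1 and 3) both outrank 'C' and 'D',
// which each appear in only one list.
```

This agreement-rewarding behavior is exactly why RRF works without any score normalization: only ranks matter, so the incomparable scales of cosine similarity and BM25 never touch.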
Weighted Combination
Blend dense and sparse scores with tunable weights:
```typescript
interface HybridScores {
  dense: number; // 0-1
  sparse: number; // 0-1
  combined: number;
}

async function weightedHybridSearch(
  query: string,
  denseRetriever: DenseRetriever,
  sparseRetriever: BM25Retriever,
  weights: { dense: number; sparse: number } = { dense: 0.6, sparse: 0.4 },
  topK: number = 5
): Promise<Array<{ docId: string; scores: HybridScores; text: string }>> {
  const denseResults = await denseSearch(query, denseRetriever, topK * 2);
  const sparseResults = sparseRetriever.search(query, topK * 2);

  // Normalize each retriever's scores by its max so the two scales are comparable
  const maxDenseScore = Math.max(...denseResults.map(r => r.score));
  const maxSparseScore = Math.max(...sparseResults.map(r => r.score));
  const normalizedDense = new Map(denseResults.map(r => [r.docId, r.score / maxDenseScore]));
  const normalizedSparse = new Map(sparseResults.map(r => [r.docId, r.score / maxSparseScore]));

  // Union of candidates from both retrievers
  const allDocIds = new Set([
    ...denseResults.map(r => r.docId),
    ...sparseResults.map(r => r.docId),
  ]);

  // Weighted blend of the normalized scores
  const combined = Array.from(allDocIds)
    .map(docId => {
      const dense = normalizedDense.get(docId) ?? 0;
      const sparse = normalizedSparse.get(docId) ?? 0;
      return {
        docId,
        scores: { dense, sparse, combined: dense * weights.dense + sparse * weights.sparse },
        text:
          denseResults.find(r => r.docId === docId)?.text ??
          sparseResults.find(r => r.docId === docId)!.text,
      };
    })
    .sort((a, b) => b.scores.combined - a.scores.combined);

  return combined.slice(0, topK);
}
```
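One caveat with dividing by the max: cosine similarity can be negative and BM25 scores are unbounded above, so max-division does not actually pin either range to 0-1. Min-max scaling is a common, slightly safer alternative; a hedged sketch (minMaxNormalize is a name introduced here, not part of the code above):

```typescript
// Min-max normalization: maps any score list onto [0, 1].
// Degenerate case (all scores equal) falls back to a constant 1.
function minMaxNormalize(scores: number[]): number[] {
  const min = Math.min(...scores);
  const max = Math.max(...scores);
  if (max === min) return scores.map(() => 1);
  return scores.map(s => (s - min) / (max - min));
}

console.log(minMaxNormalize([0.9, 0.5, 0.1]));  // endpoints map to 1 and 0, midpoint to 0.5
console.log(minMaxNormalize([-0.2, 0.3, 0.8])); // negative scores handled cleanly
```

The tradeoff: min-max forces the worst retrieved result to exactly 0, which can overstate how bad it is when all candidates are actually decent. Either scheme is fine as long as you apply it consistently on both sides of the blend.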
Query-Type Aware Routing
Route queries to the most appropriate retriever:
```typescript
type QueryType = 'keyword' | 'semantic' | 'hybrid';

async function classifyQuery(query: string): Promise<QueryType> {
  // Heuristic fast path: quoted phrases, long digit runs (dates, IDs), keyword cues
  const hasQuotes = query.includes('"');
  const hasNumbers = /\d{4,}/.test(query);
  const keywordIndicators = /\b(ID|code|name|exactly|specific|exact)\b/i.test(query);

  if (hasQuotes || hasNumbers || keywordIndicators) {
    return 'keyword';
  }

  // Use an LLM for ambiguous cases (llm is an external client, not defined here)
  const classifyPrompt = `
Is this query asking for:
1. Exact matches / keywords / entities
2. Semantic understanding / concepts / relationships
3. Mixed (both exact and semantic)

Query: "${query}"

Respond with: keyword, semantic, or hybrid`;

  const response = await llm.generate({
    messages: [{ role: 'user', content: classifyPrompt }],
    maxTokens: 20,
  });

  const answer = response.text.toLowerCase();
  if (answer.includes('keyword')) return 'keyword';
  if (answer.includes('semantic')) return 'semantic';
  return 'hybrid';
}

async function routedHybridSearch(
  query: string,
  denseRetriever: DenseRetriever,
  sparseRetriever: BM25Retriever,
  topK: number = 5
): Promise<RetrievalResult[]> {
  const queryType = await classifyQuery(query);

  switch (queryType) {
    case 'keyword':
      // Boost confidence for the single-retriever path
      return sparseRetriever.search(query, topK).map(r => ({ ...r, score: r.score * 1.2 }));
    case 'semantic':
      // Await before mapping: denseSearch returns a Promise
      return (await denseSearch(query, denseRetriever, topK)).map(r => ({
        ...r,
        score: r.score * 1.2,
      }));
    case 'hybrid':
      return hybridSearch(query, denseRetriever, sparseRetriever, topK);
  }
}
```
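The LLM fallback cannot run without a model, but the heuristic fast path can be exercised on its own. Below is a self-contained copy of just the regex rules (classifyQueryHeuristic is a name introduced here; 'unknown' stands in for "defer to the LLM"):

```typescript
// Heuristic-only query classifier mirroring the fast path above:
// quoted phrases, long digit runs (dates/IDs), or explicit keyword cues
// route straight to sparse search without an LLM call.
function classifyQueryHeuristic(query: string): 'keyword' | 'unknown' {
  const hasQuotes = query.includes('"');
  const hasNumbers = /\d{4,}/.test(query);
  const keywordIndicators = /\b(ID|code|name|exactly|specific|exact)\b/i.test(query);
  return hasQuotes || hasNumbers || keywordIndicators ? 'keyword' : 'unknown';
}

console.log(classifyQueryHeuristic('error code 40413 in checkout'));       // keyword (digit run, "code")
console.log(classifyQueryHeuristic('find the exact phrase "rate limit"')); // keyword (quotes, "exact")
console.log(classifyQueryHeuristic('why do embeddings drift over time'));  // unknown -> defer to LLM
```

Running the heuristic first matters for latency and cost: most lookup-style queries short-circuit here, so the LLM is only paid for on genuinely ambiguous queries.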
Evaluation Metrics
Track retrieval quality with standard IR metrics:
```typescript
interface RetrievalEvaluationMetrics {
  ndcg: number; // Normalized Discounted Cumulative Gain (0-1)
  map: number; // Mean Average Precision (0-1)
  hitRate: number; // % of queries with a relevant doc in top-k
  mrr: number; // Mean Reciprocal Rank
  precision: number; // relevant retrieved / retrieved
  recall: number; // relevant retrieved / total relevant
}

function computeNDCG(
  rankedDocs: string[], // IDs of retrieved documents, best first
  relevantDocs: Set<string>,
  k: number = 5
): number {
  const dcg = rankedDocs.slice(0, k).reduce((sum, docId, i) => {
    const relevance = relevantDocs.has(docId) ? 1 : 0;
    return sum + relevance / Math.log2(i + 2); // +2 because i is 0-indexed
  }, 0);

  // Ideal ranking: all relevant docs first
  const idcg = Array(Math.min(k, relevantDocs.size))
    .fill(1)
    .reduce((sum, _, i) => sum + 1 / Math.log2(i + 2), 0);

  return idcg === 0 ? 0 : dcg / idcg;
}
```
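A worked example helps verify the implementation. Take the retrieved ranking [d1, d2, d3] where d1 and d3 are relevant: DCG = 1/log2(2) + 0 + 1/log2(4) = 1.5, the ideal ordering puts both relevant docs first for IDCG = 1/log2(2) + 1/log2(3) ≈ 1.631, so NDCG ≈ 0.920. A self-contained copy confirms the arithmetic:

```typescript
// Self-contained copy of computeNDCG for the worked example.
function computeNDCG(rankedDocs: string[], relevantDocs: Set<string>, k: number = 5): number {
  const dcg = rankedDocs.slice(0, k).reduce(
    (sum, docId, i) => sum + (relevantDocs.has(docId) ? 1 : 0) / Math.log2(i + 2),
    0
  );
  const idcg = Array(Math.min(k, relevantDocs.size))
    .fill(1)
    .reduce((sum, _, i) => sum + 1 / Math.log2(i + 2), 0);
  return idcg === 0 ? 0 : dcg / idcg;
}

const ndcg = computeNDCG(['d1', 'd2', 'd3'], new Set(['d1', 'd3']));
console.log(ndcg.toFixed(3)); // ≈ 0.920: a relevant doc at rank 3 costs ~8% vs the ideal ordering
```

The logarithmic discount is the key design choice: slipping from rank 1 to rank 2 hurts far more than slipping from rank 4 to rank 5, matching how much attention readers (and LLM context windows) give to early results.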
```typescript
function evaluateRetrieval(
  results: Array<{ query: string; retrieved: string[]; relevant: Set<string> }>,
  k: number = 5
): RetrievalEvaluationMetrics {
  const metrics = results.map(({ retrieved, relevant }) => {
    const ndcg = computeNDCG(retrieved, relevant, k);
    const topK = retrieved.slice(0, k);
    const hitRate = topK.some(docId => relevant.has(docId)) ? 1 : 0;
    const precision = topK.filter(docId => relevant.has(docId)).length / topK.length;
    const recall = topK.filter(docId => relevant.has(docId)).length / relevant.size;

    // MRR: 1 / rank of the first relevant result
    const firstRelevantRank = topK.findIndex(docId => relevant.has(docId));
    const mrr = firstRelevantRank === -1 ? 0 : 1 / (firstRelevantRank + 1);

    // AP: average of precision at each relevant position
    let apSum = 0;
    let relevantCount = 0;
    for (let i = 0; i < topK.length; i++) {
      if (relevant.has(topK[i])) {
        relevantCount++;
        apSum += relevantCount / (i + 1);
      }
    }
    const ap = relevantCount === 0 ? 0 : apSum / Math.min(k, relevant.size);

    return { ndcg, hitRate, precision, recall, mrr, ap };
  });

  const mean = (xs: number[]) => xs.reduce((sum, x) => sum + x, 0) / xs.length;

  return {
    ndcg: mean(metrics.map(m => m.ndcg)),
    hitRate: mean(metrics.map(m => m.hitRate)),
    precision: mean(metrics.map(m => m.precision)),
    recall: mean(metrics.map(m => m.recall)),
    mrr: mean(metrics.map(m => m.mrr)),
    map: mean(metrics.map(m => m.ap)),
  };
}
```
Tuning Dense/Sparse Alpha
Find the best weight balance for your domain with a simple grid search over alpha:
```typescript
async function tuneHybridWeights(
  testQueries: Array<{ query: string; relevant: Set<string> }>,
  denseRetriever: DenseRetriever,
  sparseRetriever: BM25Retriever
): Promise<{ denseWeight: number; sparseWeight: number; ndcg: number }> {
  const alphas = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9];
  let bestAlpha = 0.5;
  let bestNDCG = 0;

  for (const alpha of alphas) {
    const results = [];
    for (const { query, relevant } of testQueries) {
      const combined = await weightedHybridSearch(query, denseRetriever, sparseRetriever, {
        dense: alpha,
        sparse: 1 - alpha,
      });
      results.push({ query, retrieved: combined.map(r => r.docId), relevant });
    }

    const metrics = evaluateRetrieval(results);
    if (metrics.ndcg > bestNDCG) {
      bestNDCG = metrics.ndcg;
      bestAlpha = alpha;
    }
  }

  return { denseWeight: bestAlpha, sparseWeight: 1 - bestAlpha, ndcg: bestNDCG };
}
```
Checklist
- Implement both dense and sparse retrievers
- Test RRF fusion on your dataset
- Evaluate NDCG@5 and MRR for baseline comparison
- Implement query-type classification
- Route semantic vs keyword queries appropriately
- Tune alpha weights on validation set
- Monitor retrieval latency (sparse should be <10ms)
- Track embedding costs vs quality improvement
- Set up A/B test between pure dense vs hybrid
- Build golden dataset of relevant doc pairs
Conclusion
Hybrid retrieval is not optional in production RAG. The combination of semantic understanding and keyword matching provides coverage neither approach alone can achieve. Start with RRF fusion (simplest), measure hit rates, then progressively tune weights and add query routing. Your retrieval quality improvements will compound through the entire pipeline.