Hybrid Retrieval for RAG — Combining Dense and Sparse Search

Introduction

Vector search has become the default retrieval method for RAG. But semantic embeddings aren't perfect: they miss exact keyword matches, ignore precise terminology, and struggle with rare entities. Sparse retrieval methods like BM25 have the opposite profile: they excel at exact term matching but miss semantic relationships.

The answer? Hybrid retrieval that combines both.

Dense vs Sparse Retrieval Tradeoffs

Dense Retrieval (Embeddings)

Dense retrieval embeds queries and documents, then finds nearest neighbors:

interface DenseRetriever {
  embeddings: Record<string, number[]>; // chunkId -> embedding vector
  docTexts: Record<string, string>; // chunkId -> text
}

async function denseSearch(
  query: string,
  retriever: DenseRetriever,
  topK: number = 5
): Promise<Array<{ docId: string; score: number; text: string }>> {
  // Embed the query (embedModel is an external embedding client; a stub follows below)
  const queryEmbedding = await embedModel.embed(query);

  // Compute similarity to all documents
  const similarities = Object.entries(retriever.embeddings).map(([docId, embedding]) => {
    const similarity = cosineSimilarity(queryEmbedding, embedding);
    return {
      docId,
      score: similarity,
      text: retriever.docTexts[docId],
    };
  });

  // Sort and return top-k
  return similarities
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

function cosineSimilarity(a: number[], b: number[]): number {
  const dotProduct = a.reduce((sum, x, i) => sum + x * b[i], 0);
  const normA = Math.sqrt(a.reduce((sum, x) => sum + x * x, 0));
  const normB = Math.sqrt(b.reduce((sum, x) => sum + x * x, 0));
  return dotProduct / (normA * normB);
}
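
The denseSearch sketch assumes an embedModel client exposing a single embed method; any embedding provider fits. Purely so the examples run end to end, here is a hypothetical toy stub that hashes tokens into a fixed-size vector. It has none of the semantic properties of a real embedding model:

interface EmbedModel {
  embed(text: string): Promise<number[]>;
}

const EMBED_DIM = 64;

// Toy stand-in: bag-of-words via a rolling hash. Swap in your real provider in production.
const embedModel: EmbedModel = {
  async embed(text: string): Promise<number[]> {
    const vector = new Array(EMBED_DIM).fill(0);
    for (const token of text.toLowerCase().split(/\s+/)) {
      let hash = 0;
      for (let i = 0; i < token.length; i++) {
        hash = (hash * 31 + token.charCodeAt(i)) >>> 0; // unsigned rolling hash
      }
      vector[hash % EMBED_DIM] += 1;
    }
    return vector;
  },
};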

Strengths:

  • Captures semantic relationships
  • Works across languages
  • Generalizes to unseen terms

Weaknesses:

  • Misses exact keyword matches
  • Struggles with rare proper nouns, IDs, and exact numbers
  • Requires embedding every new document (added cost and latency)

Sparse Retrieval (BM25)

BM25 scores documents by combining term frequency with inverse document frequency:

interface BM25Config {
  k1: number; // controls term frequency saturation (typically 1.5)
  b: number; // controls length normalization (typically 0.75)
  idf: Record<string, number>; // pre-computed IDF values
}

interface Document {
  id: string;
  text: string;
  tokens: string[];
}

class BM25Retriever {
  private documents: Document[];
  private config: BM25Config;
  private avgDocLength: number;

  constructor(documents: Document[], config: Partial<BM25Config> = {}) {
    this.documents = documents;
    this.config = {
      k1: config.k1 ?? 1.5,
      b: config.b ?? 0.75,
      idf: config.idf ?? {},
    };

    // Compute average document length
    this.avgDocLength = documents.reduce((sum, doc) => sum + doc.tokens.length, 0) / documents.length;

    // Compute IDF if not provided
    if (Object.keys(this.config.idf).length === 0) {
      this.computeIDF();
    }
  }

  private computeIDF(): void {
    const docFreq: Record<string, number> = {};

    // Count documents containing each term
    for (const doc of this.documents) {
      const uniqueTokens = new Set(doc.tokens);
      for (const token of uniqueTokens) {
        docFreq[token] = (docFreq[token] || 0) + 1;
      }
    }

    // Compute IDF
    for (const token in docFreq) {
      const df = docFreq[token];
      this.config.idf[token] = Math.log(
        (this.documents.length - df + 0.5) / (df + 0.5) + 1
      );
    }
  }

  search(query: string, topK: number = 5): Array<{ docId: string; score: number; text: string }> {
    const queryTokens = this.tokenize(query);
    const scores: Record<string, number> = {};

    // Score each document
    for (const doc of this.documents) {
      let score = 0;

      for (const token of queryTokens) {
        const idf = this.config.idf[token] ?? 0;
        const termFreq = doc.tokens.filter(t => t === token).length;
        const docLen = doc.tokens.length;

        // BM25 formula
        const numerator = termFreq * (this.config.k1 + 1);
        const denominator = termFreq + this.config.k1 * (1 - this.config.b + this.config.b * (docLen / this.avgDocLength));

        score += idf * (numerator / denominator);
      }

      scores[doc.id] = score;
    }

    // Return top-k
    return Object.entries(scores)
      .sort(([, a], [, b]) => b - a)
      .slice(0, topK)
      .map(([docId, score]) => ({
        docId,
        score,
        text: this.documents.find(d => d.id === docId)!.text,
      }));
  }

  private tokenize(text: string): string[] {
    return text
      .toLowerCase()
      .split(/\s+/)
      .map(token => token.replace(/[^\w]/g, ''))
      .filter(token => token.length > 0);
  }
}
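
A minimal sketch of wiring up the retriever. The Document interface expects pre-tokenized text, so tokenize each chunk once at index time; simpleTokenize below is an illustrative helper that mirrors the class's private tokenizer:

// Tokenize once at index time, consistently with how queries are tokenized
function simpleTokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/\s+/)
    .map(t => t.replace(/[^\w]/g, ''))
    .filter(t => t.length > 0);
}

const docs: Document[] = [
  { id: 'doc-1', text: 'BM25 ranks documents by term frequency' },
  { id: 'doc-2', text: 'Dense embeddings capture semantic similarity' },
].map(d => ({ ...d, tokens: simpleTokenize(d.text) }));

const bm25 = new BM25Retriever(docs);
const hits = bm25.search('term frequency ranking', 2);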

Strengths:

  • Excellent for keyword matching
  • Fast, no embedding costs
  • Handles proper nouns well
  • Deterministic

Weaknesses:

  • Misses synonyms
  • Language-dependent
  • Sensitive to query phrasing

Hybrid Search: Combining Dense and Sparse

Reciprocal Rank Fusion (RRF)

Combine rankings from dense and sparse retrievers using RRF, which scores each document by summing 1 / (k + rank) across every result list it appears in:

interface RetrievalResult {
  docId: string;
  score: number;
  text: string;
}

function reciprocalRankFusion(
  denseResults: RetrievalResult[],
  sparseResults: RetrievalResult[],
  k: number = 60 // RRF parameter
): RetrievalResult[] {
  const rrfScores: Record<string, number> = {};

  // Score dense results
  for (let rank = 0; rank < denseResults.length; rank++) {
    const docId = denseResults[rank].docId;
    rrfScores[docId] = (rrfScores[docId] || 0) + 1 / (k + rank + 1);
  }

  // Score sparse results
  for (let rank = 0; rank < sparseResults.length; rank++) {
    const docId = sparseResults[rank].docId;
    rrfScores[docId] = (rrfScores[docId] || 0) + 1 / (k + rank + 1);
  }

  // Combine and sort
  const allDocIds = new Set([
    ...denseResults.map(r => r.docId),
    ...sparseResults.map(r => r.docId),
  ]);

  const combined = Array.from(allDocIds)
    .map(docId => ({
      docId,
      rrfScore: rrfScores[docId],
    }))
    .sort((a, b) => b.rrfScore - a.rrfScore);

  // Return combined results with original text
  const textMap = new Map([
    ...denseResults.map(r => [r.docId, r.text] as const),
    ...sparseResults.map(r => [r.docId, r.text] as const),
  ]);

  return combined.map(({ docId, rrfScore }) => ({
    docId,
    score: rrfScore,
    text: textMap.get(docId)!,
  }));
}

// Usage
async function hybridSearch(
  query: string,
  denseRetriever: DenseRetriever,
  sparseRetriever: BM25Retriever,
  topK: number = 5
): Promise<RetrievalResult[]> {
  // Fetch extra candidates from each retriever so the fusion has more to work with
  const denseResults = await denseSearch(query, denseRetriever, topK * 2);
  const sparseResults = sparseRetriever.search(query, topK * 2);

  return reciprocalRankFusion(denseResults, sparseResults).slice(0, topK);
}

Weighted Combination

Blend dense and sparse scores with tunable weights (see the tuning section below):

interface HybridScores {
  dense: number; // 0-1
  sparse: number; // 0-1
  combined: number;
}

async function weightedHybridSearch(
  query: string,
  denseRetriever: DenseRetriever,
  sparseRetriever: BM25Retriever,
  weights: { dense: number; sparse: number } = { dense: 0.6, sparse: 0.4 },
  topK: number = 5
): Promise<Array<{ docId: string; scores: HybridScores; text: string }>> {
  const denseResults = await denseSearch(query, denseRetriever, topK * 2);
  const sparseResults = sparseRetriever.search(query, topK * 2);

  // Normalize scores to 0-1 (the floor guards against empty result lists and all-zero scores)
  const maxDenseScore = Math.max(...denseResults.map(r => r.score), 1e-9);
  const maxSparseScore = Math.max(...sparseResults.map(r => r.score), 1e-9);

  const normalizedDense = new Map(
    denseResults.map(r => [r.docId, r.score / maxDenseScore])
  );
  const normalizedSparse = new Map(
    sparseResults.map(r => [r.docId, r.score / maxSparseScore])
  );

  // Combine
  const allDocIds = new Set([
    ...denseResults.map(r => r.docId),
    ...sparseResults.map(r => r.docId),
  ]);

  const combined = Array.from(allDocIds)
    .map(docId => ({
      docId,
      scores: {
        dense: normalizedDense.get(docId) ?? 0,
        sparse: normalizedSparse.get(docId) ?? 0,
        combined: (normalizedDense.get(docId) ?? 0) * weights.dense +
                  (normalizedSparse.get(docId) ?? 0) * weights.sparse,
      },
      text: denseResults.find(r => r.docId === docId)?.text ||
            sparseResults.find(r => r.docId === docId)!.text,
    }))
    .sort((a, b) => b.scores.combined - a.scores.combined);

  return combined.slice(0, topK);
}
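
A note on this design choice: BM25 scores are unbounded while cosine similarities are bounded, and max-based normalization is sensitive to a single outlier score, so the blend can end up dominated by one retriever. RRF avoids this entirely by fusing ranks instead of raw scores, which is why it is the safer default.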

Query-Type Aware Routing

Route queries to the most appropriate retriever:

type QueryType = 'keyword' | 'semantic' | 'hybrid';

async function classifyQuery(query: string): Promise<QueryType> {
  // Heuristic: check for quotes, numbers, special terms
  const hasQuotes = query.includes('"');
  const hasNumbers = /\d{4,}/.test(query); // dates, IDs
  const keywordIndicators = /\b(ID|code|name|exactly|specific|exact)\b/i.test(query);

  if (hasQuotes || hasNumbers || keywordIndicators) {
    return 'keyword';
  }

  // Use LLM for ambiguous cases
  const classifyPrompt = `
Is this query asking for:
1. Exact matches / keywords / entities
2. Semantic understanding / concepts / relationships
3. Mixed (both exact and semantic)

Query: "${query}"

Respond with: keyword, semantic, or hybrid`;

  // `llm` is an assumed LLM client; any chat-completion API fits here
  const response = await llm.generate({
    messages: [{ role: 'user', content: classifyPrompt }],
    maxTokens: 20,
  });

  const answer = response.text.toLowerCase();
  if (answer.includes('keyword')) return 'keyword';
  if (answer.includes('semantic')) return 'semantic';
  return 'hybrid';
}

async function routedHybridSearch(
  query: string,
  denseRetriever: DenseRetriever,
  sparseRetriever: BM25Retriever,
  topK: number = 5
): Promise<RetrievalResult[]> {
  const queryType = await classifyQuery(query);

  switch (queryType) {
    case 'keyword':
      return sparseRetriever.search(query, topK);

    case 'semantic':
      return denseSearch(query, denseRetriever, topK);

    case 'hybrid':
      return hybridSearch(query, denseRetriever, sparseRetriever, topK);
  }
}

Evaluation Metrics

Track retrieval quality with standard IR metrics:

interface RetrievalEvaluationMetrics {
  ndcg: number; // Normalized Discounted Cumulative Gain (0-1)
  map: number; // Mean Average Precision (0-1)
  hitRate: number; // % queries with relevant doc in top-k
  mrr: number; // Mean Reciprocal Rank
  precision: number; // Relevant items / retrieved items
  recall: number; // Relevant items retrieved / total relevant items
}

function computeNDCG(
  rankedDocs: string[], // IDs of retrieved documents
  relevantDocs: Set<string>,
  k: number = 5
): number {
  const dcg = rankedDocs
    .slice(0, k)
    .reduce((sum, docId, i) => {
      const relevance = relevantDocs.has(docId) ? 1 : 0;
      return sum + relevance / Math.log2(i + 2); // +2 because i is 0-indexed
    }, 0);

  // Ideal ranking: all relevant docs first
  const idcg = Array(Math.min(k, relevantDocs.size))
    .fill(1)
    .reduce((sum, _, i) => sum + 1 / Math.log2(i + 2), 0);

  return idcg === 0 ? 0 : dcg / idcg;
}
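
As a worked example: with relevant = {A, B} and ranking [A, C, B], DCG = 1/log2(2) + 0 + 1/log2(4) = 1.5, while the ideal ordering gives IDCG = 1/log2(2) + 1/log2(3) ≈ 1.63, so NDCG ≈ 0.92.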

function evaluateRetrieval(
  results: Array<{ query: string; retrieved: string[]; relevant: Set<string> }>,
  k: number = 5
): RetrievalEvaluationMetrics {
  const metrics = results.map(({ retrieved, relevant }) => {
    const ndcg = computeNDCG(retrieved, relevant, k);
    const topK = retrieved.slice(0, k);
    const hitRate = topK.some(docId => relevant.has(docId)) ? 1 : 0;
    const precision = topK.filter(docId => relevant.has(docId)).length / topK.length;
    const recall = topK.filter(docId => relevant.has(docId)).length / relevant.size;

    // MRR: 1 / rank of first relevant
    const firstRelevantRank = topK.findIndex(docId => relevant.has(docId));
    const mrr = firstRelevantRank === -1 ? 0 : 1 / (firstRelevantRank + 1);

    // MAP: Average precision at each relevant position
    let apSum = 0;
    let relevantCount = 0;
    for (let i = 0; i < topK.length; i++) {
      if (relevant.has(topK[i])) {
        relevantCount++;
        apSum += relevantCount / (i + 1);
      }
    }
    const ap = relevantCount === 0 ? 0 : apSum / Math.min(k, relevant.size);

    return { ndcg, hitRate, precision, recall, mrr, ap };
  });

  return {
    ndcg: metrics.reduce((sum, m) => sum + m.ndcg, 0) / metrics.length,
    hitRate: metrics.reduce((sum, m) => sum + m.hitRate, 0) / metrics.length,
    precision: metrics.reduce((sum, m) => sum + m.precision, 0) / metrics.length,
    recall: metrics.reduce((sum, m) => sum + m.recall, 0) / metrics.length,
    mrr: metrics.reduce((sum, m) => sum + m.mrr, 0) / metrics.length,
    map: metrics.reduce((sum, m) => sum + m.ap, 0) / metrics.length,
  };
}
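
A minimal sketch of running the evaluation against a hand-labeled golden set (the queries and doc IDs here are placeholders):

const goldenSet = [
  { query: 'refund policy', retrieved: ['doc-12', 'doc-3', 'doc-7'], relevant: new Set(['doc-3']) },
  { query: 'API rate limits', retrieved: ['doc-9', 'doc-1', 'doc-4'], relevant: new Set(['doc-9', 'doc-1']) },
];

const metrics = evaluateRetrieval(goldenSet, 3);
console.log(`NDCG@3: ${metrics.ndcg.toFixed(3)}, MRR: ${metrics.mrr.toFixed(3)}`);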

Tuning Dense/Sparse Alpha

Find the optimal weight balance for your domain with a simple grid search:

async function tuneHybridWeights(
  testQueries: Array<{ query: string; relevant: Set<string> }>,
  denseRetriever: DenseRetriever,
  sparseRetriever: BM25Retriever
): Promise<{ denseWeight: number; sparseWeight: number; ndcg: number }> {
  const alphas = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9];
  let bestAlpha = 0.5;
  let bestNDCG = 0;

  for (const alpha of alphas) {
    const results = [];

    for (const { query, relevant } of testQueries) {
      const combined = await weightedHybridSearch(query, denseRetriever, sparseRetriever, {
        dense: alpha,
        sparse: 1 - alpha,
      });

      results.push({
        query,
        retrieved: combined.map(r => r.docId),
        relevant,
      });
    }

    const metrics = evaluateRetrieval(results);

    if (metrics.ndcg > bestNDCG) {
      bestNDCG = metrics.ndcg;
      bestAlpha = alpha;
    }
  }

  return {
    denseWeight: bestAlpha,
    sparseWeight: 1 - bestAlpha,
    ndcg: bestNDCG,
  };
}

Checklist

  • Implement both dense and sparse retrievers
  • Test RRF fusion on your dataset
  • Evaluate NDCG@5 and MRR for baseline comparison
  • Implement query-type classification
  • Route semantic vs keyword queries appropriately
  • Tune alpha weights on validation set
  • Monitor retrieval latency (sparse should be <10ms)
  • Track embedding costs vs quality improvement
  • Set up A/B test between pure dense vs hybrid
  • Build golden dataset of relevant doc pairs

Conclusion

Hybrid retrieval is not optional in production RAG. The combination of semantic understanding and keyword matching provides coverage neither approach alone can achieve. Start with RRF fusion (simplest), measure hit rates, then progressively tune weights and add query routing. Your retrieval quality improvements will compound through the entire pipeline.