Dense Passage Retrieval in Production — Training and Deploying DPR Models

Introduction

Off-the-shelf embeddings (OpenAI, Cohere) work well for general tasks but often underperform on domain-specific queries. Dense Passage Retrieval (DPR) lets you train custom dual encoders for questions and passages, achieving superior performance on specialized domains. A production DPR system requires careful training, evaluation, and deployment at scale.

DPR Architecture: Question and Passage Encoders

DPR trains two separate encoders: one for questions, one for passages.

interface DPRModel {
  questionEncoder: TextEncoder; // Transforms questions to 768-dim vectors
  passageEncoder: TextEncoder; // Transforms passages to 768-dim vectors
}

async function retrieveWithDPR(
  question: string,
  passages: string[],
  model: DPRModel
): Promise<{ passage: string; score: number }[]> {
  // Encode question once
  const questionEmbedding = await model.questionEncoder.encode(question);

  // Encode all passages
  const passageEmbeddings = await Promise.all(
    passages.map(p => model.passageEncoder.encode(p))
  );

  // Score each passage
  const scores = passageEmbeddings.map((emb, idx) => ({
    passage: passages[idx],
    score: dotProduct(questionEmbedding, emb),
  }));

  // Sort by score descending
  return scores.sort((a, b) => b.score - a.score);
}

The key insight: separate encoders allow the model to specialize. A question encoder learns to embed "What is photosynthesis?" as a query concept. A passage encoder learns to represent paragraphs about photosynthesis as retrievable targets.
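The `dotProduct` helper used by `retrieveWithDPR` is left undefined above; a minimal implementation is:

```typescript
// Dot product of two equal-length embedding vectors, as used for
// scoring in retrieveWithDPR above.
function dotProduct(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += a[i] * b[i];
  }
  return sum;
}
```

The original DPR formulation scores with an unnormalized dot product, so vector magnitudes matter; normalize both embeddings first if you want cosine similarity instead.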

Bi-Encoder vs Cross-Encoder Trade-Off

Bi-encoders (DPR) encode questions and passages independently, which makes passage embeddings precomputable and retrieval fast. Cross-encoders process each question-passage pair jointly with full attention, which is more accurate but far slower.

Bi-Encoder (DPR):

  • Encode the question once per query: O(1) forward passes
  • Passage embeddings are precomputed offline; at query time you only run a similarity search over the M stored vectors
  • Query-time total: O(M) vector comparisons → ~100ms for 1M passages with brute-force search, far less with an ANN index

Cross-Encoder:

  • One full transformer pass per question-passage pair: O(M) forward passes per query → ~10s for 1M passages

Typical production strategy: use DPR for candidate retrieval (top-100), then re-rank with cross-encoder.

async function hybridRetrieval(
  question: string,
  allPassages: string[],
  dprModel: DPRModel,
  crossEncoderModel: CrossEncoderModel
): Promise<string[]> {
  // Step 1: DPR candidate retrieval (fast, approximate)
  const candidates = await retrieveWithDPR(question, allPassages, dprModel);
  const topCandidates = candidates.slice(0, 100);

  // Step 2: Cross-encoder re-ranking (slower, more accurate)
  const reranked = await crossEncoderModel.score(
    question,
    topCandidates.map(c => c.passage)
  );

  return reranked
    .sort((a, b) => b.score - a.score)
    .slice(0, 10)
    .map(r => r.passage);
}

This two-stage approach balances speed and accuracy.

Training DPR With In-Batch Negatives

DPR training uses contrastive learning: maximize similarity between question and correct passage, minimize with negatives.

interface TrainingBatch {
  questions: string[];
  positivePassages: string[];
  negativePassages: string[][]; // Multiple negatives per question
}

async function trainDPR(
  batches: TrainingBatch[],
  questionEncoder: TextEncoder,
  passageEncoder: TextEncoder
): Promise<void> {
  const optimizer = new AdamOptimizer({ lr: 2e-5 });

  for (const batch of batches) {
    // Encode questions
    const q = await questionEncoder.encode(batch.questions);

    // Encode positive passages
    const p_pos = await passageEncoder.encode(batch.positivePassages);

    // Encode negative passages
    const p_neg = await passageEncoder.encode(batch.negativePassages.flat());

    // Compute similarity scores
    const posScores = cosineSimilarity(q, p_pos); // shape: [batchSize]
    const negScores = cosineSimilarity(q, p_neg); // shape: [batchSize, numNegatives]

    // In-batch negatives: use other questions' passages as negatives
    const inBatchNegScores = matmul(q, transpose(p_pos)); // [batchSize, batchSize]

    // Contrastive loss: maximize positive, minimize negative
    const loss = computeNTXentLoss(posScores, negScores, inBatchNegScores);

    optimizer.zero();
    loss.backward();
    optimizer.step();
  }
}

In-batch negatives (reusing other questions' passages as negatives) reduce memory without hurting performance.
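To make the objective concrete, here is a scalar sketch of that loss on a precomputed similarity matrix (plain TypeScript, no autograd; names are illustrative):

```typescript
// Scalar sketch of the in-batch contrastive (NT-Xent-style) loss.
// sim[i][j] is the similarity between question i and passage j; for
// question i, passage i is the positive and every other passage in the
// batch is a negative:
//   loss_i = -log( exp(sim[i][i]) / sum_j exp(sim[i][j]) )
function inBatchContrastiveLoss(sim: number[][]): number {
  let total = 0;
  for (let i = 0; i < sim.length; i++) {
    const rowMax = Math.max(...sim[i]); // subtract for numerical stability
    const exps = sim[i].map(s => Math.exp(s - rowMax));
    const denom = exps.reduce((a, b) => a + b, 0);
    total += -Math.log(exps[i] / denom);
  }
  return total / sim.length; // mean loss over the batch
}
```

A real trainer backpropagates through this quantity via the encoders; the scalar version only illustrates what the optimizer minimizes: on a well-separated batch (large diagonal, small off-diagonal similarities) the loss approaches zero.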

Hard Negative Mining

Improve training efficiency by focusing on hard negatives: passages the current model scores highly even though they do not contain the answer.

async function hardNegativeMining(
  questions: string[],
  passages: string[],
  labels: Map<string, Set<number>>, // question -> correct passage indices
  dprModel: DPRModel,
  topK: number = 50
): Promise<TrainingBatch> {
  const batch: TrainingBatch = {
    questions,
    positivePassages: [],
    negativePassages: [],
  };

  for (const question of questions) {
    const qEmbedding = await dprModel.questionEncoder.encode(question);

    // Retrieve top-K passages
    const topPassages = await dprModel.retrieveTopK(qEmbedding, passages, topK);

    // Keep retrieved passages that are NOT correct answers (hard negatives)
    const correctIndices = labels.get(question) ?? new Set();
    const hardNegatives = topPassages
      .filter(p => !correctIndices.has(p.index))
      .map(p => p.passage);

    const positiveIndices = Array.from(correctIndices);
    const positive = positiveIndices
      .map(idx => passages[idx])
      .slice(0, 1); // One positive per question

    batch.positivePassages.push(...positive);
    batch.negativePassages.push(hardNegatives);
  }

  return batch;
}

Hard negative mining focuses training on challenging cases, improving final performance by 5-10%.

FAISS Index for Billion-Scale Retrieval

Scale DPR to billions of passages using Facebook AI Similarity Search (FAISS).

import faiss from 'faiss-node';

async function indexPassagesWithFAISS(
  passages: string[],
  dprModel: DPRModel,
  indexPath: string
): Promise<void> {
  const embeddingDim = 768; // DPR embedding size

  // Create index: IVFFlat with 1000 clusters
  const quantizer = new faiss.IndexFlatL2(embeddingDim);
  const index = new faiss.IndexIVFFlat(quantizer, embeddingDim, 1000);

  const batchSize = 1000;

  // IVFFlat must be trained (cluster centroids fitted) BEFORE any vectors
  // are added. Train on an initial sample of passage embeddings.
  const sampleSize = Math.min(passages.length, 100_000);
  const sampleEmbeddings = await Promise.all(
    passages.slice(0, sampleSize).map(p => dprModel.passageEncoder.encode(p))
  );
  console.log('Training index...');
  index.train(new faiss.FloatVector(sampleEmbeddings.flat()));

  // Encode and add all passages to the trained index, in batches
  for (let i = 0; i < passages.length; i += batchSize) {
    const batch = passages.slice(i, i + batchSize);
    const embeddings = await Promise.all(
      batch.map(p => dprModel.passageEncoder.encode(p))
    );
    index.add(new faiss.FloatVector(embeddings.flat()));

    console.log(`Indexed ${i + batch.length} / ${passages.length}`);
  }

  // Save to disk
  faiss.writeIndex(index, indexPath);
  console.log(`Index saved to ${indexPath}`);
}

async function retrieveFromFAISS(
  question: string,
  dprModel: DPRModel,
  indexPath: string,
  passages: string[],
  topK: number = 10
): Promise<string[]> {
  // Load index
  const index = faiss.readIndex(indexPath);

  // Encode question
  const qEmbedding = await dprModel.questionEncoder.encode(question);
  const qMatrix = new faiss.FloatVector(qEmbedding); // flat array of 768 floats

  // Search
  const distances = new faiss.FloatVector(); // search distances are floats
  const labels = new faiss.LongVector();
  index.search(qMatrix, topK, distances, labels);

  // Map indices to passages
  return Array.from(labels).map(idx => passages[idx]);
}

FAISS enables retrieval from billions of passages in milliseconds.
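To build intuition for why an IVF index is fast, here is a toy sketch of the idea in plain TypeScript (not the FAISS implementation): vectors are bucketed under their nearest centroid at index time, and a query scans only the buckets of its `nprobe` closest centroids.

```typescript
type Vec = number[];

// Squared Euclidean distance between two equal-length vectors.
function sqDist(a: Vec, b: Vec): number {
  return a.reduce((s, x, i) => s + (x - b[i]) ** 2, 0);
}

// Indices of the n centroids closest to v.
function nearestCentroids(v: Vec, centroids: Vec[], n: number): number[] {
  return centroids
    .map((c, i) => ({ i, d: sqDist(v, c) }))
    .sort((a, b) => a.d - b.d)
    .slice(0, n)
    .map(x => x.i);
}

// Index time: assign every vector to its nearest centroid's bucket.
function buildIVF(vectors: Vec[], centroids: Vec[]): number[][] {
  const buckets: number[][] = centroids.map(() => []);
  vectors.forEach((v, idx) => {
    buckets[nearestCentroids(v, centroids, 1)[0]].push(idx);
  });
  return buckets;
}

// Query time: scan only the buckets of the nprobe nearest centroids,
// so roughly nprobe / numCentroids of the corpus is examined.
function searchIVF(
  query: Vec,
  vectors: Vec[],
  centroids: Vec[],
  buckets: number[][],
  topK: number,
  nprobe: number
): number[] {
  const candidates = nearestCentroids(query, centroids, nprobe).flatMap(
    ci => buckets[ci]
  );
  return candidates
    .map(idx => ({ idx, d: sqDist(query, vectors[idx]) }))
    .sort((a, b) => a.d - b.d)
    .slice(0, topK)
    .map(x => x.idx);
}
```

Raising `nprobe` scans more buckets, trading latency for recall; FAISS exposes the same knob on `IndexIVFFlat`.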

DPR vs BM25 on Domain-Specific Data

On general corpora, BM25 (traditional keyword search) and DPR perform similarly. On specialized domains, DPR wins.

interface RetrievalBenchmark {
  query: string;
  goldPassages: string[];
}

async function benchmarkDPRvsBM25(
  benchmarks: RetrievalBenchmark[],
  corpus: string[],
  dprModel: DPRModel,
  bm25Index: BM25Index
): Promise<{ dprRecall: number; bm25Recall: number }> {
  let dprHits = 0;
  let bm25Hits = 0;
  let totalGold = 0;

  for (const benchmark of benchmarks) {
    // DPR retrieval
    const dprResults = await retrieveWithDPR(benchmark.query, corpus, dprModel);
    const dprTop10 = dprResults.slice(0, 10).map(r => r.passage);

    // BM25 retrieval
    const bm25Results = bm25Index.search(benchmark.query, 10);

    // Count gold passages found in the top-10
    for (const gold of benchmark.goldPassages) {
      totalGold++;
      if (dprTop10.includes(gold)) dprHits++;
      if (bm25Results.includes(gold)) bm25Hits++;
    }
  }

  // Recall = fraction of all gold passages retrieved in the top-10
  return {
    dprRecall: dprHits / totalGold,
    bm25Recall: bm25Hits / totalGold,
  };
}

Typical results on domain data:

  • BM25: 45% recall
  • DPR (off-the-shelf): 65% recall
  • DPR (fine-tuned on domain): 75-85% recall
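These figures are recall@10: the fraction of gold passages that appear among the top-10 results. As a standalone helper (a sketch independent of the benchmark code above):

```typescript
// recall@k: the fraction of gold passages that appear among the top-k
// retrieved passages for a single query.
function recallAtK(retrieved: string[], gold: string[], k: number): number {
  if (gold.length === 0) return 0;
  const topK = new Set(retrieved.slice(0, k));
  const hits = gold.filter(g => topK.has(g)).length;
  return hits / gold.length;
}
```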

Fine-Tuning DPR on Your Corpus

Start with a pre-trained DPR model (from Hugging Face) and fine-tune on your domain data.

async function fineTuneDPR(
  trainingData: Array<{ question: string; answer: string }>,
  pretrainedModelPath: string,
  outputPath: string
): Promise<void> {
  // Load pre-trained weights
  const questionEncoder = await loadModel(
    `${pretrainedModelPath}/question-encoder`
  );
  const passageEncoder = await loadModel(
    `${pretrainedModelPath}/passage-encoder`
  );

  const optimizer = new AdamOptimizer({ lr: 2e-5 });

  // Fine-tune on domain data (10 epochs typical)
  for (let epoch = 0; epoch < 10; epoch++) {
    let totalLoss = 0;

    for (const { question, answer } of trainingData) {
      const batch = createBatchWithNegatives(question, answer, trainingData);
      const loss = await computeContrastiveLoss(
        batch,
        questionEncoder,
        passageEncoder
      );

      totalLoss += loss.value; // accumulate the scalar loss for logging
      optimizer.zero();
      loss.backward();
      optimizer.step();
    }

    console.log(`Epoch ${epoch + 1}: loss = ${(totalLoss / trainingData.length).toFixed(4)}`);
  }

  // Save fine-tuned models
  await questionEncoder.save(`${outputPath}/question-encoder`);
  await passageEncoder.save(`${outputPath}/passage-encoder`);
}

With just 500-1000 domain examples, you can dramatically improve DPR performance.
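The `createBatchWithNegatives` helper in the loop above is left undefined; a simple version might draw negatives from other questions' answers (the helper below is an illustrative sketch, not part of any DPR library):

```typescript
interface NegativeBatch {
  questions: string[];
  positivePassages: string[];
  negativePassages: string[][];
}

// Hypothetical helper (the name matches the fine-tuning loop above, but
// this implementation is an assumption): build a one-question batch whose
// negatives are answers belonging to other questions in the training set.
function createBatchWithNegatives(
  question: string,
  answer: string,
  trainingData: Array<{ question: string; answer: string }>,
  numNegatives: number = 4
): NegativeBatch {
  const negatives = trainingData
    .filter(ex => ex.answer !== answer) // exclude the positive itself
    .slice(0, numNegatives)
    .map(ex => ex.answer);

  return {
    questions: [question],
    positivePassages: [answer],
    negativePassages: [negatives],
  };
}
```

In practice you would sample negatives randomly each epoch and mix in mined hard negatives rather than always taking the first few.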

Serving DPR in Production

Optimize for latency-sensitive retrieval.

class DPRRetrievalServer {
  private dprModel: DPRModel;
  private faissIndex: FAISSIndex;
  private passages: string[];
  private cache = new Map<string, string[]>();

  async initialize(
    modelPath: string,
    indexPath: string,
    passages: string[]
  ): Promise<void> {
    this.dprModel = await loadDPRModel(modelPath);
    this.faissIndex = faiss.readIndex(indexPath);
    this.passages = passages; // needed to map search indices back to text
  }

  async retrieve(
    question: string,
    topK: number = 10,
    useCache: boolean = true
  ): Promise<string[]> {
    // Check cache
    if (useCache && this.cache.has(question)) {
      return this.cache.get(question) ?? [];
    }

    // Encode question
    const start = Date.now();
    const embedding = await this.dprModel.questionEncoder.encode(question);
    const encodeTime = Date.now() - start;

    // Search FAISS index
    const searchStart = Date.now();
    const indices = this.faissIndex.search(embedding, topK);
    const searchTime = Date.now() - searchStart;

    const results = indices.map(idx => this.passages[idx]);

    // Cache result
    this.cache.set(question, results);

    console.log(
      `DPR retrieval: ${encodeTime}ms encode + ${searchTime}ms search`
    );

    return results;
  }
}

On modern hardware, DPR retrieval from 1B passages: ~10ms question encoding + ~30ms FAISS search = ~40ms end-to-end.
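One caveat: the `cache` in the server above grows without bound. A small LRU cache built on `Map`'s insertion-order iteration keeps memory predictable (a generic sketch, independent of the server code):

```typescript
// Simple LRU cache exploiting Map's insertion-order iteration:
// re-inserting a key on read moves it to the "most recent" end, and the
// first key in iteration order is the least recently used.
class LRUCache<K, V> {
  private map = new Map<K, V>();
  constructor(private maxSize: number) {}

  get(key: K): V | undefined {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key)!;
    this.map.delete(key); // move key to most-recent position
    this.map.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.maxSize) {
      // evict least recently used (first key in iteration order)
      const oldest = this.map.keys().next().value as K;
      this.map.delete(oldest);
    }
  }
}
```

Swapping `this.cache` for an `LRUCache<string, string[]>` with a few thousand entries bounds memory while still absorbing popular repeated queries.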

When DPR Beats General Embeddings

DPR outshines general embeddings when:

  1. Specialized vocabulary: Medical, legal, scientific domains
  2. Large document set: >100K passages (FAISS pays off)
  3. Training data available: You can fine-tune on 500+ domain examples
  4. Cost-constrained: self-hosted DPR + FAISS avoids per-token embedding API fees at indexing time

General embeddings (OpenAI, Cohere) are better when:

  • You want simplicity (no training, deployment overhead)
  • You have diverse, cross-domain queries
  • You don't have domain training data

Checklist

  • Start with pre-trained DPR (facebook/dpr-question_encoder-single-nq-base + facebook/dpr-ctx_encoder-single-nq-base)
  • Fine-tune on 500+ domain examples
  • Use hard negative mining during training
  • Index passages with FAISS for scalability
  • Combine DPR (candidate retrieval) + cross-encoder (reranking)
  • Cache popular queries to reduce latency
  • Monitor retrieval quality with domain benchmarks
  • Evaluate DPR vs BM25 vs general embeddings on your data

Conclusion

Dense Passage Retrieval is a powerful tool when you have domain data and scale requirements. Start with fine-tuning a pre-trained model, then add FAISS for billion-scale indexing. For most production RAG systems, DPR (fine-tuned) + cross-encoder reranking achieves 75-85% recall—a significant jump over baseline approaches.