Query Understanding for RAG — Rewriting, Expansion, and Decomposition

Introduction

User queries are messy: ambiguous, incomplete, and poorly optimized for vector search. "How much does it cost?" and "What are the pricing options?" ask similar questions but retrieve different results.

Query understanding transforms raw input into optimized search queries, improving downstream retrieval and generation quality.
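
The examples in this article call an LLMClient with a generate method. That client is assumed, not provided by any particular SDK; a minimal sketch of the shape the examples expect, plus a trivial stub for local testing:

```typescript
// Hypothetical client shape assumed by every example in this article;
// adapt it to your provider's SDK.
interface LLMMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

interface LLMClient {
  generate(options: {
    messages: LLMMessage[];
    maxTokens?: number;
  }): Promise<{ text: string }>;
}

// Trivial stub for offline testing: echoes the last message back.
const stubLLM: LLMClient = {
  async generate({ messages }) {
    return { text: messages[messages.length - 1].content };
  },
};
```

Swapping the stub for a real client changes nothing else in the examples, which is the point of keeping the interface this small.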

Query Rewriting

Rephrase queries to be more specific and retrieval-friendly:

async function rewriteQuery(
  originalQuery: string,
  llm: LLMClient
): Promise<string> {
  const rewritePrompt = `
You are a search query optimization expert. Rewrite the user's query so that it:
1. Is more specific and concrete
2. Uses domain terminology
3. Is optimized for semantic vector search
4. Avoids ambiguity

Original query: "${originalQuery}"

Rewritten query (concise, one line):`;

  const response = await llm.generate({
    messages: [{ role: 'user', content: rewritePrompt }],
    maxTokens: 100,
  });

  return response.text.trim();
}

// Multi-attempt rewriting for robustness (relies on sampling temperature > 0;
// at temperature 0 all attempts may be identical)
async function multiAttemptRewrite(
  originalQuery: string,
  llm: LLMClient,
  attempts: number = 3
): Promise<string[]> {
  const rewrites = await Promise.all(
    Array(attempts)
      .fill(null)
      .map(() => rewriteQuery(originalQuery, llm))
  );

  // Deduplicate similar rewrites
  const uniqueRewrites = new Map<string, string>();

  for (const rewrite of rewrites) {
    const normalized = rewrite.toLowerCase();
    if (!uniqueRewrites.has(normalized)) {
      uniqueRewrites.set(normalized, rewrite);
    }
  }

  return Array.from(uniqueRewrites.values());
}

// Retrieve using all rewrites and combine
async function multiQueryRetrieval(
  originalQuery: string,
  llm: LLMClient,
  retriever: (q: string, k: number) => Promise<Array<{ id: string; text: string; score: number }>>,
  topK: number = 5
): Promise<Array<{ id: string; text: string; score: number; sources: string[] }>> {
  // Generate rewrites
  const rewrites = await multiAttemptRewrite(originalQuery, llm);
  const allQueries = [originalQuery, ...rewrites];

  // Retrieve for each query
  const results = await Promise.all(
    allQueries.map(q => retriever(q, topK * 2))
  );

  // Aggregate results by document ID
  const aggregated = new Map<string, { text: string; score: number; count: number; sources: string[] }>();

  for (let i = 0; i < results.length; i++) {
    for (const result of results[i]) {
      const existing = aggregated.get(result.id) || {
        text: result.text,
        score: 0,
        count: 0,
        sources: [],
      };

      aggregated.set(result.id, {
        ...existing,
        score: existing.score + result.score,
        count: existing.count + 1,
        sources: [...new Set([...existing.sources, allQueries[i]])],
      });
    }
  }

  // Normalize and sort by combined score
  const combined = Array.from(aggregated.entries())
    .map(([id, data]) => ({
      id,
      text: data.text,
      score: data.score / data.count,
      sources: data.sources,
    }))
    .sort((a, b) => b.score - a.score);

  return combined.slice(0, topK);
}
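
The aggregation step above, stripped of retrieval, looks like this; a simplified sketch with made-up document IDs:

```typescript
interface Hit {
  id: string;
  score: number;
}

// Average each document's score over the queries that retrieved it,
// mirroring the normalization step in multiQueryRetrieval.
function averageScores(resultSets: Hit[][]): Hit[] {
  const acc = new Map<string, { total: number; count: number }>();
  for (const hits of resultSets) {
    for (const { id, score } of hits) {
      const e = acc.get(id) ?? { total: 0, count: 0 };
      acc.set(id, { total: e.total + score, count: e.count + 1 });
    }
  }
  return Array.from(acc.entries())
    .map(([id, { total, count }]) => ({ id, score: total / count }))
    .sort((a, b) => b.score - a.score);
}

// Doc "a" appears in both result sets and averages to 0.8;
// "b" appears once with 0.5.
const fused = averageScores([
  [{ id: 'a', score: 0.9 }, { id: 'b', score: 0.5 }],
  [{ id: 'a', score: 0.7 }],
]);
```

Averaging keeps scores comparable across documents retrieved by different numbers of queries; summing instead would reward documents that many rewrites agree on. Which behavior you want depends on how diverse your rewrites are.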

HyDE (Hypothetical Document Embeddings)

Generate a hypothetical answer, then search for documents similar to it:

async function hydeQueryExpansion(
  query: string,
  llm: LLMClient,
  retriever: (q: string, k: number) => Promise<Array<{ id: string; text: string; score: number }>>,
  topK: number = 5
): Promise<Array<{ id: string; text: string; score: number }>> {
  // Step 1: Generate hypothetical answer document
  const hydePrompt = `
Write a detailed paragraph that would answer the following question.
The paragraph should be comprehensive and factually grounded.

Question: "${query}"

Answer paragraph:`;

  const hypothetical = await llm.generate({
    messages: [{ role: 'user', content: hydePrompt }],
    maxTokens: 300,
  });

  // Step 2: Search using both original query and hypothetical document
  const [queryResults, hydeResults] = await Promise.all([
    retriever(query, topK * 2),
    retriever(hypothetical.text, topK * 2),
  ]);

  // Step 3: Fuse results using reciprocal rank fusion
  const rrf = new Map<string, number>();

  queryResults.forEach((r, rank) => {
    rrf.set(r.id, (rrf.get(r.id) || 0) + 1 / (60 + rank + 1));
  });

  hydeResults.forEach((r, rank) => {
    rrf.set(r.id, (rrf.get(r.id) || 0) + 1 / (60 + rank + 1));
  });

  // Step 4: Return combined top-k
  const allResults = [...queryResults, ...hydeResults];
  const uniqueResults = new Map<string, typeof allResults[0]>();

  for (const result of allResults) {
    if (!uniqueResults.has(result.id)) {
      uniqueResults.set(result.id, result);
    }
  }

  return Array.from(uniqueResults.values())
    .map(r => ({ ...r, score: rrf.get(r.id) || r.score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
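
The fusion logic in Step 3 can be pulled out into a reusable helper; a sketch using the conventional k = 60:

```typescript
// Reciprocal rank fusion: each ranked list contributes 1 / (k + rank)
// per document, with 1-based ranks; k = 60 is the conventional constant.
function reciprocalRankFusion(
  rankings: string[][],
  k = 60
): Map<string, number> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, i) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + i + 1));
    });
  }
  return scores;
}

// 'b' is ranked highly by both lists, so it wins the fusion.
const fusedScores = reciprocalRankFusion([
  ['a', 'b', 'c'],
  ['b', 'c', 'a'],
]);
```

RRF only needs ranks, not raw scores, which is why it fuses a query-embedding ranking and a HyDE ranking cleanly even though their similarity scores live on different scales.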

Step-Back Prompting

Abstract the question: step back from the specific query to the broader underlying concept:

async function stepBackPrompting(
  query: string,
  llm: LLMClient,
  retriever: (q: string, k: number) => Promise<Array<{ id: string; text: string; score: number }>>,
  topK: number = 5
): Promise<Array<{ id: string; text: string; score: number }>> {
  // Step 1: Generate step-back question (more abstract)
  const stepBackPrompt = `
Given the user's question, generate a more general/abstract version that
captures the underlying concept. This helps retrieve foundational knowledge.

Original question: "${query}"

Step-back question (more general):`;

  const stepBack = await llm.generate({
    messages: [{ role: 'user', content: stepBackPrompt }],
    maxTokens: 100,
  });

  // Step 2: Retrieve using both specific and general questions
  const [specificResults, generalResults] = await Promise.all([
    retriever(query, topK * 2),
    retriever(stepBack.text, topK * 2),
  ]);

  // Step 3: Combine - prioritize specific over general
  const combined = new Map<string, { result: typeof specificResults[0]; priority: number }>();

  // Specific results get priority 2
  specificResults.forEach(r => {
    combined.set(r.id, { result: r, priority: 2 });
  });

  // General results get priority 1, but don't override specifics
  generalResults.forEach(r => {
    if (!combined.has(r.id)) {
      combined.set(r.id, { result: r, priority: 1 });
    }
  });

  return Array.from(combined.values())
    .sort((a, b) => {
      if (a.priority !== b.priority) return b.priority - a.priority;
      return b.result.score - a.result.score;
    })
    .slice(0, topK)
    .map(x => x.result);
}
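
The two-tier merge above reduces to a small helper; a sketch where specific hits always outrank general-only hits:

```typescript
interface Hit {
  id: string;
  score: number;
}

// Merge two result lists so every hit in `specific` outranks any hit
// that appears only in `general`; ties within a tier fall back to score.
function priorityMerge(specific: Hit[], general: Hit[], topK: number): Hit[] {
  const seen = new Set(specific.map(h => h.id));
  const generalOnly = general.filter(h => !seen.has(h.id));
  const byScore = (a: Hit, b: Hit) => b.score - a.score;
  return [...specific]
    .sort(byScore)
    .concat(generalOnly.sort(byScore))
    .slice(0, topK);
}

// s1 wins despite its lower score because it came from the specific query.
const merged = priorityMerge(
  [{ id: 's1', score: 0.4 }],
  [{ id: 'g1', score: 0.9 }, { id: 's1', score: 0.95 }],
  2
);
```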

Query Decomposition for Multi-Hop Questions

Break complex questions into subquestions:

interface QueryDecomposition {
  isMultiHop: boolean;
  subQuestions: Array<{
    id: string;
    question: string;
    type: 'factual' | 'reasoning' | 'synthesis';
    dependsOn?: string; // ID of previous subquestion
  }>;
}

async function decomposeQuery(
  query: string,
  llm: LLMClient
): Promise<QueryDecomposition> {
  const decomposePrompt = `
Analyze the user's question. If it requires multiple retrieval steps to answer,
decompose it into subquestions. Otherwise, return a single subquestion.

User question: "${query}"

Respond with JSON:
{
  "isMultiHop": boolean,
  "subQuestions": [
    {
      "id": "q1",
      "question": "...",
      "type": "factual|reasoning|synthesis",
      "dependsOn": null or "q1"
    }
  ]
}`;

  const response = await llm.generate({
    messages: [{ role: 'user', content: decomposePrompt }],
    maxTokens: 300,
  });

  return JSON.parse(response.text);
}
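
JSON.parse on raw model output is fragile: models often wrap JSON in markdown fences. A small defensive parser (a sketch; parseModelJson is not part of the code above, and a schema validator would be stricter):

```typescript
// Strip optional ```json fences before parsing model output.
function parseModelJson<T>(raw: string): T {
  const cleaned = raw
    .trim()
    .replace(/^```(?:json)?\s*/i, '')
    .replace(/\s*```$/, '');
  return JSON.parse(cleaned) as T;
}

const parsed = parseModelJson<{ isMultiHop: boolean }>(
  '```json\n{ "isMultiHop": true, "subQuestions": [] }\n```'
);
```

Every JSON.parse call in this article's examples is a candidate for this treatment.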

async function multiHopRetrieval(
  query: string,
  llm: LLMClient,
  retriever: (q: string, k: number, context?: string) => Promise<Array<{ id: string; text: string; score: number }>>,
  topK: number = 5
): Promise<{
  subQuestionResults: Map<string, Array<{ id: string; text: string; score: number }>>;
  combinedResults: Array<{ id: string; text: string; score: number }>;
}> {
  const decomposition = await decomposeQuery(query, llm);

  const subQuestionResults = new Map<string, Array<{ id: string; text: string; score: number }>>();

  // Process subquestions in dependency order
  const processed = new Set<string>();

  while (processed.size < decomposition.subQuestions.length) {
    const progressBefore = processed.size;

    for (const subQ of decomposition.subQuestions) {
      // Check if dependencies are satisfied
      if (subQ.dependsOn && !processed.has(subQ.dependsOn)) {
        continue;
      }

      if (processed.has(subQ.id)) {
        continue;
      }

      // Build context from previous subquestion results
      let context = '';
      if (subQ.dependsOn) {
        const depResults = subQuestionResults.get(subQ.dependsOn);
        if (depResults) {
          context = depResults.map(r => r.text).join('\n\n');
        }
      }

      // Retrieve for this subquestion
      const results = await retriever(subQ.question, topK * 2, context);
      subQuestionResults.set(subQ.id, results);
      processed.add(subQ.id);
    }

    // Guard against missing or circular dependsOn references,
    // which would otherwise loop forever
    if (processed.size === progressBefore) {
      throw new Error('Unresolvable subquestion dependencies');
    }
  }

  // Combine all results, weighting earlier subquestions more heavily
  const combinedMap = new Map<string, { score: number; text: string }>();

  let priority = decomposition.subQuestions.length;
  for (const subQ of decomposition.subQuestions) {
    const results = subQuestionResults.get(subQ.id) || [];
    for (const r of results) {
      const existing = combinedMap.get(r.id);
      combinedMap.set(r.id, {
        score: (existing?.score || 0) + r.score * priority,
        text: existing?.text || r.text,
      });
    }
    priority--;
  }

  const combinedResults = Array.from(combinedMap.entries())
    .map(([id, { score, text }]) => ({ id, score, text }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);

  return {
    subQuestionResults,
    combinedResults,
  };
}
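
The dependency loop in multiHopRetrieval can be factored into a standalone ordering helper; a minimal sketch (the SubQ shape is a pared-down QueryDecomposition entry):

```typescript
interface SubQ {
  id: string;
  dependsOn?: string;
}

// Resolve subquestions whose dependencies are already satisfied first;
// throw instead of spinning forever on missing or circular references.
function resolutionOrder(subQs: SubQ[]): string[] {
  const order: string[] = [];
  const done = new Set<string>();
  while (done.size < subQs.length) {
    const before = done.size;
    for (const q of subQs) {
      if (done.has(q.id)) continue;
      if (q.dependsOn && !done.has(q.dependsOn)) continue;
      order.push(q.id);
      done.add(q.id);
    }
    if (done.size === before) {
      throw new Error('Unresolvable dependency among subquestions');
    }
  }
  return order;
}

// q1 has no dependency, so it unblocks q2, which unblocks q3.
const hopOrder = resolutionOrder([
  { id: 'q2', dependsOn: 'q1' },
  { id: 'q1' },
  { id: 'q3', dependsOn: 'q2' },
]);
// → ['q1', 'q2', 'q3']
```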

Query Intent Classification

Route queries to different strategies based on intent:

type QueryIntent =
  | 'factual_question' // "What is...?"
  | 'procedural' // "How to...?"
  | 'comparison' // "Compare X and Y"
  | 'reasoning' // "Why...?"
  | 'opinion' // "Should...?" (avoid in RAG)
  | 'multi_hop'; // Complex, needs decomposition

async function classifyQueryIntent(
  query: string,
  llm: LLMClient
): Promise<QueryIntent> {
  const classifyPrompt = `
Classify the intent of the user's question. Respond with a single word:
- factual_question: asking for a fact
- procedural: asking for steps/instructions
- comparison: comparing two or more things
- reasoning: asking for explanation (why, how does it work)
- opinion: asking for subjective opinion
- multi_hop: requires multiple retrieval steps

Query: "${query}"

Intent:`;

  const response = await llm.generate({
    messages: [{ role: 'user', content: classifyPrompt }],
    maxTokens: 20,
  });

  const intent = response.text.toLowerCase().trim() as QueryIntent;

  // Validate the model's free-text label; fall back to direct retrieval
  const validIntents: QueryIntent[] = [
    'factual_question', 'procedural', 'comparison',
    'reasoning', 'opinion', 'multi_hop',
  ];
  return validIntents.includes(intent) ? intent : 'factual_question';
}

async function intentAwareRetrieval(
  query: string,
  llm: LLMClient,
  retriever: (q: string, k: number) => Promise<Array<{ id: string; text: string; score: number }>>,
  topK: number = 5
): Promise<Array<{ id: string; text: string; score: number }>> {
  const intent = await classifyQueryIntent(query, llm);

  switch (intent) {
    case 'factual_question':
      // Direct retrieval
      return retriever(query, topK);

    case 'procedural': {
      // Reframe as an explicit how-to query
      const procedureQuery = `How to ${query}`;
      return retriever(procedureQuery, topK);
    }

    case 'comparison': {
      // Extract entities and retrieve for each
      const comparePrompt = `Extract the items being compared from: "${query}"
Respond with JSON: { "items": ["item1", "item2", ...] }`;

      const compareResponse = await llm.generate({
        messages: [{ role: 'user', content: comparePrompt }],
        maxTokens: 100,
      });

      const { items } = JSON.parse(compareResponse.text) as { items: string[] };
      const comparisonResults = await Promise.all(
        items.map(item => retriever(item, topK))
      );

      // Merge and deduplicate
      const merged = new Map<string, typeof comparisonResults[0][0]>();
      comparisonResults.forEach(results => {
        results.forEach(r => {
          if (!merged.has(r.id)) merged.set(r.id, r);
        });
      });

      return Array.from(merged.values()).slice(0, topK);
    }

    case 'reasoning':
      // Use step-back
      return stepBackPrompting(query, llm, retriever, topK);

    case 'multi_hop': {
      // Decompose and retrieve
      const { combinedResults } = await multiHopRetrieval(query, llm, retriever, topK);
      return combinedResults;
    }

    default:
      return retriever(query, topK);
  }
}

Query Expansion with Synonyms

Add semantic variations to broaden coverage:

async function expandQueryWithSynonyms(
  query: string,
  llm: LLMClient,
  retriever: (q: string, k: number) => Promise<Array<{ id: string; text: string; score: number }>>,
  topK: number = 5
): Promise<Array<{ id: string; text: string; score: number }>> {
  // Step 1: Identify key terms and generate synonyms
  const synonymPrompt = `
Identify 3-5 key terms in the query and provide semantic alternatives.

Query: "${query}"

Respond with JSON:
{
  "keyTerms": {
    "term1": ["synonym1", "synonym2"],
    "term2": ["synonym1", "synonym2"]
  }
}`;

  const synonymResponse = await llm.generate({
    messages: [{ role: 'user', content: synonymPrompt }],
    maxTokens: 200,
  });

  const { keyTerms } = JSON.parse(synonymResponse.text) as {
    keyTerms: Record<string, string[]>;
  };

  // Step 2: Generate expanded queries
  // (String.replace swaps only the first, case-sensitive match)
  const expandedQueries = [query];

  for (const term in keyTerms) {
    for (const synonym of keyTerms[term]) {
      expandedQueries.push(query.replace(term, synonym));
    }
  }

  // Step 3: Retrieve and aggregate
  const results = await Promise.all(
    expandedQueries.map(q => retriever(q, topK * 2))
  );

  const aggregated = new Map<string, { score: number; text: string }>();

  results.forEach(docList => {
    docList.forEach(doc => {
      const existing = aggregated.get(doc.id);
      aggregated.set(doc.id, {
        score: (existing?.score || 0) + doc.score,
        text: existing?.text || doc.text,
      });
    });
  });

  return Array.from(aggregated.entries())
    .map(([id, { score, text }]) => ({
      id,
      score: score / expandedQueries.length,
      text,
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
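
Step 2's variant generation, in isolation (note that String.replace substitutes only the first, case-sensitive occurrence of each term):

```typescript
// Generate query variants by substituting each key term with its synonyms,
// one substitution per variant; the term map mirrors the keyTerms JSON shape.
function generateVariants(
  query: string,
  keyTerms: Record<string, string[]>
): string[] {
  const variants = [query];
  for (const [term, synonyms] of Object.entries(keyTerms)) {
    for (const synonym of synonyms) {
      const variant = query.replace(term, synonym);
      // Skip no-op substitutions (term not present in the query)
      if (variant !== query) variants.push(variant);
    }
  }
  return variants;
}

const variants = generateVariants('cheap flights to Berlin', {
  cheap: ['affordable', 'budget'],
});
// → ['cheap flights to Berlin', 'affordable flights to Berlin',
//    'budget flights to Berlin']
```

One substitution per variant keeps each expanded query close to the original; substituting multiple terms at once multiplies the variant count and drifts further from the user's intent.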

Query Validation and Routing

Route to appropriate retriever based on query characteristics:

interface QueryCharacteristics {
  length: 'short' | 'medium' | 'long';
  hasMultipleEntities: boolean;
  requiresDateContext: boolean;
  isAmbiguous: boolean;
  estimatedComplexity: 'simple' | 'medium' | 'complex';
}

function analyzeQueryCharacteristics(query: string): QueryCharacteristics {
  const words = query.split(/\s+/);
  const sentenceCount = query.split(/[.!?]/).filter(s => s.trim().length > 0).length;

  return {
    length: words.length < 10 ? 'short' : words.length < 30 ? 'medium' : 'long',
    hasMultipleEntities: /\band\b|\bor\b|,/.test(query),
    requiresDateContext: /\b(today|tomorrow|yesterday|this year|last month)\b/i.test(query),
    isAmbiguous: /\b(it|they|that|this)\b/.test(query) && sentenceCount === 1,
    estimatedComplexity: words.length < 10 ? 'simple' : words.length < 25 ? 'medium' : 'complex',
  };
}

async function routeQueryByCharacteristics(
  query: string,
  retriever: (q: string, strategy: string, k: number) => Promise<Array<{ id: string; text: string; score: number }>>,
  topK: number = 5
): Promise<Array<{ id: string; text: string; score: number }>> {
  const characteristics = analyzeQueryCharacteristics(query);

  if (characteristics.isAmbiguous) {
    // Clarify or expand
    return retriever(query, 'expanded', topK);
  }

  if (characteristics.estimatedComplexity === 'complex') {
    // Decompose or use HyDE
    return retriever(query, 'hyde', topK);
  }

  if (characteristics.hasMultipleEntities) {
    // Multi-query or comparison
    return retriever(query, 'multi_query', topK);
  }

  // Default: direct retrieval
  return retriever(query, 'direct', topK);
}

Checklist

  • Implement query rewriting for ambiguous queries
  • Add HyDE expansion for semantic coverage
  • Implement step-back prompting for reasoning questions
  • Build query decomposition for multi-hop questions
  • Add intent classification routing
  • Implement synonym expansion
  • Track which strategies improve NDCG@5
  • Measure query expansion cost vs retrieval quality gain
  • Benchmark rewrite consistency across runs
  • Monitor query length distribution in production

Conclusion

Query understanding is the underrated foundation of retrieval quality. Before investing in better embedding models or rerankers, optimize your queries. Small improvements in query quality (rewriting, HyDE, decomposition) compound significantly when stacked in your pipeline. The key metric: does improved query understanding move your golden dataset's NDCG@5 up? If yes, you're on the right path.