Vector Search With Filtering — Combining Semantic Search and Metadata Filters at Scale

- Author: Sanjeev Sharma (@webcoderspeed1)
Introduction
Vector search alone is incomplete. Combining semantic similarity with metadata filtering enables powerful search that respects business logic, tenant isolation, and data freshness constraints. This guide covers production filtering patterns at scale.
- Pre-Filter vs Post-Filter Tradeoffs
- HNSW With Payload Filtering
- pgVector With WHERE Clause Filtering
- Hybrid Scoring With Reciprocal Rank Fusion
- Namespace Isolation for Multi-Tenancy
- Approximate vs Exact Search Decision
- Re-Ranking With Cross-Encoders
- Checklist
- Conclusion
Pre-Filter vs Post-Filter Tradeoffs
Choose a filtering strategy based on filter selectivity and performance requirements.
interface Document {
id: string;
content: string;
embedding: number[];
tags: string[];
status: 'published' | 'draft' | 'archived';
createdAt: Date;
userId: string;
tenantId: string;
}
class FilteringStrategy {
// Pre-filter: apply metadata filters first, then score only the survivors.
// Best when filters are selective, since the scored set stays small.
async preFilterSearch(
documents: Document[],
query: string,
queryEmbedding: number[],
filters: { status?: string; userId?: string; tags?: string[] },
limit: number = 10
): Promise<Document[]> {
// Filter first
let filtered = documents.filter((doc) => {
if (filters.status && doc.status !== filters.status) return false;
if (filters.userId && doc.userId !== filters.userId) return false;
if (filters.tags && !filters.tags.every((tag) => doc.tags.includes(tag))) return false;
return true;
});
// Then search in filtered set
const scored = filtered.map((doc) => ({
doc,
score: this.cosineSimilarity(queryEmbedding, doc.embedding),
}));
return scored.sort((a, b) => b.score - a.score).slice(0, limit).map((item) => item.doc);
}
// Post-filter: score everything, then filter the top results. Cheap with an
// ANN index, but selective filters can leave fewer than `limit` hits.
async postFilterSearch(
documents: Document[],
query: string,
queryEmbedding: number[],
filters: { status?: string; userId?: string; tags?: string[] },
limit: number = 10
): Promise<Document[]> {
// Search all
const scored = documents.map((doc) => ({
doc,
score: this.cosineSimilarity(queryEmbedding, doc.embedding),
}));
// Over-fetch 3x so that filtering still leaves enough results;
// tune the factor to your filters' selectivity
const topResults = scored.sort((a, b) => b.score - a.score).slice(0, limit * 3);
// Filter results
const filtered = topResults.filter(({ doc }) => {
if (filters.status && doc.status !== filters.status) return false;
if (filters.userId && doc.userId !== filters.userId) return false;
if (filters.tags && !filters.tags.every((tag) => doc.tags.includes(tag))) return false;
return true;
});
return filtered.slice(0, limit).map((item) => item.doc);
}
private cosineSimilarity(vecA: number[], vecB: number[]): number {
const dotProduct = vecA.reduce((sum, a, i) => sum + a * vecB[i], 0);
const magnitudeA = Math.sqrt(vecA.reduce((sum, a) => sum + a * a, 0));
const magnitudeB = Math.sqrt(vecB.reduce((sum, b) => sum + b * b, 0));
return dotProduct / (magnitudeA * magnitudeB);
}
}
const strategy = new FilteringStrategy();
const docs: Document[] = [
{
id: '1',
content: 'Document content',
embedding: [0.1, 0.2, 0.3],
tags: ['tag1'],
status: 'published',
createdAt: new Date(),
userId: 'user1',
tenantId: 'tenant1',
},
];
const results = await strategy.preFilterSearch(docs, 'query', [0.1, 0.2, 0.3], {
status: 'published',
userId: 'user1',
});
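To decide between the two strategies programmatically, you can estimate filter selectivity on a sample of documents. The helper below is a hypothetical sketch; the 10% threshold is illustrative, not a universal rule.

```typescript
// Hypothetical helper: estimate filter selectivity on a sample and pick a strategy.
type Strategy = 'pre-filter' | 'post-filter';

function chooseStrategy<T>(
  sample: T[],
  predicate: (item: T) => boolean,
  selectiveThreshold = 0.1
): { strategy: Strategy; selectivity: number } {
  if (sample.length === 0) return { strategy: 'pre-filter', selectivity: 0 };
  const matching = sample.filter(predicate).length;
  const selectivity = matching / sample.length;
  // Few documents pass the filter -> pre-filter (cheap exact search on a small set).
  // Most documents pass -> post-filter (let the vector index do the heavy lifting).
  const strategy: Strategy = selectivity <= selectiveThreshold ? 'pre-filter' : 'post-filter';
  return { strategy, selectivity };
}

const sample = Array.from({ length: 100 }, (_, i) => ({
  status: i < 5 ? 'published' : 'draft',
}));
const decision = chooseStrategy(sample, (d) => d.status === 'published');
console.log(decision); // strategy: 'pre-filter', selectivity: 0.05
```

Running the estimate on a periodic sample (rather than the full corpus) keeps the decision cheap enough to make per query pattern.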
HNSW With Payload Filtering
Use the native payload filtering in engines such as Qdrant and Weaviate, which evaluate metadata predicates during HNSW traversal rather than as a separate pass.
interface QdrantPoint {
id: number;
vector: number[];
payload: Record<string, unknown>;
}
class HNSWFilteredIndex {
private points: Map<number, QdrantPoint> = new Map();
private nextId = 1;
addPoint(vector: number[], payload: Record<string, unknown>): number {
const id = this.nextId++;
this.points.set(id, { id, vector, payload });
return id;
}
search(
query: number[],
limit: number = 10,
filters?: {
must?: Array<{ key: string; equals?: unknown; range?: { gte?: number; lte?: number } }>;
should?: Array<{ key: string; equals?: unknown }>;
}
): QdrantPoint[] {
const scored: Array<{ point: QdrantPoint; score: number }> = [];
for (const point of this.points.values()) {
// Apply must filters
if (filters?.must) {
const allMatch = filters.must.every((filter) => {
const value = this.getNestedValue(point.payload, filter.key);
if (filter.equals !== undefined) {
return value === filter.equals;
}
if (filter.range) {
const numValue = value as number;
if (filter.range.gte !== undefined && numValue < filter.range.gte) return false;
if (filter.range.lte !== undefined && numValue > filter.range.lte) return false;
}
return true;
});
if (!allMatch) continue;
}
// Calculate score
const score = this.cosineSimilarity(query, point.vector);
// Apply should filters (boost score)
let finalScore = score;
if (filters?.should) {
for (const filter of filters.should) {
const value = this.getNestedValue(point.payload, filter.key);
if (value === filter.equals) {
finalScore *= 1.1; // Boost score
}
}
}
scored.push({ point, score: finalScore });
}
return scored.sort((a, b) => b.score - a.score).slice(0, limit).map((item) => item.point);
}
private getNestedValue(obj: Record<string, unknown>, path: string): unknown {
return path.split('.').reduce((current: any, key: string) => current?.[key], obj);
}
private cosineSimilarity(vecA: number[], vecB: number[]): number {
const dotProduct = vecA.reduce((sum, a, i) => sum + a * vecB[i], 0);
const magnitudeA = Math.sqrt(vecA.reduce((sum, a) => sum + a * a, 0));
const magnitudeB = Math.sqrt(vecB.reduce((sum, b) => sum + b * b, 0));
return dotProduct / (magnitudeA * magnitudeB);
}
}
const index = new HNSWFilteredIndex();
index.addPoint([0.1, 0.2], { status: 'published', category: 'tech' });
index.addPoint([0.15, 0.25], { status: 'draft', category: 'tech' });
const results = index.search([0.12, 0.22], 5, {
must: [{ key: 'status', equals: 'published' }],
should: [{ key: 'category', equals: 'tech' }],
});
console.log('Search results:', results.length);
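When talking to a real engine, the in-memory `must`/`should` structure above is sent as a filter object alongside the query vector. The sketch below builds a Qdrant-style filter payload; the field names (`status`, `views`, `category`) and exact DSL shape are illustrative, so check your engine's documentation for the precise schema.

```typescript
// Sketch of a Qdrant-style filter object mirroring the in-memory version above.
// Keys and condition shapes are illustrative; consult your vector DB's docs.
interface FieldCondition {
  key: string;
  match?: { value: string | number | boolean };
  range?: { gte?: number; lte?: number };
}

interface Filter {
  must?: FieldCondition[];
  should?: FieldCondition[];
}

function buildFilter(opts: {
  status?: string;
  minViews?: number;
  preferredCategory?: string;
}): Filter {
  const must: FieldCondition[] = [];
  const should: FieldCondition[] = [];
  if (opts.status) must.push({ key: 'status', match: { value: opts.status } });
  if (opts.minViews !== undefined) must.push({ key: 'views', range: { gte: opts.minViews } });
  if (opts.preferredCategory) {
    should.push({ key: 'category', match: { value: opts.preferredCategory } });
  }
  const filter: Filter = {};
  if (must.length) filter.must = must;
  if (should.length) filter.should = should;
  return filter;
}

const filter = buildFilter({ status: 'published', minViews: 100, preferredCategory: 'tech' });
console.log(JSON.stringify(filter));
```

Keeping filter construction in one typed function makes it easy to enforce invariants (for example, always requiring a `status` predicate) before the request leaves your service.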
pgVector With WHERE Clause Filtering
Use PostgreSQL's pgvector for vector search with powerful SQL filtering.
class PGVectorSearch {
async searchWithFilters(
client: any, // PostgreSQL client
query: number[],
tableName: string,
filters?: {
where?: string; // SQL WHERE clause
limit?: number;
offset?: number;
}
): Promise<Array<{ id: string; content: string; similarity: number }>> {
const limit = filters?.limit || 10;
const offset = filters?.offset || 0;
let whereClause = '';
if (filters?.where) {
whereClause = `WHERE ${filters.where}`;
}
// The vector is always $1; limit and offset are $2 and $3.
// In pgvector, `embedding <=> $1` is cosine *distance*, so alias it accordingly.
const querySql = `
SELECT id, content, (embedding <=> $1) AS distance
FROM ${tableName}
${whereClause}
ORDER BY embedding <=> $1
LIMIT $2 OFFSET $3
`;
// pgvector accepts '[x,y,z]' vector literals, which JSON.stringify produces
const params: unknown[] = [JSON.stringify(query), limit, offset];
const result = await client.query(querySql, params);
return result.rows.map((row: any) => ({
id: row.id,
content: row.content,
similarity: 1 - row.distance, // Convert cosine distance to similarity
}));
}
// NOTE: values are interpolated into SQL text, so they must be escaped.
// Prefer bind parameters for user-supplied values whenever possible.
buildWhereClause(filters: {
status?: string;
category?: string[];
createdAfter?: Date;
createdBefore?: Date;
userId?: string;
}): string {
const esc = (v: string) => v.replace(/'/g, "''");
const conditions: string[] = [];
if (filters.status) {
conditions.push(`status = '${esc(filters.status)}'`);
}
if (filters.category && filters.category.length > 0) {
const cats = filters.category.map((c) => `'${esc(c)}'`).join(', ');
conditions.push(`category IN (${cats})`);
}
if (filters.createdAfter) {
conditions.push(`created_at >= '${filters.createdAfter.toISOString()}'`);
}
if (filters.createdBefore) {
conditions.push(`created_at <= '${filters.createdBefore.toISOString()}'`);
}
if (filters.userId) {
conditions.push(`user_id = '${esc(filters.userId)}'`);
}
return conditions.length > 0 ? conditions.join(' AND ') : '';
}
}
const pgSearch = new PGVectorSearch();
const whereClause = pgSearch.buildWhereClause({
status: 'published',
category: ['tech', 'ai'],
userId: 'user123',
});
// const results = await pgSearch.searchWithFilters(client, queryVector, 'documents', {
// where: whereClause,
// limit: 10,
// });
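A safer alternative to string-built WHERE clauses is to emit a parameterized clause and pass the values separately, so filter values travel as bind parameters and never touch the SQL text. This is a sketch; the helper name and `startIndex` convention (reserving $1 for the query vector) are assumptions for illustration.

```typescript
// Sketch: build a parameterized WHERE clause. Filter values are returned as
// bind parameters rather than interpolated into the SQL string.
// `startIndex` accounts for parameters already in use (e.g. $1 = query vector).
function buildParameterizedWhere(
  filters: { status?: string; userId?: string; createdAfter?: Date },
  startIndex = 2
): { clause: string; params: string[] } {
  const conditions: string[] = [];
  const params: string[] = [];
  let i = startIndex;
  if (filters.status) {
    conditions.push(`status = $${i++}`);
    params.push(filters.status);
  }
  if (filters.userId) {
    conditions.push(`user_id = $${i++}`);
    params.push(filters.userId);
  }
  if (filters.createdAfter) {
    conditions.push(`created_at >= $${i++}`);
    params.push(filters.createdAfter.toISOString());
  }
  return {
    clause: conditions.length ? `WHERE ${conditions.join(' AND ')}` : '',
    params,
  };
}

const { clause, params } = buildParameterizedWhere({ status: 'published', userId: 'user123' });
console.log(clause); // WHERE status = $2 AND user_id = $3
console.log(params); // ['published', 'user123']
```

The returned `params` array is then concatenated after the query vector when calling `client.query`, so placeholder numbering stays consistent.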
Hybrid Scoring With Reciprocal Rank Fusion
Combine vector and keyword search using RRF for better results.
class HybridSearchEngine {
calculateRRF(vectorRank: number, keywordRank: number, k: number = 60): number {
// Reciprocal Rank Fusion formula: 1/(k + rank)
const vectorScore = 1 / (k + vectorRank);
const keywordScore = 1 / (k + keywordRank);
return vectorScore + keywordScore;
}
async hybridSearch(
query: string,
vectorSearchFn: () => Promise<Array<{ id: string; score: number }>>,
keywordSearchFn: () => Promise<Array<{ id: string; score: number }>>,
limit: number = 10
): Promise<Array<{ id: string; hybridScore: number; vectorRank?: number; keywordRank?: number }>> {
const [vectorResults, keywordResults] = await Promise.all([vectorSearchFn(), keywordSearchFn()]);
// Create rank maps (1-based, matching the standard RRF formulation)
const vectorRanks = new Map<string, number>();
vectorResults.forEach((result, index) => {
vectorRanks.set(result.id, index + 1);
});
const keywordRanks = new Map<string, number>();
keywordResults.forEach((result, index) => {
keywordRanks.set(result.id, index + 1);
});
// Combine results
const allIds = new Set<string>([
...vectorResults.map((r) => r.id),
...keywordResults.map((r) => r.id),
]);
const scored: Array<{ id: string; hybridScore: number; vectorRank?: number; keywordRank?: number }> = [];
for (const id of allIds) {
const vectorRank = vectorRanks.get(id) ?? vectorResults.length + 1;
const keywordRank = keywordRanks.get(id) ?? keywordResults.length + 1;
const hybridScore = this.calculateRRF(vectorRank, keywordRank);
scored.push({ id, hybridScore, vectorRank, keywordRank });
}
return scored.sort((a, b) => b.hybridScore - a.hybridScore).slice(0, limit);
}
}
const hybrid = new HybridSearchEngine();
const results = await hybrid.hybridSearch(
'search query',
async () => [
{ id: 'doc1', score: 0.9 },
{ id: 'doc2', score: 0.8 },
],
async () => [
{ id: 'doc2', score: 0.95 },
{ id: 'doc3', score: 0.7 },
],
10
);
console.log('Hybrid results:', results);
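Plain RRF treats every source equally. When one signal is known to be more trustworthy (say, vector search for semantic queries), a weighted variant can be used. The sketch below is an assumed extension, not a standard API; `k = 60` follows the commonly cited default for RRF.

```typescript
// Sketch: weighted RRF, giving each ranked list its own weight.
// Ranks are 1-based; documents missing from a list contribute nothing from it.
function weightedRRF(
  rankings: Array<{ ranks: Map<string, number>; weight: number }>,
  k = 60
): Map<string, number> {
  const scores = new Map<string, number>();
  for (const { ranks, weight } of rankings) {
    for (const [id, rank] of ranks) {
      scores.set(id, (scores.get(id) ?? 0) + weight / (k + rank));
    }
  }
  return scores;
}

const vRanks = new Map([['doc1', 1], ['doc2', 2]]);
const kRanks = new Map([['doc2', 1], ['doc3', 2]]);
const scores = weightedRRF([
  { ranks: vRanks, weight: 0.7 },
  { ranks: kRanks, weight: 0.3 },
]);
// doc2 appears in both lists, so it accumulates score from each source.
```

With these weights, doc2 (present in both lists) outscores doc1 (top of the heavier vector list alone), which is usually the behavior you want from fusion.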
Namespace Isolation for Multi-Tenancy
Implement tenant-aware vector search for SaaS platforms.
class MultiTenantVectorSearch {
private tenantIndices = new Map<string, { vectors: Map<string, number[]>; metadata: Map<string, Record<string, unknown>> }>();
initializeTenant(tenantId: string): void {
if (!this.tenantIndices.has(tenantId)) {
this.tenantIndices.set(tenantId, {
vectors: new Map(),
metadata: new Map(),
});
}
}
addDocument(tenantId: string, docId: string, vector: number[], metadata: Record<string, unknown>): void {
this.initializeTenant(tenantId);
const index = this.tenantIndices.get(tenantId)!;
index.vectors.set(docId, vector);
index.metadata.set(docId, metadata);
}
search(
tenantId: string,
query: number[],
filters?: Record<string, unknown>,
limit: number = 10
): Array<{ docId: string; similarity: number; metadata: Record<string, unknown> }> {
const index = this.tenantIndices.get(tenantId);
if (!index) {
return [];
}
const results: Array<{ docId: string; similarity: number; metadata: Record<string, unknown> }> = [];
for (const [docId, vector] of index.vectors.entries()) {
const metadata = index.metadata.get(docId)!;
// Apply filters
if (filters) {
const matches = Object.entries(filters).every(([key, value]) => metadata[key] === value);
if (!matches) continue;
}
const similarity = this.cosineSimilarity(query, vector);
results.push({ docId, similarity, metadata });
}
return results.sort((a, b) => b.similarity - a.similarity).slice(0, limit);
}
deleteTenant(tenantId: string): void {
this.tenantIndices.delete(tenantId);
}
private cosineSimilarity(vecA: number[], vecB: number[]): number {
const dotProduct = vecA.reduce((sum, a, i) => sum + a * vecB[i], 0);
const magnitudeA = Math.sqrt(vecA.reduce((sum, a) => sum + a * a, 0));
const magnitudeB = Math.sqrt(vecB.reduce((sum, b) => sum + b * b, 0));
return dotProduct / (magnitudeA * magnitudeB);
}
}
const mtSearch = new MultiTenantVectorSearch();
mtSearch.addDocument('tenant1', 'doc1', [0.1, 0.2, 0.3], { status: 'published' });
mtSearch.addDocument('tenant1', 'doc2', [0.15, 0.25, 0.35], { status: 'draft' });
mtSearch.addDocument('tenant2', 'doc3', [0.2, 0.3, 0.4], { status: 'published' });
const tenant1Results = mtSearch.search('tenant1', [0.12, 0.22, 0.32], { status: 'published' });
const tenant2Results = mtSearch.search('tenant2', [0.2, 0.3, 0.4]);
console.log('Tenant 1:', tenant1Results.length);
console.log('Tenant 2:', tenant2Results.length);
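Per-tenant indices give hard isolation but become wasteful with many small tenants. The alternative is one shared index where every query is forced through a tenant predicate. The sketch below (class and field names are illustrative) makes `tenantId` a required argument so an unscoped query cannot even be expressed; the dot-product scoring stands in for whatever similarity your index uses.

```typescript
// Sketch: a shared index where tenant isolation is enforced by always
// injecting a tenantId predicate, rather than by separate indices.
interface SharedDoc {
  docId: string;
  tenantId: string;
  vector: number[];
}

class SharedIndexTenantGuard {
  private docs: SharedDoc[] = [];

  add(doc: SharedDoc): void {
    this.docs.push(doc);
  }

  // tenantId is required, so an unscoped (cross-tenant) query is unrepresentable.
  search(tenantId: string, query: number[], limit = 10): SharedDoc[] {
    return this.docs
      .filter((d) => d.tenantId === tenantId)
      .map((d) => ({ d, score: d.vector.reduce((s, v, i) => s + v * query[i], 0) }))
      .sort((a, b) => b.score - a.score)
      .slice(0, limit)
      .map((x) => x.d);
  }
}

const guard = new SharedIndexTenantGuard();
guard.add({ docId: 'a', tenantId: 't1', vector: [1, 0] });
guard.add({ docId: 'b', tenantId: 't2', vector: [0, 1] });
const hits = guard.search('t1', [1, 0]);
console.log(hits.map((h) => h.docId)); // ['a']
```

The tradeoff: shared indices amortize memory across tenants, but deleting a tenant now means deleting its rows rather than dropping a whole index.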
Approximate vs Exact Search Decision
Choose between fast approximate and accurate exact search based on requirements.
class AdaptiveVectorSearch {
private approximateThreshold = 100000; // Switch at 100K documents
async search(
documents: Array<{ id: string; vector: number[]; metadata: Record<string, unknown> }>,
query: number[],
options?: { useApproximate?: boolean; limit?: number }
): Promise<Array<{ id: string; similarity: number }>> {
// Honor an explicit override; otherwise decide by corpus size
const useApproximate =
options?.useApproximate ?? documents.length > this.approximateThreshold;
if (useApproximate) {
return this.approximateSearch(documents, query, options?.limit || 10);
} else {
return this.exactSearch(documents, query, options?.limit || 10);
}
}
private exactSearch(
documents: Array<{ id: string; vector: number[] }>,
query: number[],
limit: number
): Array<{ id: string; similarity: number }> {
const scored = documents.map((doc) => ({
id: doc.id,
similarity: this.cosineSimilarity(query, doc.vector),
}));
return scored.sort((a, b) => b.similarity - a.similarity).slice(0, limit);
}
private approximateSearch(
documents: Array<{ id: string; vector: number[] }>,
query: number[],
limit: number
): Array<{ id: string; similarity: number }> {
// LSH-based approximate search (simplified)
const queryHash = this.hashVector(query);
const hashed = documents.map((doc) => ({
id: doc.id,
hash: this.hashVector(doc.vector),
vector: doc.vector,
}));
// Find close hashes
const candidates = hashed.filter((doc) => this.hammingDistance(queryHash, doc.hash) < 10);
const scored = candidates.map((doc) => ({
id: doc.id,
similarity: this.cosineSimilarity(query, doc.vector),
}));
return scored.sort((a, b) => b.similarity - a.similarity).slice(0, limit);
}
private hashVector(vector: number[]): number[] {
// Simple hash for demo
return vector.map((v) => (v > 0 ? 1 : 0));
}
private hammingDistance(hash1: number[], hash2: number[]): number {
return hash1.reduce((sum, bit, i) => sum + (bit === hash2[i] ? 0 : 1), 0);
}
private cosineSimilarity(vecA: number[], vecB: number[]): number {
const dotProduct = vecA.reduce((sum, a, i) => sum + a * vecB[i], 0);
const magnitudeA = Math.sqrt(vecA.reduce((sum, a) => sum + a * a, 0));
const magnitudeB = Math.sqrt(vecB.reduce((sum, b) => sum + b * b, 0));
return dotProduct / (magnitudeA * magnitudeB);
}
}
const adaptive = new AdaptiveVectorSearch();
const docs = Array.from({ length: 1000 }, (_, i) => ({
id: `doc${i}`,
// Generate a fresh random value per dimension (fill(Math.random()) would
// repeat a single value across the whole vector)
vector: Array.from({ length: 768 }, () => Math.random()),
metadata: {},
}));
const results = await adaptive.search(docs, Array(768).fill(0.5), { limit: 10 });
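Before committing to approximate search, quantify what it costs you: recall@k is the fraction of the exact top-k that the approximate search also returns. A minimal sketch:

```typescript
// Sketch: recall@k — the fraction of the exact top-k result IDs that the
// approximate search also returned in its top-k. 1.0 means no loss at this k.
function recallAtK(exactIds: string[], approxIds: string[], k: number): number {
  const exactTop = new Set(exactIds.slice(0, k));
  const approxTop = approxIds.slice(0, k);
  if (exactTop.size === 0) return 1;
  const hits = approxTop.filter((id) => exactTop.has(id)).length;
  return hits / exactTop.size;
}

const exact = ['a', 'b', 'c', 'd'];
const approx = ['a', 'c', 'e', 'b'];
console.log(recallAtK(exact, approx, 3)); // 2/3 — 'b' was missed in the top 3
```

Track this metric on a held-out query set whenever you tune index parameters (hash width, probe count, ef_search, etc.), since those knobs trade recall for latency.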
Re-Ranking With Cross-Encoders
Improve precision with neural re-ranking on top candidates.
class CrossEncoderReranker {
async rerank(
query: string,
candidates: Array<{ id: string; text: string; score: number }>,
topK: number = 5
): Promise<Array<{ id: string; text: string; score: number; rerankedScore: number }>> {
const reranked = candidates.map((candidate) => ({
...candidate,
rerankedScore: this.scoreRelevance(query, candidate.text),
}));
return reranked.sort((a, b) => b.rerankedScore - a.rerankedScore).slice(0, topK);
}
private scoreRelevance(query: string, text: string): number {
// Simplified relevance scoring (in production, use a real cross-encoder model)
const queryTokens = query.toLowerCase().split(/\s+/);
const textTokens = text.toLowerCase().split(/\s+/);
let matchScore = 0;
for (const token of queryTokens) {
if (textTokens.includes(token)) {
matchScore += 1;
}
}
// Normalize by query length
return matchScore / Math.max(queryTokens.length, 1);
}
normalizeScores(
candidates: Array<{ id: string; text: string; score: number }>,
rerankedScores: Array<{ id: string; rerankedScore: number }>,
vectorWeight: number = 0.3,
rerankWeight: number = 0.7
): Array<{ id: string; text: string; combinedScore: number }> {
const rerankMap = new Map(rerankedScores.map((s) => [s.id, s.rerankedScore]));
return candidates.map((candidate) => {
const normalizedVector = candidate.score; // Assumed already in [0, 1]
const normalizedRerank = rerankMap.get(candidate.id) ?? 0; // scoreRelevance returns [0, 1]
const combinedScore = vectorWeight * normalizedVector + rerankWeight * normalizedRerank;
return {
id: candidate.id,
text: candidate.text,
combinedScore,
};
});
}
}
const reranker = new CrossEncoderReranker();
const candidates = [
{ id: 'doc1', text: 'Machine learning is powerful', score: 0.85 },
{ id: 'doc2', text: 'Learning theory in ML', score: 0.82 },
];
const reranked = await reranker.rerank('machine learning', candidates, 5);
console.log('Reranked:', reranked);
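`normalizeScores` above assumes both score ranges are already in [0, 1]. When they are not (for example, raw cross-encoder logits or BM25 scores), min-max normalize each list first. A minimal sketch:

```typescript
// Sketch: min-max normalization so scores from different rankers share a
// common 0-1 scale before weighted combination.
// Degenerate case (all scores equal) maps everything to 0.
function minMaxNormalize(scores: number[]): number[] {
  const min = Math.min(...scores);
  const max = Math.max(...scores);
  if (max === min) return scores.map(() => 0);
  return scores.map((s) => (s - min) / (max - min));
}

console.log(minMaxNormalize([2, 4, 6])); // [0, 0.5, 1]
```

Normalize each source's list independently before applying `vectorWeight` and `rerankWeight`, otherwise the source with the larger raw range silently dominates the combination.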
Checklist
- Choose pre-filter vs post-filter based on filter selectivity
- Use HNSW with native payload filtering in Qdrant/Weaviate
- Leverage pgvector's WHERE clause for SQL-based filtering
- Combine vector and keyword search with Reciprocal Rank Fusion
- Implement namespace isolation for multi-tenant systems
- Use approximate search for large datasets (100K+), exact for smaller
- Add cross-encoder re-ranking to top candidates for precision
- Monitor filter effectiveness (selectivity and recall impact)
- Cache frequently used filtered searches
- Test filter combinations against golden datasets
- Measure latency impact of pre vs post filtering
- Index high-cardinality metadata for filter performance
Conclusion
Effective vector search requires combining semantic similarity with intelligent filtering. Start with simple pre-filtering for high-selectivity filters, add namespace isolation for multi-tenancy, and use hybrid scoring to combine vector and keyword search. For precision-critical applications, add cross-encoder re-ranking. Always measure filter selectivity and query latency in production to guide optimization decisions.