pgvector in Production — Semantic Search and Similarity Queries at Scale

- Author: Sanjeev Sharma (@webcoderspeed1)
Introduction
pgvector is transforming how teams build semantic search without abandoning PostgreSQL. Whether you're indexing product catalogs, content libraries, or user profiles, understanding index types, distance metrics, and filtering patterns is critical for sub-second queries at scale. This guide walks through production pgvector deployments from embeddings to query optimization.
- Setting Up pgvector: Extension Installation and Configuration
- Distance Metrics: Cosine vs L2 vs Inner Product
- Index Types: IVFFlat vs HNSW
- Embedding Generation Pipeline: OpenAI to Database
- Hybrid Search: Combining Vector and Keyword Matching
- Filtering Before vs After Vector Search
- Index Tuning: m, ef_construction, and lists Parameters
- VACUUM and Index Bloat
- Checklist
- Conclusion
Setting Up pgvector: Extension Installation and Configuration
Start by installing pgvector on your PostgreSQL instance. Most managed services (AWS RDS, Supabase, Neon) ship it preinstalled; for self-hosted servers, install the distribution package or compile from source.
-- Install extension (requires postgres-dev on Linux)
CREATE EXTENSION vector;
-- Verify installation
SELECT extversion FROM pg_extension WHERE extname = 'vector';
-- Create a products table with embeddings
CREATE TABLE products (
id BIGSERIAL PRIMARY KEY,
name TEXT NOT NULL,
description TEXT,
embedding vector(1536),
category VARCHAR(100),
price DECIMAL(10, 2),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Approximate nearest neighbor index for cosine distance queries
CREATE INDEX ON products USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
-- Hybrid index: category filter + vector similarity
CREATE INDEX ON products (category) WHERE embedding IS NOT NULL;
-- Stats for planner
ANALYZE products;
-- Optional: dimension check (the vector(1536) column type already enforces this on insert)
ALTER TABLE products ADD CONSTRAINT check_embedding_dim
CHECK (embedding IS NULL OR vector_dims(embedding) = 1536);
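When writing embeddings from application code, pgvector accepts a bracketed text literal cast to `vector`. A minimal TypeScript helper for producing that literal (the `toVectorLiteral` name and the dimension guard are our additions, not part of pgvector):

```typescript
// Format a numeric array as a pgvector text literal, e.g. "[0.1,0.2,0.3]".
// pgvector accepts this form anywhere a vector value is expected.
function toVectorLiteral(embedding: number[], expectedDim?: number): string {
  if (expectedDim !== undefined && embedding.length !== expectedDim) {
    throw new Error(`expected ${expectedDim} dimensions, got ${embedding.length}`);
  }
  return `[${embedding.join(',')}]`;
}

// Example: parameterized insert (node-postgres style, shown for illustration)
// await client.query(
//   'INSERT INTO products (name, embedding) VALUES ($1, $2::vector)',
//   ['demo', toVectorLiteral([0.1, 0.2, 0.3])],
// );
```

Passing the literal as a bound parameter (with an explicit `::vector` cast) keeps the query plan cacheable and avoids string-interpolation injection risks.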
Distance Metrics: Cosine vs L2 vs Inner Product
Choosing the right distance metric affects both accuracy and performance. Here's when to use each:
- Cosine similarity (vector_cosine_ops): Best for text embeddings (semantic search). Treats vectors as directions.
- L2 distance (vector_l2_ops): Euclidean distance. Use for dense embeddings or when magnitude matters.
- Inner product (vector_ip_ops): Fastest for normalized vectors, where its ranking matches cosine similarity; it saves the normalization step. Exact scans are O(n·d) for every metric.
-- Cosine distance query; $1 is the query embedding (0 = identical, 2 = opposite)
SELECT id, name, 1 - (embedding <=> $1) AS similarity
FROM products
WHERE embedding IS NOT NULL
ORDER BY embedding <=> $1
LIMIT 10;
-- L2 distance query (lower = closer)
SELECT id, name, embedding <-> $1 AS distance
FROM products
WHERE embedding IS NOT NULL
ORDER BY embedding <-> $1
LIMIT 10;
-- Inner product: <#> returns the NEGATIVE inner product, so ascending order ranks best first
SELECT id, name, embedding <#> $1 AS neg_inner_product
FROM products
WHERE embedding IS NOT NULL
ORDER BY embedding <#> $1
LIMIT 10;
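To make the operator semantics concrete, here is a dependency-free TypeScript sketch of what the three operators compute: `<=>` is cosine distance, `<->` is Euclidean distance, and `<#>` is the negative inner product (negated so that ascending ORDER BY puts the best match first):

```typescript
// Plain implementations of pgvector's three distance operators.
function dot(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

// <=> cosine distance: 1 - cos(theta), ranges over [0, 2]
function cosineDistance(a: number[], b: number[]): number {
  return 1 - dot(a, b) / (Math.hypot(...a) * Math.hypot(...b));
}

// <-> L2 (Euclidean) distance
function l2Distance(a: number[], b: number[]): number {
  return Math.hypot(...a.map((x, i) => x - b[i]));
}

// <#> negative inner product: smaller value = more similar
function negInnerProduct(a: number[], b: number[]): number {
  return -dot(a, b);
}
```

For unit-length embeddings (OpenAI's text-embedding models return normalized vectors), cosine distance and negative inner product yield the same ranking.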
Index Types: IVFFlat vs HNSW
pgvector supports two approximate nearest neighbor (ANN) indexes. Understanding the tradeoffs is crucial for production.
IVFFlat (Inverted File Flat):
- Divides vectors into k clusters (lists).
- Fast build time, lower memory.
- Accuracy depends on the probes parameter at query time.
- Good for write-heavy workloads.
HNSW (Hierarchical Navigable Small World):
- Builds multi-layer graph structure.
- Slower build, higher memory cost.
- Consistently better quality, especially with high dimensions.
- Best for stable, read-heavy datasets.
-- IVFFlat index: optimal for frequent inserts
CREATE INDEX embedding_ivfflat_idx ON products
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
-- HNSW index: optimal for high query volume
CREATE INDEX embedding_hnsw_idx ON products
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
-- hnsw.ef_search sets the query-time candidate list size (higher = better recall, slower queries)
SET hnsw.ef_search = 100;
-- Monitor index size
SELECT pg_size_pretty(pg_total_relation_size('embedding_hnsw_idx'));
-- Benchmark: IVFFlat with different probe counts
SET ivfflat.probes = 10;
EXPLAIN ANALYZE SELECT id FROM products
ORDER BY embedding <=> $1 LIMIT 20;
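The pgvector README suggests starting with lists ≈ rows/1000 for up to ~1M rows (and √rows beyond that), with probes ≈ √lists at query time. A small calculator encoding those rules of thumb, useful for sketching a benchmark sweep:

```typescript
// IVFFlat sizing heuristics from the pgvector README:
// lists ≈ rows/1000 (≤ 1M rows) or sqrt(rows) (> 1M rows); probes ≈ sqrt(lists).
// Treat the output as a starting point for benchmarking, not a final answer.
function ivfflatParams(rowCount: number): { lists: number; probes: number } {
  const lists = Math.max(
    1,
    rowCount <= 1_000_000
      ? Math.round(rowCount / 1000)
      : Math.round(Math.sqrt(rowCount)),
  );
  const probes = Math.max(1, Math.round(Math.sqrt(lists)));
  return { lists, probes };
}
```

Raising probes improves recall at the cost of latency, so sweep a few values around the heuristic while watching EXPLAIN ANALYZE timings.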
Embedding Generation Pipeline: OpenAI to Database
Building a robust pipeline ensures consistency across your product.
// production embedding generation with batching and retry
import OpenAI from 'openai';
import { sql } from 'drizzle-orm';
import { db } from './db';
import { products } from './schema';

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  maxRetries: 3,
  timeout: 30_000,
});

async function generateEmbeddings(texts: string[]): Promise<number[][]> {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: texts,
    dimensions: 1536,
  });
  // The API may return items out of order; sort by index before mapping.
  return response.data
    .sort((a, b) => a.index - b.index)
    .map((d) => d.embedding);
}

async function updateProductEmbeddings(productIds: number[]) {
  const batch = await db
    .select({ id: products.id, name: products.name, description: products.description })
    .from(products)
    .where(sql`${products.id} = ANY(${productIds})`)
    .limit(20);

  const texts = batch.map((p) => `${p.name} ${p.description ?? ''}`);
  const embeddings = await generateEmbeddings(texts);

  // Write embeddings back row by row (wrap in a transaction for larger batches)
  for (let i = 0; i < batch.length; i++) {
    await db
      .update(products)
      .set({
        embedding: sql`${JSON.stringify(embeddings[i])}::vector`,
        updated_at: new Date(),
      })
      .where(sql`${products.id} = ${batch[i].id}`);
  }
}

// Batch process: backfill rows that are missing embeddings
async function processMissingEmbeddings() {
  const missing = await db
    .selectDistinct({ id: products.id })
    .from(products)
    .where(sql`${products.embedding} IS NULL`)
    .limit(100);

  if (missing.length === 0) return;
  await updateProductEmbeddings(missing.map((p) => p.id));
}
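One gap worth closing in a real backfill: embedding APIs cap how many inputs a single request accepts (OpenAI currently allows up to 2,048 inputs), so large jobs should be chunked. A generic sketch; the chunk size and the `embedBatch` callback shape are our choices:

```typescript
// Split a list of texts into request-sized chunks and embed them sequentially.
// CHUNK_SIZE stays well under the per-request input cap; adjust per provider.
const CHUNK_SIZE = 512;

function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

async function embedAll(
  texts: string[],
  embedBatch: (batch: string[]) => Promise<number[][]>,
): Promise<number[][]> {
  const results: number[][] = [];
  for (const batch of chunk(texts, CHUNK_SIZE)) {
    // Sequential requests keep rate-limit handling simple; parallelize with care.
    results.push(...(await embedBatch(batch)));
  }
  return results;
}
```

Pass `generateEmbeddings` from the pipeline above as the `embedBatch` callback; the output array lines up index-for-index with the input texts.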
Hybrid Search: Combining Vector and Keyword Matching
Real-world search often benefits from both semantic and keyword matching. Use PostgreSQL full-text search alongside vectors.
-- Add full-text search index
ALTER TABLE products ADD COLUMN search_ts tsvector
GENERATED ALWAYS AS (to_tsvector('english', name || ' ' || COALESCE(description, ''))) STORED;
CREATE INDEX search_ts_idx ON products USING gin(search_ts);
-- Hybrid query: keyword relevance + semantic similarity
WITH keyword_matches AS (
SELECT
id,
ts_rank(search_ts, query) AS keyword_score
FROM products,
plainto_tsquery('english', 'lightweight backpack') AS query
WHERE search_ts @@ query
),
vector_matches AS (
SELECT
id,
1 - (embedding <=> $1) AS vector_score
FROM products
WHERE embedding IS NOT NULL
ORDER BY embedding <=> $1
LIMIT 100
)
SELECT
COALESCE(k.id, v.id) AS id,
p.name,
COALESCE(k.keyword_score, 0) AS keyword_score,
COALESCE(v.vector_score, 0) AS vector_score,
(COALESCE(k.keyword_score, 0) * 0.3 + COALESCE(v.vector_score, 0) * 0.7) AS combined_score
FROM products p
LEFT JOIN keyword_matches k ON p.id = k.id
LEFT JOIN vector_matches v ON p.id = v.id
WHERE k.id IS NOT NULL OR v.id IS NOT NULL
ORDER BY combined_score DESC
LIMIT 20;
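The 0.3/0.7 blend above assumes keyword and vector scores live on comparable scales, which ts_rank does not guarantee. An alternative that avoids score calibration entirely is Reciprocal Rank Fusion (RRF), which combines only ranks. A sketch, assuming each input list is sorted best-first:

```typescript
// Reciprocal Rank Fusion: score(doc) = sum over lists of 1 / (k + rank).
// k (commonly 60) damps the advantage of the very top ranks.
function rrf(
  rankedLists: string[][],
  k = 60,
): Array<[string, number]> {
  const scores = new Map<string, number>();
  for (const list of rankedLists) {
    list.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  // Return ids sorted by fused score, best first.
  return [...scores.entries()].sort((x, y) => y[1] - x[1]);
}
```

In practice: fetch the top N ids from each CTE, fuse in the application, then hydrate the winning rows with one final query.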
Filtering Before vs After Vector Search
Query planning differs based on filter timing. For selective filters, push down before vector search.
-- BEFORE: Filter first (good when filter eliminates 90%+)
EXPLAIN ANALYZE
SELECT id, name, embedding <=> $1 AS distance
FROM products
WHERE category = 'backpacks' AND price < 100
ORDER BY embedding <=> $1
LIMIT 10;
-- AFTER: Vector search first, filter results (good when most rows pass the filter)
EXPLAIN ANALYZE
SELECT id, name, distance
FROM (
SELECT id, name, embedding <=> $1 AS distance
FROM products
WHERE embedding IS NOT NULL
ORDER BY embedding <=> $1
LIMIT 100
) t
WHERE category = 'backpacks' AND price < 100
LIMIT 10;
-- Adaptive: Let planner decide with prepared statements
PREPARE hybrid_search (vector, text, decimal) AS
SELECT id, name, embedding <=> $1 AS distance
FROM products
WHERE category = $2 AND price < $3
ORDER BY embedding <=> $1
LIMIT 20;
-- Pass the query embedding as a vector literal ('[...]' truncated here for readability)
EXECUTE hybrid_search('[0.12, -0.03, ...]', 'backpacks', 100);
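When you filter after the ANN search, the inner LIMIT must be padded so enough candidates survive the filter. A rough sizing sketch, assuming you can estimate the filter's selectivity (the function name and the 50% safety margin are our choices):

```typescript
// Size the ANN candidate pool so that, after filtering, roughly `limit`
// rows survive. `selectivity` is the estimated pass rate of the filter (0..1].
function candidatePoolSize(limit: number, selectivity: number, cap = 10_000): number {
  if (selectivity <= 0 || selectivity > 1) {
    throw new Error('selectivity must be in (0, 1]');
  }
  const padded = Math.ceil((limit / selectivity) * 1.5); // 50% safety margin
  return Math.min(cap, Math.max(limit, padded));
}
```

A filter that passes 10% of rows and a target of 10 results suggests fetching ~150 candidates; cap the pool so a very selective filter cannot turn the query into a near-full scan.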
Index Tuning: m, ef_construction, and lists Parameters
Fine-tuning index parameters directly impacts query latency and build time.
-- HNSW parameter guidance
-- m: default 16 (higher = better quality, slower build)
-- ef_construction: default 64 (higher = better quality)
-- High-precision scenario (financial/medical)
CREATE INDEX embedding_precise ON products
USING hnsw (embedding vector_cosine_ops)
WITH (m = 24, ef_construction = 128);
-- Balanced scenario (e-commerce, typical case)
CREATE INDEX embedding_balanced ON products
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
-- Low-latency scenario (real-time inference)
CREATE INDEX embedding_fast ON products
USING hnsw (embedding vector_cosine_ops)
WITH (m = 8, ef_construction = 32);
-- Monitor index usage (build progress lives in pg_stat_progress_create_index)
SELECT schemaname, relname, indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE indexrelname LIKE 'embedding_%'
ORDER BY idx_scan DESC;
VACUUM and Index Bloat
Vector indexes need maintenance. Monitor bloat and reindex periodically.
-- VACUUM FULL rewrites the table and reclaims all bloat, but takes an exclusive lock
VACUUM FULL ANALYZE products;
-- Regular VACUUM runs without blocking reads/writes, but only marks space reusable
VACUUM ANALYZE products;
-- Reindex strategy: swap with new index (zero-downtime)
-- (on PostgreSQL 12+, REINDEX INDEX CONCURRENTLY does this in one step)
-- 1. Create new index concurrently
CREATE INDEX CONCURRENTLY embedding_hnsw_new ON products
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
-- 2. Swap names (atomic rename)
BEGIN;
ALTER INDEX embedding_hnsw RENAME TO embedding_hnsw_old;
ALTER INDEX embedding_hnsw_new RENAME TO embedding_hnsw;
COMMIT;
-- 3. Drop old index
DROP INDEX embedding_hnsw_old;
-- Monitor index disk usage
SELECT
indexrelname,
pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE relname = 'products'
ORDER BY pg_relation_size(indexrelid) DESC;
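To decide when the swap above is worth running, a maintenance job can watch the dead-tuple ratio reported by pg_stat_user_tables (n_live_tup, n_dead_tup). A minimal guard; the 20% threshold is an assumption to tune for your workload:

```typescript
// Decide whether to trigger a concurrent reindex based on table stats.
// liveTuples/deadTuples come from pg_stat_user_tables (n_live_tup, n_dead_tup).
function shouldReindex(liveTuples: number, deadTuples: number, threshold = 0.2): boolean {
  if (liveTuples === 0) return false; // empty or unanalyzed table: nothing to do
  return deadTuples / liveTuples > threshold;
}
```

Run the check on a schedule and log the ratio either way, so you can see how quickly your update pattern bloats the index.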
Checklist
- pgvector extension installed and verified
- Appropriate index type chosen (IVFFlat vs HNSW)
- Distance metric selected based on embedding source
- Embedding generation pipeline tested with batching
- Hybrid search query validated with explain plan
- Filter strategy optimized (before/after analysis)
- Index parameters tuned for your workload
- VACUUM and maintenance scheduled
- Reindex strategy documented (zero-downtime approach)
- Query latency SLA monitored in production
Conclusion
pgvector brings vector search to PostgreSQL without the operational overhead of running a separate vector database. Start with HNSW indexes for read-heavy workloads, implement hybrid search for relevance, and invest time in tuning index parameters for your specific embedding model and query patterns. Monitor index bloat regularly and plan zero-downtime reindexing strategies. With proper setup, you can sustain sub-second similarity queries at production scale.