pgvector in Production — Semantic Search and Similarity Queries at Scale

- Author: Sanjeev Sharma (@webcoderspeed1)
Introduction
pgvector is transforming how teams build semantic search without abandoning PostgreSQL. Whether you're indexing product catalogs, content libraries, or user profiles, understanding index types, distance metrics, and filtering patterns is critical for sub-second queries at scale. This guide walks through production pgvector deployments from embeddings to query optimization.
- Setting Up pgvector: Extension Installation and Configuration
- Distance Metrics: Cosine vs L2 vs Inner Product
- Index Types: IVFFlat vs HNSW
- Embedding Generation Pipeline: OpenAI to Database
- Hybrid Search: Combining Vector and Keyword Matching
- Filtering Before vs After Vector Search
- Index Tuning: m, ef_construction, and lists Parameters
- VACUUM and Index Bloat
- Checklist
- Conclusion
Setting Up pgvector: Extension Installation and Configuration
Start by installing pgvector on your PostgreSQL instance. Most managed services (AWS RDS, Supabase, Neon) ship it preinstalled; for self-hosted servers, install the distribution package or compile from source.
-- Install extension (requires postgres-dev on Linux)
CREATE EXTENSION vector;
-- Verify installation
SELECT extversion FROM pg_extension WHERE extname = 'vector';
-- Create a products table with embeddings
CREATE TABLE products (
id BIGSERIAL PRIMARY KEY,
name TEXT NOT NULL,
description TEXT,
embedding vector(1536),
category VARCHAR(100),
price DECIMAL(10, 2),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Approximate nearest neighbor index for cosine distance queries
CREATE INDEX ON products USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
-- Hybrid index: category filter + vector similarity
CREATE INDEX ON products (category) WHERE embedding IS NOT NULL;
-- Stats for planner
ANALYZE products;
-- Optional: dimension check (the vector(1536) column type already enforces this on insert)
ALTER TABLE products ADD CONSTRAINT check_embedding_dim
CHECK (embedding IS NULL OR vector_dims(embedding) = 1536);
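When writing embeddings from application code, pgvector accepts a bracketed text literal cast to `vector`. A minimal TypeScript helper for producing that literal (the `toVectorLiteral` name and the dimension guard are our additions, not part of pgvector):

```typescript
// Format a numeric array as a pgvector text literal, e.g. "[0.1,0.2,0.3]".
// pgvector accepts this form anywhere a vector value is expected.
function toVectorLiteral(embedding: number[], expectedDim?: number): string {
  if (expectedDim !== undefined && embedding.length !== expectedDim) {
    throw new Error(`expected ${expectedDim} dimensions, got ${embedding.length}`);
  }
  return `[${embedding.join(',')}]`;
}

// Example: parameterized insert (node-postgres style, shown for illustration)
// await client.query(
//   'INSERT INTO products (name, embedding) VALUES ($1, $2::vector)',
//   ['demo', toVectorLiteral([0.1, 0.2, 0.3])],
// );
```

Passing the literal as a bound parameter (with an explicit `::vector` cast) keeps the query plan cacheable and avoids string-interpolation injection risks.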
Distance Metrics: Cosine vs L2 vs Inner Product
Choosing the right distance metric affects both accuracy and performance. Here's when to use each:
- Cosine similarity (vector_cosine_ops): Best for text embeddings (semantic search). Treats vectors as directions.
- L2 distance (vector_l2_ops): Euclidean distance. Use for dense embeddings or when magnitude matters.
- Inner product (vector_ip_ops): Fastest for normalized vectors, where its ranking matches cosine similarity; it saves the normalization step. Exact scans are O(n·d) for every metric.
-- Cosine distance query; $1 is the query embedding (0 = identical, 2 = opposite)
SELECT id, name, 1 - (embedding <=> $1) AS similarity
FROM products
WHERE embedding IS NOT NULL
ORDER BY embedding <=> $1
LIMIT 10;
-- L2 distance query (lower = closer)
SELECT id, name, embedding <-> $1 AS distance
FROM products
WHERE embedding IS NOT NULL
ORDER BY embedding <-> $1
LIMIT 10;
-- Inner product: <#> returns the NEGATIVE inner product, so ascending order ranks best first
SELECT id, name, embedding <#> $1 AS neg_inner_product
FROM products
WHERE embedding IS NOT NULL
ORDER BY embedding <#> $1
LIMIT 10;
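To make the operator semantics concrete, here is a dependency-free TypeScript sketch of what the three operators compute: `<=>` is cosine distance, `<->` is Euclidean distance, and `<#>` is the negative inner product (negated so that ascending ORDER BY puts the best match first):

```typescript
// Plain implementations of pgvector's three distance operators.
function dot(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

// <=> cosine distance: 1 - cos(theta), ranges over [0, 2]
function cosineDistance(a: number[], b: number[]): number {
  return 1 - dot(a, b) / (Math.hypot(...a) * Math.hypot(...b));
}

// <-> L2 (Euclidean) distance
function l2Distance(a: number[], b: number[]): number {
  return Math.hypot(...a.map((x, i) => x - b[i]));
}

// <#> negative inner product: smaller value = more similar
function negInnerProduct(a: number[], b: number[]): number {
  return -dot(a, b);
}
```

For unit-length embeddings (OpenAI's text-embedding models return normalized vectors), cosine distance and negative inner product yield the same ranking.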
Index Types: IVFFlat vs HNSW
pgvector supports two approximate nearest neighbor (ANN) indexes. Understanding the tradeoffs is crucial for production.
IVFFlat (Inverted File Flat):
- Divides vectors into k clusters (lists).
- Fast build time, lower memory.
- Accuracy depends on the probes parameter at query time.
- Good for write-heavy workloads.
HNSW (Hierarchical Navigable Small World):
- Builds multi-layer graph structure.
- Slower build, higher memory cost.
- Consistently better quality, especially with high dimensions.
- Best for stable, read-heavy datasets.
-- IVFFlat index: optimal for frequent inserts
CREATE INDEX embedding_ivfflat_idx ON products
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
-- HNSW index: optimal for high query volume
CREATE INDEX embedding_hnsw_idx ON products
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
-- hnsw.ef_search sets the query-time candidate list size (higher = better recall, slower queries)
SET hnsw.ef_search = 100;
-- Monitor index size
SELECT pg_size_pretty(pg_total_relation_size('embedding_hnsw_idx'));
-- Benchmark: IVFFlat with different probe counts
SET ivfflat.probes = 10;
EXPLAIN ANALYZE SELECT id FROM products
ORDER BY embedding <=> $1 LIMIT 20;
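The pgvector README suggests starting with lists ≈ rows/1000 for up to ~1M rows (and √rows beyond that), with probes ≈ √lists at query time. A small calculator encoding those rules of thumb, useful for sketching a benchmark sweep:

```typescript
// IVFFlat sizing heuristics from the pgvector README:
// lists ≈ rows/1000 (≤ 1M rows) or sqrt(rows) (> 1M rows); probes ≈ sqrt(lists).
// Treat the output as a starting point for benchmarking, not a final answer.
function ivfflatParams(rowCount: number): { lists: number; probes: number } {
  const lists = Math.max(
    1,
    rowCount <= 1_000_000
      ? Math.round(rowCount / 1000)
      : Math.round(Math.sqrt(rowCount)),
  );
  const probes = Math.max(1, Math.round(Math.sqrt(lists)));
  return { lists, probes };
}
```

Raising probes improves recall at the cost of latency, so sweep a few values around the heuristic while watching EXPLAIN ANALYZE timings.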
Embedding Generation Pipeline: OpenAI to Database
Building a robust pipeline ensures consistency across your product.
// production embedding generation with batching and retry
import OpenAI from 'openai';
import { sql } from 'drizzle-orm';
import { db } from './db';
import { products } from './schema';

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  maxRetries: 3,
  timeout: 30_000,
});

async function generateEmbeddings(texts: string[]): Promise<number[][]> {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: texts,
    dimensions: 1536,
  });
  // The API may return items out of order; sort by index before mapping.
  return response.data
    .sort((a, b) => a.index - b.index)
    .map((d) => d.embedding);
}

async function updateProductEmbeddings(productIds: number[]) {
  const batch = await db
    .select({ id: products.id, name: products.name, description: products.description })
    .from(products)
    .where(sql`${products.id} = ANY(${productIds})`)
    .limit(20);

  const texts = batch.map((p) => `${p.name} ${p.description ?? ''}`);
  const embeddings = await generateEmbeddings(texts);

  // Write embeddings back row by row (wrap in a transaction for larger batches)
  for (let i = 0; i < batch.length; i++) {
    await db
      .update(products)
      .set({
        embedding: sql`${JSON.stringify(embeddings[i])}::vector`,
        updated_at: new Date(),
      })
      .where(sql`${products.id} = ${batch[i].id}`);
  }
}

// Batch process: backfill rows that are missing embeddings
async function processMissingEmbeddings() {
  const missing = await db
    .selectDistinct({ id: products.id })
    .from(products)
    .where(sql`${products.embedding} IS NULL`)
    .limit(100);

  if (missing.length === 0) return;
  await updateProductEmbeddings(missing.map((p) => p.id));
}
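One gap worth closing in a real backfill: embedding APIs cap how many inputs a single request accepts (OpenAI currently allows up to 2,048 inputs), so large jobs should be chunked. A generic sketch; the chunk size and the `embedBatch` callback shape are our choices:

```typescript
// Split a list of texts into request-sized chunks and embed them sequentially.
// CHUNK_SIZE stays well under the per-request input cap; adjust per provider.
const CHUNK_SIZE = 512;

function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

async function embedAll(
  texts: string[],
  embedBatch: (batch: string[]) => Promise<number[][]>,
): Promise<number[][]> {
  const results: number[][] = [];
  for (const batch of chunk(texts, CHUNK_SIZE)) {
    // Sequential requests keep rate-limit handling simple; parallelize with care.
    results.push(...(await embedBatch(batch)));
  }
  return results;
}
```

Pass `generateEmbeddings` from the pipeline above as the `embedBatch` callback; the output array lines up index-for-index with the input texts.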
Hybrid Search: Combining Vector and Keyword Matching
Real-world search often benefits from both semantic and keyword matching. Use PostgreSQL full-text search alongside vectors.
-- Add full-text search index
ALTER TABLE products ADD COLUMN search_ts tsvector
GENERATED ALWAYS AS (to_tsvector('english', name || ' ' || COALESCE(description, ''))) STORED;
CREATE INDEX search_ts_idx ON products USING gin(search_ts);
-- Hybrid query: keyword relevance + semantic similarity
WITH keyword_matches AS (
SELECT
id,
ts_rank(search_ts, query) AS keyword_score
FROM products,
plainto_tsquery('english', 'lightweight backpack') AS query
WHERE search_ts @@ query
),
vector_matches AS (
SELECT
id,
1 - (embedding <=> $1) AS vector_score
FROM products
WHERE embedding IS NOT NULL
ORDER BY embedding <=> $1
LIMIT 100
)
SELECT
COALESCE(k.id, v.id) AS id,
p.name,
COALESCE(k.keyword_score, 0) AS keyword_score,
COALESCE(v.vector_score, 0) AS vector_score,
(COALESCE(k.keyword_score, 0) * 0.3 + COALESCE(v.vector_score, 0) * 0.7) AS combined_score
FROM products p
LEFT JOIN keyword_matches k ON p.id = k.id
LEFT JOIN vector_matches v ON p.id = v.id
WHERE k.id IS NOT NULL OR v.id IS NOT NULL
ORDER BY combined_score DESC
LIMIT 20;
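The 0.3/0.7 blend above assumes keyword and vector scores live on comparable scales, which ts_rank does not guarantee. An alternative that avoids score calibration entirely is Reciprocal Rank Fusion (RRF), which combines only ranks. A sketch, assuming each input list is sorted best-first:

```typescript
// Reciprocal Rank Fusion: score(doc) = sum over lists of 1 / (k + rank).
// k (commonly 60) damps the advantage of the very top ranks.
function rrf(
  rankedLists: string[][],
  k = 60,
): Array<[string, number]> {
  const scores = new Map<string, number>();
  for (const list of rankedLists) {
    list.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  // Return ids sorted by fused score, best first.
  return [...scores.entries()].sort((x, y) => y[1] - x[1]);
}
```

In practice: fetch the top N ids from each CTE, fuse in the application, then hydrate the winning rows with one final query.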
Filtering Before vs After Vector Search
Query planning differs based on filter timing. For selective filters, push down before vector search.
-- BEFORE: Filter first (good when filter eliminates 90%+)
EXPLAIN ANALYZE
SELECT id, name, embedding <=> $1 AS distance
FROM products
WHERE category = 'backpacks' AND price < 100
ORDER BY embedding <=> $1
LIMIT 10;
-- AFTER: Vector search first, filter results (good when most rows pass the filter)
EXPLAIN ANALYZE
SELECT id, name, distance
FROM (
SELECT id, name, embedding <=> $1 AS distance
FROM products
WHERE embedding IS NOT NULL
ORDER BY embedding <=> $1
LIMIT 100
) t
WHERE category = 'backpacks' AND price < 100
LIMIT 10;
-- Adaptive: Let planner decide with prepared statements
PREPARE hybrid_search (vector, text, decimal) AS
SELECT id, name, embedding <=> $1 AS distance
FROM products
WHERE category = $2 AND price < $3
ORDER BY embedding <=> $1
LIMIT 20;
-- Pass the query embedding as a vector literal ('[...]' truncated here for readability)
EXECUTE hybrid_search('[0.12, -0.03, ...]', 'backpacks', 100);
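When you filter after the ANN search, the inner LIMIT must be padded so enough candidates survive the filter. A rough sizing sketch, assuming you can estimate the filter's selectivity (the function name and the 50% safety margin are our choices):

```typescript
// Size the ANN candidate pool so that, after filtering, roughly `limit`
// rows survive. `selectivity` is the estimated pass rate of the filter (0..1].
function candidatePoolSize(limit: number, selectivity: number, cap = 10_000): number {
  if (selectivity <= 0 || selectivity > 1) {
    throw new Error('selectivity must be in (0, 1]');
  }
  const padded = Math.ceil((limit / selectivity) * 1.5); // 50% safety margin
  return Math.min(cap, Math.max(limit, padded));
}
```

A filter that passes 10% of rows and a target of 10 results suggests fetching ~150 candidates; cap the pool so a very selective filter cannot turn the query into a near-full scan.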
Index Tuning: m, ef_construction, and lists Parameters
Fine-tuning index parameters directly impacts query latency and build time.
-- HNSW parameter guidance
-- m: default 16 (higher = better quality, slower build)
-- ef_construction: default 64 (higher = better quality)
-- High-precision scenario (financial/medical)
CREATE INDEX embedding_precise ON products
USING hnsw (embedding vector_cosine_ops)
WITH (m = 24, ef_construction = 128);
-- Balanced scenario (e-commerce, typical case)
CREATE INDEX embedding_balanced ON products
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
-- Low-latency scenario (real-time inference)
CREATE INDEX embedding_fast ON products
USING hnsw (embedding vector_cosine_ops)
WITH (m = 8, ef_construction = 32);
-- Monitor index usage (build progress lives in pg_stat_progress_create_index)
SELECT schemaname, relname, indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE indexrelname LIKE 'embedding_%'
ORDER BY idx_scan DESC;
VACUUM and Index Bloat
Vector indexes need maintenance. Monitor bloat and reindex periodically.
-- VACUUM FULL rewrites the table and reclaims all bloat, but takes an exclusive lock
VACUUM FULL ANALYZE products;
-- Regular VACUUM runs without blocking reads/writes, but only marks space reusable
VACUUM ANALYZE products;
-- Reindex strategy: swap with new index (zero-downtime)
-- (on PostgreSQL 12+, REINDEX INDEX CONCURRENTLY does this in one step)
-- 1. Create new index concurrently
CREATE INDEX CONCURRENTLY embedding_hnsw_new ON products
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
-- 2. Swap names (atomic rename)
BEGIN;
ALTER INDEX embedding_hnsw RENAME TO embedding_hnsw_old;
ALTER INDEX embedding_hnsw_new RENAME TO embedding_hnsw;
COMMIT;
-- 3. Drop old index
DROP INDEX embedding_hnsw_old;
-- Monitor index disk usage
SELECT
indexrelname,
pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE relname = 'products'
ORDER BY pg_relation_size(indexrelid) DESC;
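To decide when the swap above is worth running, a maintenance job can watch the dead-tuple ratio reported by pg_stat_user_tables (n_live_tup, n_dead_tup). A minimal guard; the 20% threshold is an assumption to tune for your workload:

```typescript
// Decide whether to trigger a concurrent reindex based on table stats.
// liveTuples/deadTuples come from pg_stat_user_tables (n_live_tup, n_dead_tup).
function shouldReindex(liveTuples: number, deadTuples: number, threshold = 0.2): boolean {
  if (liveTuples === 0) return false; // empty or unanalyzed table: nothing to do
  return deadTuples / liveTuples > threshold;
}
```

Run the check on a schedule and log the ratio either way, so you can see how quickly your update pattern bloats the index.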
Checklist
- pgvector extension installed and verified
- Appropriate index type chosen (IVFFlat vs HNSW)
- Distance metric selected based on embedding source
- Embedding generation pipeline tested with batching
- Hybrid search query validated with explain plan
- Filter strategy optimized (before/after analysis)
- Index parameters tuned for your workload
- VACUUM and maintenance scheduled
- Reindex strategy documented (zero-downtime approach)
- Query latency SLA monitored in production
Conclusion
pgvector brings vector search to PostgreSQL without the operational overhead of running a separate vector database. Start with HNSW indexes for read-heavy workloads, implement hybrid search for relevance, and invest time in tuning index parameters for your specific embedding model and query patterns. Monitor index bloat regularly and plan zero-downtime reindexing strategies. With proper setup, you can sustain sub-second similarity queries at production scale.