Vector Database Comparison 2026 — Pinecone, Weaviate, Qdrant, and pgvector

Author: Sanjeev Sharma (@webcoderspeed1)
Introduction
Vector databases are the backbone of modern AI applications. With embeddings powering RAG, semantic search, and recommendation systems, choosing the right vector DB has become critical. In 2026, the landscape has matured significantly with clear trade-offs between managed services and self-hosted solutions.
- Comparison Matrix
- Pinecone Serverless vs Pod Architecture
- Weaviate Multi-Tenancy
- Qdrant Quantization for Cost
- pgvector for Postgres Stacks
- Chroma for Prototyping
- Milvus for On-Prem Scale
- Decision Tree for Choosing
- Migration Between Vector DBs
- Checklist
- Conclusion
Comparison Matrix
Here's how the major vector databases stack up:
| Database | Deployment | Cost Model | Max Scale | Filtering | Multi-Tenancy |
|---|---|---|---|---|---|
| Pinecone | Managed | Per-query + storage | 100B+ vectors | Yes | Native |
| Weaviate | Both | Open-source + SaaS | 50B+ vectors | Yes | Multi-tenant ready |
| Qdrant | Both | Open-source + Cloud | 50B+ vectors | Yes | Per-collection |
| pgvector | Self-hosted | PostgreSQL licensing | 10B+ | SQL filters | Via row-level |
| Chroma | Self-hosted | Open-source | 1B+ vectors | Basic | No |
| Milvus | Self-hosted | Open-source | 100B+ vectors | Yes | Per-collection |
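The scale columns above translate directly into memory: raw float32 vectors cost dimensions × 4 bytes each. A quick back-of-the-envelope estimator (raw vector storage only — index structures like HNSW add overhead on top of this):

```python
def raw_vector_memory_gb(num_vectors: int, dims: int, bytes_per_dim: int = 4) -> float:
    """Raw storage for float32 embeddings; excludes index and metadata overhead."""
    return num_vectors * dims * bytes_per_dim / 1e9

# 10M OpenAI-sized (1536-dim) embeddings ≈ 61 GB of raw vectors
print(raw_vector_memory_gb(10_000_000, 1536))
```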
Pinecone Serverless vs Pod Architecture
Pinecone's serverless offering eliminates infrastructure management. Pay only for what you query:
```typescript
import { Pinecone } from '@pinecone-database/pinecone';

const pc = new Pinecone({
  apiKey: process.env.PINECONE_API_KEY!,
});

const index = pc.index('my-index');

// Upsert vectors with metadata
const vectors = [
  {
    id: 'vec-1',
    values: [0.1, 0.2, 0.3, 0.4],
    metadata: { source: 'docs', category: 'ai' },
  },
  {
    id: 'vec-2',
    values: [0.2, 0.3, 0.4, 0.5],
    metadata: { source: 'blog', category: 'ml' },
  },
];

await index.upsert(vectors);

// Query with metadata filtering
const results = await index.query({
  vector: [0.1, 0.2, 0.3, 0.4],
  topK: 10,
  filter: {
    category: { $eq: 'ai' },
  },
  includeMetadata: true,
});

console.log(results.matches);
```
Pod-based Pinecone offers predictable costs for sustained workloads >10K queries/day. Serverless suits variable traffic patterns with cost starting at $0.40 per 100K queries.
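To compare the two pricing models, a rough break-even sketch helps. The $0.40 per 100K serverless rate is from above; the flat pod cost below is a hypothetical placeholder — substitute current pricing before relying on the result:

```python
# Break-even sketch: Pinecone serverless vs pod-based pricing.
SERVERLESS_PER_QUERY = 0.40 / 100_000  # $0.40 per 100K queries (from the text)
POD_MONTHLY = 80.0  # hypothetical flat monthly cost for one small pod

def monthly_serverless_cost(queries_per_day: float) -> float:
    """Serverless spend for a month of steady traffic (30-day month)."""
    return queries_per_day * 30 * SERVERLESS_PER_QUERY

def cheaper_option(queries_per_day: float) -> str:
    """Pick the cheaper model at a given sustained query rate."""
    if monthly_serverless_cost(queries_per_day) < POD_MONTHLY:
        return "serverless"
    return "pod"

print(cheaper_option(1_000))      # low traffic favors serverless
print(cheaper_option(1_000_000))  # sustained heavy traffic favors pods
```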
Weaviate Multi-Tenancy
Weaviate excels for multi-tenant SaaS with native tenant isolation:
```python
import weaviate
from weaviate.classes.config import Configure, DataType, Property
from weaviate.classes.tenants import Tenant

client = weaviate.connect_to_local()

# Create collection with multi-tenancy enabled
# (vectors are managed by the vectorizer, not declared as a property)
client.collections.create(
    name="Documents",
    properties=[
        Property(name="content", data_type=DataType.TEXT),
        Property(name="title", data_type=DataType.TEXT),
    ],
    multi_tenancy_config=Configure.multi_tenancy(enabled=True),
    vectorizer_config=Configure.Vectorizer.text2vec_openai(),
)

# Tenants must be created before use
collection = client.collections.get("Documents")
collection.tenants.create([Tenant(name="tenant-123")])

# Insert objects through a tenant-scoped handle
tenant_docs = collection.with_tenant("tenant-123")
tenant_docs.data.insert(
    properties={
        "content": "RAG best practices",
        "title": "Advanced RAG",
    },
)

# Queries through the same handle are tenant-isolated
results = tenant_docs.query.near_text(
    query="vector retrieval",
    limit=5,
)
for obj in results.objects:
    print(obj.properties)
```
Weaviate's GraphQL API and hybrid search (dense + sparse BM25) make it powerful for complex queries. Multi-tenancy reduces per-tenant infrastructure costs.
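Conceptually, hybrid search blends a dense (vector) score with a sparse (BM25) score using an alpha weight: alpha=1 is pure vector search, alpha=0 pure keyword. A minimal sketch of that fusion, assuming both scores are already normalized to [0, 1] (Weaviate's implementation normalizes per result list before blending):

```python
def hybrid_score(dense: float, sparse: float, alpha: float = 0.5) -> float:
    """Alpha-weighted fusion of normalized dense and sparse relevance scores."""
    return alpha * dense + (1 - alpha) * sparse

# Biasing toward vectors still lets strong keyword matches surface
print(hybrid_score(dense=0.9, sparse=0.2, alpha=0.75))
```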
Qdrant Quantization for Cost
Qdrant's scalar and binary quantization reduce memory usage by 4-8×:
```python
from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance,
    FieldCondition,
    Filter,
    MatchValue,
    PointStruct,
    ScalarQuantization,
    ScalarQuantizationConfig,
    ScalarType,
    VectorParams,
)

client = QdrantClient(":memory:")

# Create collection with scalar quantization (int8)
client.create_collection(
    collection_name="docs",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
    quantization_config=ScalarQuantization(
        scalar=ScalarQuantizationConfig(
            type=ScalarType.INT8,
            always_ram=False,
        ),
    ),
)

# Insert vectors with payload filtering support
points = [
    PointStruct(
        id=1,
        vector=[0.1] * 1536,
        payload={"text": "embedding post", "domain": "ai"},
    ),
    PointStruct(
        id=2,
        vector=[0.2] * 1536,
        payload={"text": "vector search", "domain": "db"},
    ),
]
client.upsert(collection_name="docs", points=points)

# Search with payload filtering
results = client.search(
    collection_name="docs",
    query_vector=[0.1] * 1536,
    limit=5,
    query_filter=Filter(
        must=[FieldCondition(key="domain", match=MatchValue(value="ai"))],
    ),
)
for result in results:
    print(f"Score: {result.score}, Payload: {result.payload}")
```
Binary quantization cuts memory to 1/32nd of the original size, ideal for ultra-large deployments. The trade-off is roughly 2-3% recall loss.
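The 1/32 figure comes from keeping only the sign bit of each float32 dimension; distance then reduces to cheap Hamming comparisons. A toy sketch of the idea (Qdrant's implementation additionally rescores top candidates against the original full-precision vectors):

```python
import numpy as np

def binary_quantize(vec: np.ndarray) -> np.ndarray:
    # 1 bit per dimension instead of 32 (float32) -> 1/32nd the memory
    return (vec > 0).astype(np.uint8)

def hamming(a: np.ndarray, b: np.ndarray) -> int:
    """Number of differing bits -- the distance measure on quantized vectors."""
    return int(np.count_nonzero(a != b))

a = binary_quantize(np.array([0.3, -0.1, 0.8, -0.5]))
b = binary_quantize(np.array([0.2, -0.4, 0.7, 0.1]))
print(hamming(a, b))  # signs differ only in the last dimension -> 1
```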
pgvector for Postgres Stacks
If you already run PostgreSQL, pgvector avoids operational overhead:
```sql
-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Create table with vector column
CREATE TABLE documents (
  id SERIAL PRIMARY KEY,
  content TEXT NOT NULL,
  embedding vector(1536),
  metadata JSONB,
  created_at TIMESTAMP DEFAULT NOW()
);

-- Create HNSW index for fast approximate search
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);

-- Insert a vector
INSERT INTO documents (content, embedding, metadata)
VALUES (
  'Vector database fundamentals',
  '[0.1, 0.2, 0.3, ...]'::vector,
  '{"source": "blog", "tags": ["ai", "infra"]}'
);

-- Search with cosine similarity
SELECT
  id,
  content,
  1 - (embedding <=> '[0.1, 0.2, 0.3, ...]'::vector) AS similarity
FROM documents
WHERE metadata->>'source' = 'blog'
ORDER BY embedding <=> '[0.1, 0.2, 0.3, ...]'::vector
LIMIT 10;

-- Index on JSONB metadata for complex filtering
CREATE INDEX ON documents USING GIN (metadata);
```
pgvector integrates seamlessly with Python/Node ORMs. No separate infrastructure. Perfect for <1B vectors.
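The `<=>` operator returns cosine distance, i.e. 1 − cosine similarity, which is why the SELECT above computes `1 - (embedding <=> ...)` to recover similarity. The same quantity in plain Python:

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """What pgvector's <=> operator computes: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1 - dot / (norm_a * norm_b)

print(cosine_distance([0.1, 0.2, 0.3], [0.1, 0.2, 0.3]))  # identical direction -> ~0.0
```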
Chroma for Prototyping
Chroma prioritizes developer experience. Embed data and query in minutes:
```python
import chromadb

client = chromadb.Client()
collection = client.create_collection(name="papers")

# Add documents with automatic embeddings
collection.add(
    ids=["1", "2", "3"],
    documents=[
        "Vector databases store embeddings for fast retrieval",
        "Semantic search uses embeddings to find similar content",
        "RAG systems combine retrieval with generation",
    ],
    metadatas=[
        {"source": "blog"},
        {"source": "paper"},
        {"source": "doc"},
    ],
)

# Query
results = collection.query(
    query_texts=["embedding databases"],
    n_results=2,
    where={"source": "blog"},
)
print(results["documents"])
```
Chroma runs in-memory by default, with optional SQLite-backed persistence. Great for prototypes, though you'll outgrow it quickly at scale.
Milvus for On-Prem Scale
Milvus handles 100B+ vectors with sophisticated indexing:
```python
from pymilvus import MilvusClient, model

# Initialize with local deployment
client = MilvusClient(uri="http://localhost:19530")

# Build HNSW index parameters (quick-setup mode names the vector field "vector")
index_params = client.prepare_index_params()
index_params.add_index(
    field_name="vector",
    index_type="HNSW",
    metric_type="COSINE",
    params={"M": 8, "efConstruction": 200},
)

# Create collection
client.create_collection(
    collection_name="embeddings",
    dimension=1536,
    metric_type="COSINE",
    index_params=index_params,
)

# Generate embeddings
embedding_fn = model.dense.OpenAIEmbeddingFunction(
    model_name="text-embedding-3-small"
)
texts = [
    "Milvus scales to billions of vectors",
    "Multi-field filtering on metadata",
]
embeddings = embedding_fn(texts)

# Insert with metadata (quick-setup mode stores text/domain as dynamic fields)
client.insert(
    collection_name="embeddings",
    data=[
        {"id": i, "vector": emb, "text": txt, "domain": "ai"}
        for i, (emb, txt) in enumerate(zip(embeddings, texts))
    ],
)

# Search with a metadata filter
search_results = client.search(
    collection_name="embeddings",
    data=embedding_fn(["vector database"]),
    filter='domain == "ai"',
    limit=5,
    output_fields=["text", "domain"],
)
print(search_results)
```
Milvus offers GPU acceleration and sparse-dense hybrid search. Control over all tuning parameters.
Decision Tree for Choosing
- Managed fully (no DevOps) → Pinecone serverless or Weaviate Cloud
- Multi-tenant SaaS → Weaviate
- Cost per query matters → Pinecone serverless
- Predictable sustained traffic → Pinecone pods
- >10B vectors on budget → Qdrant (self-hosted) + quantization
- Existing Postgres stack → pgvector
- Fast prototyping → Chroma
- 100B+ vectors, on-prem → Milvus
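The branches above can be encoded as a first-pass selector. The thresholds and branch order here simply mirror the list — treat it as a starting point, not a definitive rule:

```python
def choose_vector_db(
    fully_managed: bool = False,
    multi_tenant_saas: bool = False,
    existing_postgres: bool = False,
    prototyping: bool = False,
    num_vectors: int = 0,
    on_prem: bool = False,
) -> str:
    """First-pass vector DB selector mirroring the decision tree above."""
    if prototyping:
        return "Chroma"
    if multi_tenant_saas:
        return "Weaviate"
    if existing_postgres and num_vectors < 1_000_000_000:
        return "pgvector"
    if num_vectors >= 100_000_000_000 and on_prem:
        return "Milvus"
    if num_vectors > 10_000_000_000:
        return "Qdrant (self-hosted) + quantization"
    if fully_managed:
        return "Pinecone serverless or Weaviate Cloud"
    return "Pinecone pods"

print(choose_vector_db(existing_postgres=True, num_vectors=50_000_000))
```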
Migration Between Vector DBs
```python
# Export from source (Pinecone)
import json

from pinecone import Pinecone
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

pc = Pinecone(api_key="...")
index = pc.Index("old-index")

all_vectors = []
for ids_batch in index.list(limit=100):
    response = index.fetch(ids=ids_batch)
    for vec in response.vectors.values():
        all_vectors.append(
            {"id": vec.id, "values": vec.values, "metadata": vec.metadata or {}}
        )

# Save to JSONL as a portable intermediate format
with open("vectors.jsonl", "w") as f:
    for vec in all_vectors:
        f.write(json.dumps(vec) + "\n")

# Import to destination (Qdrant) -- the collection must exist before upsert
client = QdrantClient(":memory:")
client.create_collection(
    collection_name="new-index",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)

# Qdrant point IDs must be unsigned integers or UUIDs, so string IDs
# like "vec-1" are remapped here and preserved in the payload
points = [
    PointStruct(
        id=i,
        vector=vec["values"],
        payload={"original_id": vec["id"], **vec["metadata"]},
    )
    for i, vec in enumerate(all_vectors)
]
client.upsert(collection_name="new-index", points=points)
```
Test queries on both systems before cutover. Verify embedding dimensions match.
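A simple cutover check is to run the same queries against both systems and measure how much the top-K result sets agree. Jaccard overlap of the returned IDs is a quick proxy:

```python
def topk_overlap(ids_a: list[str], ids_b: list[str]) -> float:
    """Jaccard overlap of two result-ID lists; 1.0 means identical sets."""
    a, b = set(ids_a), set(ids_b)
    return len(a & b) / len(a | b) if a | b else 1.0

old_results = ["vec-1", "vec-2", "vec-3", "vec-4"]
new_results = ["vec-1", "vec-2", "vec-3", "vec-9"]
print(topk_overlap(old_results, new_results))  # 3 shared of 5 total -> 0.6
```

Small divergences are expected (different index structures, quantization), but large drops usually mean a dimension, metric, or metadata mismatch.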
Checklist
- Define throughput and scale requirements
- Identify filtering and multi-tenancy needs
- Calculate 12-month cost across options
- Benchmark query latency with production payloads
- Plan backup and disaster recovery
- Set up monitoring and alerts
- Document metadata schema and indexing strategy
- Create migration runbook
Conclusion
In 2026, vector database choice hinges on operational maturity and cost sensitivity. Pinecone leads managed simplicity, Weaviate excels at multi-tenancy, Qdrant delivers cost efficiency, and pgvector suits existing PostgreSQL investments. Evaluate your scale (vectors and QPS), team DevOps capacity, and 12-month total cost before committing.