Vector Databases — Pinecone vs Chroma vs Weaviate
Introduction
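Vector databases store embeddings (numeric vectors that capture the meaning of text, images, or other data) and return the stored vectors closest to a query vector. That makes them essential for retrieval-augmented generation (RAG) systems and semantic search. This guide compares three popular options: Pinecone (managed), Chroma (open-source, local), and Weaviate (open-source, scalable).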
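All three databases answer the same core question: which stored vectors are closest to a query vector? The sketch below answers it exactly, by brute force, using cosine similarity and only the standard library. The function names and the toy 3-dimensional vectors are illustrative assumptions. Production systems replace the linear scan with an approximate index such as HNSW (used by Chroma and Weaviate) so queries stay fast over millions of vectors.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: list[float], store: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Exact nearest-neighbor search: score every stored vector, keep the k best."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings" (hypothetical; real models emit hundreds or thousands of dimensions)
store = {
    "id1": [0.1, 0.2, 0.3],
    "id2": [0.2, 0.3, 0.4],
    "id3": [0.9, 0.1, 0.0],
}
print(top_k([0.15, 0.25, 0.35], store, k=2))
```

A vector database does the same ranking, but stores the vectors durably and uses an index to avoid scoring every one of them.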
- Pinecone: Managed Vector Database
- Pinecone Quick Start
- Chroma: Local Vector Database
- Chroma Quick Start
- Persistent Chroma
- Weaviate: Open-Source Scalable Vector DB
- Weaviate Quick Start
- Feature Comparison Table
- When to Use Each
- Integration with LangChain
- Conclusion
- FAQ
Pinecone: Managed Vector Database
Pinecone is a fully managed vector database designed for production scale.
Advantages:
- Zero infrastructure management
- Automatic scaling
- Built-in hybrid search (vector + keyword)
- Low query latency, with indexes available in multiple cloud regions
- Free tier available
Disadvantages:
- Recurring costs even with low usage
- Vendor lock-in
- Data stored on Pinecone servers
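Pinecone's hybrid search sends a dense vector (semantic meaning) and a sparse vector (keyword weights, for example from BM25) in one query. Pinecone's documentation describes weighting the two with a parameter alpha by scaling the query vectors before sending them; because the dot product is linear, this yields a score of alpha × dense score + (1 − alpha) × sparse score. The helper below is a sketch of that convex combination, and its name `weight_hybrid_query` is hypothetical. It assumes an index created with the `dotproduct` metric, which Pinecone requires for sparse-dense queries, rather than the `cosine` index used in the quick start.

```python
def weight_hybrid_query(dense: list[float], sparse: dict, alpha: float) -> tuple[list[float], dict]:
    """Scale a dense/sparse query pair: alpha=1.0 is pure semantic, alpha=0.0 is pure keyword."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    weighted_dense = [v * alpha for v in dense]
    weighted_sparse = {
        "indices": sparse["indices"],  # vocabulary positions are unchanged
        "values": [v * (1 - alpha) for v in sparse["values"]],
    }
    return weighted_dense, weighted_sparse


# Hypothetical query vectors, weighted 75% semantic / 25% keyword
dense_q, sparse_q = weight_hybrid_query(
    dense=[0.2, 0.4, 0.6],
    sparse={"indices": [10, 42], "values": [1.0, 0.5]},
    alpha=0.75,
)
print(dense_q, sparse_q)
# Against a dotproduct index, these would be sent as:
# index.query(vector=dense_q, sparse_vector=sparse_q, top_k=3, include_metadata=True)
```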
Pinecone Quick Start
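```python
import time

from pinecone import Pinecone, ServerlessSpec

# Initialize the client (read the key from an environment variable in real code)
pc = Pinecone(api_key="your-api-key")

# Create a serverless index once; dimension must match your embedding model
# (1536 fits OpenAI's text-embedding-3-small)
if "documents" not in pc.list_indexes().names():
    pc.create_index(
        name="documents",
        dimension=1536,
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )
    # Wait until the new index is ready to accept requests
    while not pc.describe_index("documents").status["ready"]:
        time.sleep(1)

index = pc.Index("documents")

# Upsert (id, values, metadata) tuples.
# "..." stands for the remaining values: each vector needs exactly 1536 floats.
vectors = [
    ("id1", [0.1, 0.2, 0.3, ...], {"text": "sample"}),
    ("id2", [0.2, 0.3, 0.4, ...], {"text": "sample2"}),
]
index.upsert(vectors=vectors)

# Query the 3 nearest neighbors (freshly upserted vectors can take a moment to appear)
results = index.query(
    vector=[0.15, 0.25, 0.35, ...],
    top_k=3,
    include_metadata=True,
)
for match in results["matches"]:
    print(f"ID: {match['id']}, Score: {match['score']}")
    print(f"Text: {match['metadata']['text']}")
```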
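Pinecone limits how much data a single upsert request may carry, so loading more than a handful of vectors is usually done in batches. The generator below is a small, library-independent sketch; the name `batched_records` and the batch size of 100 are assumptions to tune against Pinecone's current request limits. On Python 3.12+, `itertools.batched` does the same job (yielding tuples instead of lists).

```python
from collections.abc import Iterable, Iterator
from itertools import islice


def batched_records(records: Iterable, batch_size: int = 100) -> Iterator[list]:
    """Yield successive lists of at most batch_size records."""
    iterator = iter(records)
    while batch := list(islice(iterator, batch_size)):
        yield batch


# Hypothetical data: 250 (id, values, metadata) tuples matching the 1536-dimension index
records = [(f"id{i}", [0.1] * 1536, {"text": f"chunk {i}"}) for i in range(250)]

for batch in batched_records(records, batch_size=100):
    print(f"upserting {len(batch)} vectors")
    # index.upsert(vectors=batch)  # `index` as created in the quick start
```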
Chroma: Local Vector Database
Chroma is an open-source, embedded vector database perfect for development and small deployments.
Advantages:
- Runs locally (no server needed)
- Free and open-source
- Simple API
- Good for prototyping
- In-memory or persistent storage
Disadvantages:
- Not designed for massive scale
- Limited query optimization
- Fewer advanced features
- Community support only
Chroma Quick Start
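```python
import chromadb

# Create an in-memory (ephemeral) client; data is lost when the process exits
client = chromadb.Client()

# Create a collection whose HNSW index uses cosine distance
collection = client.create_collection(
    name="documents",
    metadata={"hnsw:space": "cosine"},
)

# Add documents with precomputed embeddings.
# "..." stands for the remaining values; all embeddings must share one dimension.
# (Omit `embeddings` to let Chroma compute them with its default embedding function.)
collection.add(
    ids=["id1", "id2", "id3"],
    embeddings=[
        [0.1, 0.2, 0.3, ...],
        [0.2, 0.3, 0.4, ...],
        [0.3, 0.4, 0.5, ...],
    ],
    documents=[
        "First document",
        "Second document",
        "Third document",
    ],
)

# Query: results hold one inner list per query embedding
results = collection.query(
    query_embeddings=[[0.15, 0.25, 0.35, ...]],
    n_results=3,
)
for doc in results["documents"][0]:
    print(doc)
```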
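Chroma's query results include a `distances` list by default, and lower means more similar. With `"hnsw:space": "cosine"`, the distance is 1 minus the cosine similarity, so it can be converted back to the similarity-style score that Pinecone reports for a cosine index. A small sketch of that conversion; the helper name and the sample dictionary are hypothetical (the sample is hand-written in the shape of a Chroma result, not real output).

```python
def to_similarities(results: dict) -> list[tuple[str, float]]:
    """Pair each document of the first query with its cosine similarity (1 - cosine distance)."""
    docs = results["documents"][0]
    distances = results["distances"][0]
    return [(doc, 1.0 - dist) for doc, dist in zip(docs, distances)]


# Hand-written sample shaped like a Chroma query result (not real output)
sample = {
    "ids": [["id2", "id1"]],
    "documents": [["Second document", "First document"]],
    "distances": [[0.25, 0.5]],
}
for doc, sim in to_similarities(sample):
    print(f"{doc}: similarity {sim:.2f}")
```

The conversion only holds for the cosine space; the default `l2` space returns squared Euclidean distances, which do not map to cosine similarity this way.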
Persistent Chroma
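```python
import chromadb

# Persistent client: data is saved to disk under `path` and reloaded on restart
client = chromadb.PersistentClient(path="/path/to/data")

# Use the same distance metric as the in-memory example
collection = client.get_or_create_collection(
    "documents",
    metadata={"hnsw:space": "cosine"},
)
```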
Weaviate: Open-Source Scalable Vector DB
Weaviate is a distributed vector database with advanced features for production systems.
Advantages:
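- Self-hostable for full control (a managed Weaviate Cloud service is also available)
- Scalable architecture
- GraphQL, REST, and gRPC APIs
- Advanced retrieval (hybrid search, BM25)
- Module system for plugging in vectorizer and reranker models (for example, OpenAI or Cohere)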
- Mature and production-ready
Disadvantages:
- Requires infrastructure management
- Steeper learning curve
- More complex setup
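Weaviate's keyword search, and the keyword half of its hybrid search, is based on BM25, which ranks documents by how often the query terms appear, discounted for terms that are common across the corpus and for long documents. The function below is a plain Okapi BM25 sketch with naive lowercase whitespace tokenization; `bm25_scores` is a hypothetical name, k1 = 1.2 and b = 0.75 are common defaults, and Weaviate's own tokenization and settings may differ.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score each document against the query with Okapi BM25 (whitespace tokenization)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: how many documents contain each term
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a larger inverse document frequency
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Saturate term frequency and normalize by document length
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "weaviate supports hybrid search",
    "chroma runs locally",
    "hybrid search combines bm25 and vectors",
]
print(bm25_scores("hybrid search", docs))
```

The shorter first document outranks the third even though both contain both query terms, because of the length normalization controlled by b.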
Weaviate Quick Start
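```python
import weaviate
from weaviate.classes.config import Configure, DataType, Property
from weaviate.classes.query import MetadataQuery

# Connect to a local Weaviate instance (e.g., started with Docker on the default ports)
client = weaviate.connect_to_local()

try:
    # Create the collection once. We supply our own vectors, so no vectorizer module;
    # to let Weaviate embed text itself, use e.g. Configure.Vectorizer.text2vec_openai()
    # (requires the OpenAI module and an API key).
    if not client.collections.exists("Document"):
        client.collections.create(
            name="Document",
            properties=[
                Property(name="content", data_type=DataType.TEXT),
            ],
            vectorizer_config=Configure.Vectorizer.none(),
        )

    collection = client.collections.get("Document")

    # Add objects in a batch ("..." stands for the remaining vector values)
    with collection.batch.dynamic() as batch:
        batch.add_object(
            properties={"content": "Sample document text"},
            vector=[0.1, 0.2, 0.3, ...],
        )

    # Vector search: the 3 nearest objects, with their distances
    results = collection.query.near_vector(
        near_vector=[0.15, 0.25, 0.35, ...],
        limit=3,
        return_metadata=MetadataQuery(distance=True),
    )
    for item in results.objects:
        print(f"Content: {item.properties['content']}")
        print(f"Distance: {item.metadata.distance}")
finally:
    client.close()  # the v4 client holds open connections that should be released
```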
Feature Comparison Table
| Feature | Pinecone | Chroma | Weaviate |
|---|---|---|---|
| Deployment | Managed | Local/Self-hosted | Self-hosted or managed (Weaviate Cloud) |
| Scaling | Automatic | Limited | Distributed |
| Hybrid Search | Yes | No | Yes |
| GraphQL API | No | No | Yes |
| Cost | Paid plans (free tier available) | Free | Free software (you pay for hosting) |
| Setup Time | Minutes | Seconds | Minutes (single Docker node) to hours (cluster) |
| Best For | Production SaaS | Prototyping | Enterprise |
When to Use Each
Use Pinecone if:
- You need production-grade infrastructure
- You don't want to manage servers
- You have budget for managed services
- You need consistently low query latency without tuning infrastructure yourself
Use Chroma if:
- You're prototyping or building locally
- You need simple, fast setup
- You have small datasets
- You want open-source simplicity
Use Weaviate if:
- You need full control
- You have advanced search requirements
- You have engineering resources
- You need GraphQL capabilities
Integration with LangChain
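```python
# Assumes `documents` is a list of LangChain Document objects and `embeddings`
# is an embeddings model, for example:
#   from langchain_openai import OpenAIEmbeddings
#   embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Pinecone (pip install langchain-pinecone; reads PINECONE_API_KEY from the environment)
from langchain_pinecone import PineconeVectorStore
vectorstore = PineconeVectorStore.from_documents(
    documents, embeddings, index_name="documents"
)

# Chroma (pip install langchain-chroma)
from langchain_chroma import Chroma
vectorstore = Chroma.from_documents(documents, embeddings)

# Weaviate (pip install langchain-weaviate; expects a v4 client such as weaviate.connect_to_local())
from langchain_weaviate.vectorstores import WeaviateVectorStore
vectorstore = WeaviateVectorStore.from_documents(
    documents, embeddings, client=weaviate_client
)
```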
Conclusion
Choose based on your needs: Pinecone for managed simplicity, Chroma for quick prototyping, Weaviate for advanced self-hosted deployments. Many teams use Chroma in development and migrate to Pinecone or Weaviate for production.
FAQ
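Q: Can I start with Chroma and migrate to Pinecone? A: Yes. Both store IDs, vectors, and metadata and support the same core operations (adding or upserting vectors and querying nearest neighbors), so the work is mostly adapting to API differences: export the IDs, embeddings, and metadata from Chroma and upsert them into a Pinecone index created with the same dimension and distance metric.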
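A migration sketch, assuming the Chroma collection and Pinecone index from the quick starts: Chroma's `collection.get(include=["embeddings", "metadatas", "documents"])` returns parallel lists, which only need reshaping into Pinecone's `{"id", "values", "metadata"}` records. The converter below does that reshaping in plain Python; its name `chroma_to_pinecone` and the choice to store the raw text under a `"text"` metadata key are assumptions. Very large collections are better exported page by page with `get(limit=..., offset=...)` and upserted in batches.

```python
def chroma_to_pinecone(exported: dict) -> list[dict]:
    """Convert a Chroma collection.get() export into Pinecone upsert records."""
    n = len(exported["ids"])
    metadatas = exported.get("metadatas") or [None] * n
    documents = exported.get("documents") or [None] * n
    records = []
    for id_, embedding, meta, doc in zip(exported["ids"], exported["embeddings"], metadatas, documents):
        metadata = dict(meta or {})  # Pinecone rejects None metadata values
        if doc is not None:
            metadata["text"] = doc  # keep the raw text, as in the Pinecone quick start
        # float() also converts NumPy scalars, which newer Chroma versions may return
        records.append({"id": id_, "values": [float(x) for x in embedding], "metadata": metadata})
    return records


# Hand-written sample in the shape of a Chroma export (hypothetical data)
exported = {
    "ids": ["id1", "id2"],
    "embeddings": [[0.1, 0.2, 0.3], [0.2, 0.3, 0.4]],
    "metadatas": [{"source": "faq"}, None],
    "documents": ["First document", "Second document"],
}
records = chroma_to_pinecone(exported)
print(records[0])
# Real migration (assumed setup):
# exported = collection.get(include=["embeddings", "metadatas", "documents"])
# index.upsert(vectors=chroma_to_pinecone(exported))
```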
Q: Which is cheapest? A: Chroma (free, open-source). Weaviate is free software, but you pay for the servers it runs on. Pinecone has recurring costs beyond its free tier.
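Q: What's the maximum vector dimension? A: All three comfortably handle the output sizes of common embedding models, such as 1536 (OpenAI text-embedding-3-small) and 3072 (text-embedding-3-large). Exact upper limits vary by product and release, so check each project's documentation. The constraint you are more likely to hit is that every vector in an index or collection must have the same dimension, fixed when a Pinecone index is created or when the first vector is added to a Chroma collection.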
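In practice, a mismatch between the vectors you write and the dimension an index expects (for example, after switching embedding models) is a more common failure than hitting a maximum, and all three databases reject it at write time. A minimal pre-flight check you could run before a bulk upsert; the helper name `check_dimensions` and the sample data are hypothetical.

```python
def check_dimensions(vectors: dict[str, list[float]], expected_dim: int) -> list[str]:
    """Return the IDs of vectors whose length differs from the expected index dimension."""
    return [vec_id for vec_id, values in vectors.items() if len(values) != expected_dim]


# Hypothetical batch mixing a 3072-dimension vector into a 1536-dimension index
vectors = {"id1": [0.1] * 1536, "id2": [0.2] * 3072, "id3": [0.3] * 1536}
bad = check_dimensions(vectors, expected_dim=1536)
print(bad)  # ['id2']
```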