Vector Databases — Pinecone vs Chroma vs Weaviate

Introduction

Vector databases are essential for RAG systems and semantic search. This guide compares three popular options: Pinecone (managed), Chroma (open-source, local), and Weaviate (open-source, scalable).

Pinecone: Managed Vector Database
Pinecone Quick Start
Chroma: Local Vector Database
Chroma Quick Start
Persistent Chroma
Weaviate: Open-Source Scalable Vector DB
Weaviate Quick Start
Feature Comparison Table
When to Use Each
Integration with LangChain
Conclusion
FAQ

Pinecone: Managed Vector Database

Pinecone is a fully managed vector database designed for production scale.

Advantages:

Zero infrastructure management
Automatic scaling
Built-in hybrid search (vector + keyword)
Low latency from globally distributed infrastructure
Free tier available

Disadvantages:

Recurring costs even with low usage
Vendor lock-in
Data stored on Pinecone servers

Pinecone Quick Start

import pinecone

# Initialize Pinecone
pc = pinecone.Pinecone(api_key="your-api-key")

# Create index
pc.create_index(
    name="documents",
    dimension=1536,
    metric="cosine",
    spec=pinecone.ServerlessSpec(cloud="aws", region="us-east-1")
)

index = pc.Index("documents")

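# Upsert vectors as (id, values, metadata) tuples.
# Each vector must contain exactly 1536 values to match the index dimension;
# the "..." below stands for the omitted components.
vectors = [
    ("id1", [0.1, 0.2, 0.3, ...], {"text": "sample"}),
    ("id2", [0.2, 0.3, 0.4, ...], {"text": "sample2"}),
]
index.upsert(vectors=vectors)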

# Query
results = index.query(
    vector=[0.15, 0.25, 0.35, ...],
    top_k=3,
    include_metadata=True
)

for match in results['matches']:
    print(f"ID: {match['id']}, Score: {match['score']}")
    print(f"Text: {match['metadata']['text']}")
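
Large uploads are typically sent in several smaller requests rather than one call. The sketch below shows one way to split the (id, values, metadata) tuples from the quick start into batches. The helper name `batched` and the batch size of 100 are assumptions for illustration, not values taken from Pinecone's documentation.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

# One record in the same (id, values, metadata) shape used in the quick start.
Record = Tuple[str, List[float], dict]


def batched(records: Iterable[Record], batch_size: int = 100) -> Iterator[List[Record]]:
    """Yield successive lists of at most `batch_size` records."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical usage with the `index` and `vectors` objects from the quick start:
# for batch in batched(vectors, batch_size=100):
#     index.upsert(vectors=batch)
```

Because the helper is a generator, it also works when the records come from a stream that does not fit in memory.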

Chroma: Local Vector Database

Chroma is an open-source, embedded vector database well suited to development and small deployments.

Advantages:

Runs locally (no server needed)
Free and open-source
Simple API
Good for prototyping
In-memory or persistent storage

Disadvantages:

Not designed for massive scale
Limited query optimization
Fewer advanced features
Community support only

Chroma Quick Start

import chromadb

# Create client (in-memory)
client = chromadb.Client()

# Create collection
collection = client.create_collection(
    name="documents",
    metadata={"hnsw:space": "cosine"}
)

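# Add documents with precomputed embeddings.
# All vectors must have the same number of dimensions;
# the "..." below stands for the omitted components.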
collection.add(
    ids=["id1", "id2", "id3"],
    embeddings=[
        [0.1, 0.2, 0.3, ...],
        [0.2, 0.3, 0.4, ...],
        [0.3, 0.4, 0.5, ...],
    ],
    documents=[
        "First document",
        "Second document",
        "Third document"
    ]
)

# Query
results = collection.query(
    query_embeddings=[[0.15, 0.25, 0.35, ...]],
    n_results=3
)

for doc in results['documents'][0]:
    print(doc)
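
The `[0]` in the loop above is there because a Chroma query returns one inner list per query embedding. A minimal sketch of unpacking that nested shape follows. The helper name `flatten_query_result` and the values in `example` are hand-written assumptions for illustration, not real query output.

```python
from typing import Dict, List, Tuple


def flatten_query_result(results: Dict, query_index: int = 0) -> List[Tuple[str, str, float]]:
    """Return (id, document, distance) triples for one query, in returned order."""
    ids = results["ids"][query_index]
    documents = results["documents"][query_index]
    distances = results["distances"][query_index]
    return list(zip(ids, documents, distances))


# Hand-written stand-in for a query result (illustrative values only).
example = {
    "ids": [["id1", "id2"]],
    "documents": [["First document", "Second document"]],
    "distances": [[0.02, 0.11]],
}

for doc_id, text, distance in flatten_query_result(example):
    print(f"{doc_id}: {text} (distance {distance})")
```

Passing a different `query_index` selects the hits for another embedding when several are sent in one `query_embeddings` list.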

Persistent Chroma

import chromadb

# Persistent client (saves to disk)
client = chromadb.PersistentClient(path="/path/to/data")
collection = client.get_or_create_collection("documents")

Weaviate: Open-Source Scalable Vector DB

Weaviate is a distributed vector database with advanced features for production systems.

Advantages:

Self-hosted (full control)
Scalable architecture
GraphQL and REST APIs
Advanced retrieval (hybrid search, BM25)
Module system for ML models
Mature and production-ready

Disadvantages:

Requires infrastructure management
Steeper learning curve
More complex setup

Weaviate Quick Start

import weaviate
from weaviate.classes.config import Configure, DataType, Property
from weaviate.classes.query import MetadataQuery

# Connect to a locally running Weaviate instance (v4 Python client)
client = weaviate.connect_to_local()

# Create the collection (schema).
# text2vec_openai requires the text2vec-openai module to be enabled on the instance.
client.collections.create(
    name="Document",
    properties=[
        Property(name="content", data_type=DataType.TEXT),
    ],
    vectorizer_config=Configure.Vectorizer.text2vec_openai(),
)

# Add objects in a batch, supplying a precomputed vector for each one
collection = client.collections.get("Document")
with collection.batch.dynamic() as batch:
    batch.add_object(
        properties={"content": "Sample document text"},
        vector=[0.1, 0.2, 0.3, ...]
    )

# Query (vector search)
results = collection.query.near_vector(
    near_vector=[0.15, 0.25, 0.35, ...],
    limit=3,
    return_metadata=MetadataQuery(distance=True)
)

for item in results.objects:
    print(f"Content: {item.properties['content']}")
    print(f"Distance: {item.metadata.distance}")

# Release the connection when done
client.close()
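
To interpret the distance values printed above, it helps to compute one by hand. The sketch below assumes the collection uses a cosine metric, where distance is defined as 1 minus cosine similarity: 0 means the vectors point the same way, 1 means they are orthogonal, and 2 means they point in opposite directions. The function name `cosine_distance` is an illustrative choice, not part of the Weaviate client.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity for two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors
```

Smaller distances mean closer matches, which is why the results are easiest to read sorted in ascending order of distance.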

Feature Comparison Table

Feature	Pinecone	Chroma	Weaviate
Deployment	Managed	Local/Self-hosted	Self-hosted
Scaling	Auto	Limited	Distributed
Hybrid Search	Yes	No	Yes
GraphQL API	No	No	Yes
Cost	Subscription	Free	Free (hosting costs)
Setup Time	Minutes	Seconds	Hours
Best For	Production SaaS	Prototyping	Enterprise

When to Use Each

Use Pinecone if:

You need production-grade infrastructure
You don't want to manage servers
You have budget for managed services
You need low latency globally

Use Chroma if:

You're prototyping or building locally
You need simple, fast setup
You have small datasets
You want open-source simplicity

Use Weaviate if:

You need full control
You have advanced search requirements
You have engineering resources
You need GraphQL capabilities

Integration with LangChain

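# `documents` (a list of LangChain Document objects) and `embeddings`
# (an embedding model instance) are assumed to be defined elsewhere.

# Pinecone (langchain-pinecone package)
from langchain_pinecone import PineconeVectorStore
vectorstore = PineconeVectorStore.from_documents(
    documents, embeddings, index_name="documents"
)

# Chroma (langchain-chroma package)
from langchain_chroma import Chroma
vectorstore = Chroma.from_documents(documents, embeddings)

# Weaviate (langchain-weaviate package, which works with the v4 client)
from langchain_weaviate.vectorstores import WeaviateVectorStore
vectorstore = WeaviateVectorStore.from_documents(
    documents, embeddings, client=weaviate_client
)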

Conclusion

Choose based on your needs: Pinecone for managed simplicity, Chroma for quick prototyping, Weaviate for advanced self-hosted deployments. Many teams use Chroma in development and migrate to Pinecone or Weaviate for production.

FAQ

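Q: Can I start with Chroma and migrate to Pinecone? A: Yes. Both support the same core operations (adding vectors with metadata and querying for nearest neighbors), so the differences are mainly in the API. Export the IDs, vectors, and metadata from Chroma and upsert them into Pinecone.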
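
A minimal sketch of that export-and-import step, assuming the Chroma data has been read with `collection.get(...)` into a dictionary with `ids`, `embeddings`, `documents`, and `metadatas` keys. The function name `chroma_to_pinecone_records` and the choice to store the document text under a `text` metadata key are assumptions for illustration.

```python
from typing import Dict, List, Tuple

# Same (id, values, metadata) tuple shape used in the Pinecone quick start.
Record = Tuple[str, List[float], dict]


def chroma_to_pinecone_records(dump: Dict) -> List[Record]:
    """Convert a Chroma-style export dictionary into Pinecone upsert tuples."""
    ids = dump["ids"]
    embeddings = dump["embeddings"]
    documents = dump.get("documents") or [None] * len(ids)
    metadatas = dump.get("metadatas") or [None] * len(ids)

    records = []
    for doc_id, vector, text, meta in zip(ids, embeddings, documents, metadatas):
        metadata = dict(meta or {})
        if text is not None:
            metadata["text"] = text  # keep the document text next to the vector
        records.append((doc_id, [float(v) for v in vector], metadata))
    return records


# Hypothetical usage with the `collection` and `index` objects from the quick starts:
# dump = collection.get(include=["embeddings", "documents", "metadatas"])
# index.upsert(vectors=chroma_to_pinecone_records(dump))
```

The target Pinecone index needs the same dimension and distance metric as the Chroma collection for the migrated vectors to rank the same way.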

Q: Which is cheapest? A: Chroma (free, open-source). Weaviate is free software but hosting costs money. Pinecone has recurring costs.

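Q: What's the maximum vector dimension? A: Pinecone and Weaviate both accept vectors well beyond 2048 dimensions, which covers common embedding models such as the 1536-dimensional one used in the Pinecone example. Chroma's limit depends on its HNSW implementation and available memory. Exact ceilings change between versions, so check each project's current documentation.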