Vector Databases — Pinecone vs Chroma vs Weaviate

Sanjeev Sharma

Introduction

Vector databases are essential for RAG systems and semantic search. This guide compares three popular options: Pinecone (managed), Chroma (open-source, local), and Weaviate (open-source, scalable).
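
To make "semantic search" concrete, here is a minimal sketch of the core operation a vector database performs: ranking stored vectors by similarity to a query vector. It uses a brute-force scan with cosine similarity. Production systems typically rely on approximate indexes such as HNSW instead. The function names and the toy 3-dimensional vectors are hypothetical and chosen only for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, vectors, k=3):
    """Return the k (id, score) pairs most similar to the query, best first."""
    scored = [(vec_id, cosine_similarity(query, vec)) for vec_id, vec in vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Toy "embeddings" (made-up values, not produced by a real model)
store = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}

for vec_id, score in top_k([1.0, 0.0, 0.0], store, k=2):
    print(vec_id, round(score, 3))
```

All three databases in this guide expose the same idea through their own query APIs, with indexing, persistence, and filtering added on top.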

Pinecone: Managed Vector Database

Pinecone is a fully managed vector database designed for production scale.

Advantages:

  • Zero infrastructure management
  • Automatic scaling
  • Built-in hybrid search (vector + keyword)
  • Low-latency, globally distributed service
  • Free tier available

Disadvantages:

  • Recurring costs even with low usage
  • Vendor lock-in
  • Data stored on Pinecone servers

Pinecone Quick Start

import pinecone

# Initialize Pinecone
pc = pinecone.Pinecone(api_key="your-api-key")

# Create index
pc.create_index(
    name="documents",
    dimension=1536,
    metric="cosine",
    spec=pinecone.ServerlessSpec(cloud="aws", region="us-east-1")
)

index = pc.Index("documents")

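# Upsert vectors as (id, values, metadata) tuples.
# `...` marks elided values: each vector needs exactly 1536 floats
# to match the index dimension.
vectors = [
    ("id1", [0.1, 0.2, 0.3, ...], {"text": "sample"}),
    ("id2", [0.2, 0.3, 0.4, ...], {"text": "sample2"}),
]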
index.upsert(vectors=vectors)

# Query
results = index.query(
    vector=[0.15, 0.25, 0.35, ...],
    top_k=3,
    include_metadata=True
)

for match in results['matches']:
    print(f"ID: {match['id']}, Score: {match['score']}")
    print(f"Text: {match['metadata']['text']}")
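
The quick start upserts two vectors in a single call. For larger loads, a common pattern is to check each vector's length against the index dimension and send records in batches. The sketch below shows that pattern in plain Python. The helper names, the batch size of 100, and the 4-dimensional placeholder data are assumptions for illustration, not Pinecone requirements.

```python
from itertools import islice

def batched(items, batch_size):
    """Yield successive lists of at most batch_size items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

def check_dimension(vectors, dimension):
    """Raise ValueError if any (id, values, metadata) tuple has the wrong length."""
    for vec_id, values, _metadata in vectors:
        if len(values) != dimension:
            raise ValueError(f"{vec_id}: expected {dimension} values, got {len(values)}")

# Hypothetical data: 250 placeholder vectors for a 4-dimensional index
vectors = [(f"id{i}", [0.1, 0.2, 0.3, 0.4], {"text": f"sample {i}"}) for i in range(250)]
check_dimension(vectors, dimension=4)

for batch in batched(vectors, batch_size=100):
    # With a real index, this is where index.upsert(vectors=batch) would go
    print(len(batch))
```

Validating before sending keeps a single malformed vector from failing a whole batch partway through a load.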

Chroma: Local Vector Database

Chroma is an open-source, embedded vector database perfect for development and small deployments.

Advantages:

  • Runs locally (no server needed)
  • Free and open-source
  • Simple API
  • Good for prototyping
  • In-memory or persistent storage

Disadvantages:

  • Not designed for massive scale
  • Limited query optimization
  • Fewer advanced features
  • Community support only

Chroma Quick Start

import chromadb

# Create client (in-memory)
client = chromadb.Client()

# Create collection
collection = client.create_collection(
    name="documents",
    metadata={"hnsw:space": "cosine"}
)

# Add documents with embeddings
# (`...` marks elided values; all vectors must have the same length)
collection.add(
    ids=["id1", "id2", "id3"],
    embeddings=[
        [0.1, 0.2, 0.3, ...],
        [0.2, 0.3, 0.4, ...],
        [0.3, 0.4, 0.5, ...],
    ],
    documents=[
        "First document",
        "Second document",
        "Third document"
    ]
)

# Query
results = collection.query(
    query_embeddings=[[0.15, 0.25, 0.35, ...]],
    n_results=3
)

for doc in results['documents'][0]:
    print(doc)
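
The `[0]` in `results['documents'][0]` is there because Chroma returns one list of hits per query embedding. The sketch below uses a hand-written dictionary that mimics that shape for a single query (the distances are made up) and zips the parallel lists into rows. `rows_for_query` is a hypothetical helper, not part of Chroma.

```python
# Hand-written dict mimicking the shape of a Chroma query result for ONE query
# (values are made up for illustration)
results = {
    "ids": [["id1", "id2", "id3"]],
    "documents": [["First document", "Second document", "Third document"]],
    "distances": [[0.02, 0.05, 0.11]],
}

def rows_for_query(results, query_index=0):
    """Zip the parallel lists for one query into (id, document, distance) rows."""
    return list(zip(
        results["ids"][query_index],
        results["documents"][query_index],
        results["distances"][query_index],
    ))

for doc_id, document, distance in rows_for_query(results):
    print(f"{doc_id}: {document} (distance {distance})")
```

If several query embeddings are passed at once, `query_index` selects which query's hits to read.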

Persistent Chroma

# Persistent client (saves to disk)
client = chromadb.PersistentClient(path="/path/to/data")
collection = client.get_or_create_collection("documents")

Weaviate: Open-Source Scalable Vector DB

Weaviate is a distributed vector database with advanced features for production systems.

Advantages:

  • Self-hosted (full control)
  • Scalable architecture
  • GraphQL and REST APIs
  • Advanced retrieval (hybrid search, BM25)
  • Module system for ML models
  • Mature and production-ready

Disadvantages:

  • Requires infrastructure management
  • Steeper learning curve
  • More complex setup
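
Hybrid search, listed among the advantages above, combines a keyword score (such as BM25) with a vector similarity score. The sketch below is a simplified illustration of one way to blend the two: min-max normalize each score set, then take a weighted sum controlled by `alpha`. It is not Weaviate's or Pinecone's actual implementation, and the function names and scores are hypothetical.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} dict to the range [0, 1]."""
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - low) / (high - low) for doc_id, s in scores.items()}

def hybrid_scores(keyword_scores, vector_scores, alpha=0.5):
    """Blend scores; alpha=1.0 is pure vector search, alpha=0.0 is pure keyword."""
    keyword = min_max_normalize(keyword_scores)
    vector = min_max_normalize(vector_scores)
    doc_ids = set(keyword) | set(vector)
    combined = {
        doc_id: alpha * vector.get(doc_id, 0.0) + (1 - alpha) * keyword.get(doc_id, 0.0)
        for doc_id in doc_ids
    }
    return sorted(combined.items(), key=lambda pair: pair[1], reverse=True)

# Made-up scores for three documents
bm25 = {"a": 12.0, "b": 4.0, "c": 8.0}
similarity = {"a": 0.60, "b": 0.90, "c": 0.80}

best_id, best_score = hybrid_scores(bm25, similarity, alpha=0.5)[0]
print(best_id, round(best_score, 3))
```

Normalizing first matters because BM25 scores and cosine similarities live on different scales, so a raw sum would let one signal dominate.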

Weaviate Quick Start

import weaviate
from weaviate.classes.config import Configure, DataType, Property
from weaviate.classes.query import MetadataQuery

# Connect to a locally running Weaviate instance (v4 Python client)
client = weaviate.connect_to_local()

# Create the collection (schema)
client.collections.create(
    name="Document",
    properties=[
        Property(name="content", data_type=DataType.TEXT),
    ],
    # Needs the text2vec-openai module and an OpenAI API key on the server
    vectorizer_config=Configure.Vectorizer.text2vec_openai(),
)

# Add objects (`...` marks elided vector values)
collection = client.collections.get("Document")
with collection.batch.dynamic() as batch:
    batch.add_object(
        properties={"content": "Sample document text"},
        vector=[0.1, 0.2, 0.3, ...]
    )

# Query (vector search)
results = collection.query.near_vector(
    near_vector=[0.15, 0.25, 0.35, ...],
    limit=3,
    return_metadata=MetadataQuery(distance=True)
)

for item in results.objects:
    print(f"Content: {item.properties['content']}")
    print(f"Distance: {item.metadata.distance}")

# Release the connection when done
client.close()
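
The Weaviate example prints a distance (lower means closer), while the Pinecone example prints a score (with the cosine metric, higher means closer). Assuming both are configured for cosine, the two are related by distance = 1 - similarity, so results can be put on one scale. The helper names below are hypothetical.

```python
def cosine_distance_to_similarity(distance):
    """Cosine distance is 1 - cosine similarity, so invert that."""
    return 1.0 - distance

def cosine_similarity_to_distance(similarity):
    """Similarity in [-1, 1] maps to distance in [0, 2]."""
    return 1.0 - similarity

print(cosine_distance_to_similarity(0.25))
print(cosine_similarity_to_distance(1.0))
```

This conversion only holds for the cosine metric. Euclidean and dot-product scores need their own handling.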

Feature Comparison Table

Feature       | Pinecone        | Chroma            | Weaviate
Deployment    | Managed         | Local/Self-hosted | Self-hosted
Scaling       | Auto            | Limited           | Distributed
Hybrid Search | Yes             | No                | Yes
GraphQL API   | No              | No                | Yes
Cost          | Subscription    | Free              | Free (hosting costs)
Setup Time    | Minutes         | Seconds           | Hours
Best For      | Production SaaS | Prototyping       | Enterprise

When to Use Each

Use Pinecone if:

  • You need production-grade infrastructure
  • You don't want to manage servers
  • You have budget for managed services
  • You need low latency globally

Use Chroma if:

  • You're prototyping or building locally
  • You need simple, fast setup
  • You have small datasets
  • You want open-source simplicity

Use Weaviate if:

  • You need full control
  • You have advanced search requirements
  • You have engineering resources
  • You need GraphQL capabilities

Integration with LangChain

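# `documents`, `embeddings`, and `weaviate_client` are assumed to be defined
# elsewhere. Each integration ships in its own package:
# langchain-pinecone, langchain-chroma, langchain-weaviate.

# Pinecone
from langchain_pinecone import PineconeVectorStore
vectorstore = PineconeVectorStore.from_documents(
    documents, embeddings, index_name="documents"
)

# Chroma
from langchain_chroma import Chroma
vectorstore = Chroma.from_documents(documents, embeddings)

# Weaviate (works with the v4 client used in the quick start above)
from langchain_weaviate import WeaviateVectorStore
vectorstore = WeaviateVectorStore.from_documents(
    documents, embeddings, client=weaviate_client
)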

Conclusion

Choose based on your needs: Pinecone for managed simplicity, Chroma for quick prototyping, Weaviate for advanced self-hosted deployments. Many teams use Chroma in development and migrate to Pinecone or Weaviate for production.

FAQ

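Q: Can I start with Chroma and migrate to Pinecone? A: Yes. Both store ids, vectors, and metadata, so the differences are mostly in the client API. Export the ids, embeddings, and metadata from Chroma, then upsert them into a Pinecone index created with the same dimension and distance metric.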
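
As a rough sketch of the conversion step in such a migration, the code below reshapes a Chroma-style export into the `(id, values, metadata)` tuples used in the Pinecone quick start. The `export` dictionary is hand-written to mimic the parallel lists that `collection.get(include=["embeddings", "documents", "metadatas"])` returns. The helper name and the choice to store document text under a `"text"` metadata key are assumptions.

```python
def chroma_export_to_pinecone_tuples(export):
    """Convert parallel lists (ids, embeddings, documents, metadatas) to upsert tuples."""
    count = len(export["ids"])
    documents = export.get("documents") or [None] * count
    metadatas = export.get("metadatas") or [None] * count
    records = []
    for doc_id, embedding, document, metadata in zip(
        export["ids"], export["embeddings"], documents, metadatas
    ):
        merged = dict(metadata or {})  # Chroma may return None for missing metadata
        if document is not None:
            merged["text"] = document  # keep the document text as metadata
        records.append((doc_id, list(embedding), merged))
    return records

# Hand-written export with made-up 3-dimensional vectors
export = {
    "ids": ["id1", "id2"],
    "embeddings": [[0.1, 0.2, 0.3], [0.2, 0.3, 0.4]],
    "documents": ["First document", "Second document"],
    "metadatas": [{"source": "notes"}, None],
}

records = chroma_export_to_pinecone_tuples(export)
print(records[0])
```

The resulting list can then be passed to `index.upsert(vectors=...)`, ideally in batches for large collections.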

Q: Which is cheapest? A: Chroma is free and open-source, and it can run on hardware you already have. Weaviate is free software, but hosting it costs money. Pinecone has a free tier, but paid plans carry recurring costs.

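Q: What's the maximum vector dimension? A: All three handle common embedding sizes, including the 1536 dimensions used in this guide. Exact upper limits differ by product and version, so check the current documentation before adopting an unusually large embedding model.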
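
Dimension also drives storage cost, since raw vector size grows linearly with it. The back-of-the-envelope sketch below estimates the bytes needed for the vectors alone, assuming 4-byte float32 values. It ignores index overhead, metadata, and replication, all of which add to the real footprint. The function name is hypothetical.

```python
def raw_vector_bytes(num_vectors, dimension, bytes_per_value=4):
    """Bytes needed to store the raw vectors alone (float32 by default)."""
    return num_vectors * dimension * bytes_per_value

# One million 1536-dimensional vectors
size = raw_vector_bytes(1_000_000, 1536)
print(f"{size / 1e9:.2f} GB")  # prints 6.14 GB
```

Halving the dimension halves this figure, which is one reason smaller embedding models can be attractive for large collections.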

Written by Sanjeev Sharma, Full Stack Engineer · E-mopro