Pinecone in Production — Namespaces, Metadata Filtering, and Cost Optimization

Author: Sanjeev Sharma (@webcoderspeed1)
Introduction
Pinecone's managed infrastructure eliminates DevOps overhead, but production deployments require strategic choices about namespaces, filtering, and cost management. This guide covers real-world patterns for scaling Pinecone in production.
- Serverless vs Pod-Based Architecture
- Namespace Strategy: Per-Tenant and Per-Environment
- Metadata Filtering Best Practices
- Upsert Batching: 100 Vectors Max
- Hybrid Search: Dense + Sparse
- Index Freshness and Async Upsert
- Query With Score Threshold
- Cost Calculation: Storage + Reads + Writes
- Pinecone Canopy for RAG
- Checklist
- Conclusion
Serverless vs Pod-Based Architecture
Choose based on your query volume and cost model:
import { Pinecone } from "@pinecone-database/pinecone";
const pc = new Pinecone({
apiKey: process.env.PINECONE_API_KEY,
});
// Serverless: pay per query + storage
// Ideal for: variable traffic, startups, <10K QPS
// Cost (illustrative rates): $0.40 per 100K queries + $0.10/GB-month storage
const serverlessIndex = pc.Index("my-serverless-index");
// Pod-based: fixed monthly cost + compute units
// Ideal: predictable traffic >10K QPS, enterprises
// Cost (illustrative rates): $0.50-$2.00 per pod per month + storage
// Cost calculation for a typical use case:
// 1M vectors, 1536 dims (~6GB at 4 bytes/dim) + 100K queries/day
const monthlyQueries = 100_000 * 30; // 3M queries
const storageGB = 6;
const serverlessCost = (monthlyQueries / 100_000) * 0.40 + storageGB * 0.10;
// = 12 + 0.60 = $12.60/month
const podBasedCost = 2.0 * 1 + storageGB * 0.10; // 1 pod
// = 2.00 + 0.60 = $2.60/month
console.log(`Serverless: $${serverlessCost}/month`);
console.log(`Pod-based: $${podBasedCost}/month`);
// Serverless wins at low or spiky volume; pods win at sustained high volume
Rule of thumb: start serverless for variable or modest traffic; switch to pods once sustained query volume makes the fixed pod cost cheaper than per-query pricing.
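Using the illustrative rates above, the break-even query volume falls out directly (a sketch with this section's numbers, not official pricing; rerun with your actual contract rates):

```typescript
const QUERY_COST_PER_100K = 0.4; // serverless read cost, $ per 100K queries
const POD_COST_PER_MONTH = 2.0;  // fixed cost per pod

// Storage cancels out of the comparison (both models pay $0.10/GB-month
// here), so break-even is where per-query spend equals the fixed pod cost.
function breakEvenQueriesPerMonth(pods: number = 1): number {
  return (pods * POD_COST_PER_MONTH / QUERY_COST_PER_100K) * 100_000;
}

console.log(breakEvenQueriesPerMonth()); // 500000
```

At these rates one pod pays for itself around 500K queries/month (roughly 16K/day).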
Namespace Strategy: Per-Tenant and Per-Environment
Namespaces isolate data within an index, enabling multi-tenancy:
const pc = new Pinecone({
apiKey: process.env.PINECONE_API_KEY,
});
const index = pc.Index("shared-index");
// Namespace per tenant (SaaS multi-tenancy)
const tenantId = "org-12345";
const namespace = `tenant-${tenantId}`;
// Upsert to tenant namespace
await index.namespace(namespace).upsert([
{
id: "doc-1",
values: [0.1, 0.2, 0.3, 0.4],
metadata: {
title: "AI Infrastructure",
author: "alice",
domain: "ai",
},
},
{
id: "doc-2",
values: [0.2, 0.3, 0.4, 0.5],
metadata: {
title: "Vector Databases",
author: "bob",
domain: "db",
},
},
]);
// Query only tenant's data
const results = await index.namespace(namespace).query({
vector: [0.1, 0.2, 0.3, 0.4],
topK: 10,
includeMetadata: true,
});
console.log(`Found ${results.matches.length} results for tenant ${tenantId}`);
// Namespace per environment (dev, staging, prod)
async function queryEnvironment(
env: "dev" | "staging" | "prod",
query: number[],
) {
return index.namespace(env).query({
vector: query,
topK: 10,
});
}
const prodResults = await queryEnvironment("prod", [0.1, 0.2, 0.3, 0.4]);
// Hybrid strategy: tenant + environment
const multiTenantNamespace = `${tenantId}-prod`;
await index.namespace(multiTenantNamespace).upsert([
{
id: "prod-doc-1",
values: [0.1, 0.2, 0.3, 0.4],
metadata: { env: "prod" },
},
]);
Per-tenant namespaces: Let many tenants share one index, cutting index count and cost while scoping every query to a single tenant's data.
Per-environment namespaces: Run prod, staging, and dev on one index, with each environment's vectors kept separate at query time.
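To keep tenant and environment naming from drifting, a small helper can own namespace construction (hypothetical; the naming scheme here is my own, not an SDK convention):

```typescript
// One place that decides what a namespace string looks like, so
// "tenant-org-12345-prod" never drifts into ad-hoc formats.
type Env = "dev" | "staging" | "prod";

function tenantNamespace(tenantId: string, env: Env): string {
  // Namespaces are plain strings; normalize to lowercase kebab-case
  const safeTenant = tenantId.toLowerCase().replace(/[^a-z0-9-]/g, "-");
  return `tenant-${safeTenant}-${env}`;
}

console.log(tenantNamespace("Org_12345", "prod")); // "tenant-org-12345-prod"
```

Every upsert and query then goes through `index.namespace(tenantNamespace(id, env))`, so a malformed tenant ID can never land in another tenant's namespace.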
Metadata Filtering Best Practices
Metadata filtering adds precision to vector search:
const index = pc.Index("documents");
// Insert with rich metadata
await index.upsert([
{
id: "1",
values: [0.1, 0.2, 0.3, 0.4],
metadata: {
title: "Advanced RAG",
category: "ai",
author_id: "user-123",
date_published: "2026-03-15",
confidence: 0.95,
is_public: true,
tags: ["retrieval", "llm", "production"],
},
},
]);
// Query with metadata filter: exact match (string)
const exactMatch = await index.query({
vector: [0.1, 0.2, 0.3, 0.4],
topK: 10,
filter: {
category: { $eq: "ai" },
},
includeMetadata: true,
});
// Query with numeric range
const rangeFilter = await index.query({
vector: [0.1, 0.2, 0.3, 0.4],
topK: 10,
filter: {
confidence: { $gte: 0.9 },
},
});
// Complex filter: AND logic
const complexFilter = await index.query({
vector: [0.1, 0.2, 0.3, 0.4],
topK: 10,
filter: {
$and: [
{ category: { $eq: "ai" } },
{ confidence: { $gte: 0.9 } },
{ is_public: { $eq: true } },
],
},
});
// Filter with $in operator
const multiMatch = await index.query({
vector: [0.1, 0.2, 0.3, 0.4],
topK: 10,
filter: {
author_id: { $in: ["user-123", "user-456", "user-789"] },
},
});
// NOT filter
const exclude = await index.query({
vector: [0.1, 0.2, 0.3, 0.4],
topK: 10,
filter: {
category: { $ne: "spam" },
},
});
Filtering best practices:
- Index only metadata you filter on frequently
- Use $in for many values instead of multiple conditions
- Combine vector similarity with filters for precise results
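In application code, filters like the ones above are usually assembled from optional request parameters. A sketch of a filter builder (the option names are mine, not part of the SDK):

```typescript
// Compose $eq / $gte / $in clauses into a single $and filter,
// skipping any criteria the caller did not set.
interface FilterOptions {
  category?: string;
  minConfidence?: number;
  authorIds?: string[];
}

function buildFilter(opts: FilterOptions): Record<string, unknown> | undefined {
  const clauses: Record<string, unknown>[] = [];
  if (opts.category) clauses.push({ category: { $eq: opts.category } });
  if (opts.minConfidence !== undefined)
    clauses.push({ confidence: { $gte: opts.minConfidence } });
  if (opts.authorIds?.length)
    clauses.push({ author_id: { $in: opts.authorIds } });
  if (clauses.length === 0) return undefined; // no filter at all
  return clauses.length === 1 ? clauses[0] : { $and: clauses };
}

console.log(JSON.stringify(buildFilter({ category: "ai", minConfidence: 0.9 })));
// {"$and":[{"category":{"$eq":"ai"}},{"confidence":{"$gte":0.9}}]}
```

Returning `undefined` when nothing is set lets you pass the result straight to `query({ ..., filter })` without special-casing the unfiltered path.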
Upsert Batching: 100 Vectors Max
Pinecone caps upsert request size and recommends batches of roughly 100 vectors. Implement smart batching:
import { Pinecone, Vector } from "@pinecone-database/pinecone";
const pc = new Pinecone({
apiKey: process.env.PINECONE_API_KEY,
});
const index = pc.Index("documents");
async function batchUpsert(
vectors: Vector[],
batchSize: number = 100,
): Promise<void> {
for (let i = 0; i < vectors.length; i += batchSize) {
const batch = vectors.slice(i, i + batchSize);
try {
await index.upsert(batch);
console.log(`Upserted ${batch.length} vectors (${i + batch.length}/${vectors.length})`);
} catch (error) {
console.error(`Batch failed at index ${i}:`, error);
// Retry or handle error
}
}
}
// Generate documents for bulk ingestion
async function ingestDocuments(documents: { id: string; embedding: number[]; text: string }[]) {
const vectors: Vector[] = documents.map((doc) => ({
id: doc.id,
values: doc.embedding,
metadata: {
text: doc.text,
ingested_at: new Date().toISOString(),
},
}));
await batchUpsert(vectors);
}
// Usage
const documents = Array.from({ length: 10000 }, (_, i) => ({
id: `doc-${i}`,
embedding: Array(1536).fill(0.1 + (i % 256) / 256.0),
text: `Document ${i}`,
}));
await ingestDocuments(documents);
Batching cuts network round trips by orders of magnitude compared with single-vector upserts. Keep each batch to roughly 100 vectors.
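The batchUpsert above logs failures but then gives up on them. A retry wrapper with exponential backoff makes failed batches recoverable (a sketch; the operation is injected, so it runs here without a live index):

```typescript
// Retry an async operation with exponential backoff: 1x, 2x, 4x the base delay.
async function withRetry<T>(
  op: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 200,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await op();
    } catch (error) {
      lastError = error;
      // Back off before the next attempt (skip the sleep after the last one)
      if (attempt < maxAttempts - 1)
        await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
    }
  }
  throw lastError;
}

// Example: a stand-in for index.upsert(batch) that fails twice, then succeeds
let calls = 0;
const flakyUpsert = async () => {
  calls++;
  if (calls < 3) throw new Error("rate limited");
  return "ok";
};
console.log(await withRetry(flakyUpsert, 3, 10)); // "ok" on the third attempt
```

Inside batchUpsert, `await index.upsert(batch)` then becomes `await withRetry(() => index.upsert(batch))`.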
Hybrid Search: Dense + Sparse
Combine vector embeddings (dense) with keyword search (sparse):
import { Pinecone } from "@pinecone-database/pinecone";
const pc = new Pinecone({
apiKey: process.env.PINECONE_API_KEY,
});
const index = pc.Index("hybrid-index");
// Insert with sparse vectors (keyword matches)
// Sparse vectors have few non-zero dimensions
const documents = [
{
id: "doc-1",
values: [0.1, 0.2, 0.3, 0.4], // dense embedding
sparseValues: {
indices: [0, 42, 128, 512], // keyword indices
values: [1, 1, 0.8, 0.9], // keyword importance
},
metadata: {
keywords: ["vector", "database", "retrieval"],
title: "Vector Databases 101",
},
},
];
await index.upsert(documents);
// Dense query (semantic search)
const denseResults = await index.query({
vector: [0.1, 0.2, 0.3, 0.4],
topK: 10,
includeMetadata: true,
});
// Sparse query (keyword search)
// Requires preprocessing text to sparse indices
function textToSparseVector(text: string) {
const words = text.toLowerCase().split(/\s+/);
// Simplified: in production, use vocabulary hash
const indices = words.map((w) => Math.abs(w.charCodeAt(0) + w.length) % 512);
return { indices, values: Array(indices.length).fill(1.0) };
}
const sparseQuery = textToSparseVector("vector retrieval");
// Hybrid results combine dense + sparse
// Result score = alpha * denseScore + (1 - alpha) * sparseScore
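One common convention for applying that alpha client-side (hedged: verify against current Pinecone docs for your index type) is to scale the dense vector by alpha and the sparse values by 1 - alpha, then issue a single query with both:

```typescript
interface SparseVector {
  indices: number[];
  values: number[];
}

// alpha = 1 is pure semantic search, alpha = 0 is pure keyword search
function weightHybrid(
  dense: number[],
  sparse: SparseVector,
  alpha: number,
): { vector: number[]; sparseVector: SparseVector } {
  if (alpha < 0 || alpha > 1) throw new Error("alpha must be in [0, 1]");
  return {
    vector: dense.map((v) => v * alpha),
    sparseVector: {
      indices: sparse.indices,
      values: sparse.values.map((v) => v * (1 - alpha)),
    },
  };
}

// Inline sparse vector standing in for the sparseQuery built above
const weighted = weightHybrid(
  [0.1, 0.2, 0.3, 0.4],
  { indices: [0, 42], values: [1, 0.8] },
  0.75,
);
console.log(weighted.vector.length, weighted.sparseVector.indices.length); // 4 2
// await index.query({ vector: weighted.vector, sparseVector: weighted.sparseVector, topK: 10 });
```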
Hybrid search excels for:
- Exact phrase matching + semantic relevance
- Technical documents where keywords matter
- Reducing hallucinations from pure semantic search
Index Freshness and Async Upsert
Pinecone indexes are eventually consistent: an upsert becomes queryable after a short delay, and production code must account for that window:
const index = pc.Index("documents");
// Async upsert: faster, might return stale data briefly
await index.upsert([
{
id: "doc-1",
values: [0.1, 0.2, 0.3, 0.4],
metadata: { version: 2 },
},
]);
// Immediately query might return old version (briefly)
const staleResults = await index.query({
vector: [0.1, 0.2, 0.3, 0.4],
topK: 1,
includeMetadata: true,
});
// For consistency-critical operations, add delay
await new Promise((resolve) => setTimeout(resolve, 500));
const freshResults = await index.query({
vector: [0.1, 0.2, 0.3, 0.4],
topK: 1,
includeMetadata: true,
});
// For near-real-time requirements, verify freshness instead of assuming it
// Indexing latency is typically sub-second, but varies by index type and load
Eventual consistency: Accept <1 second staleness for faster ingest.
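Rather than a fixed 500ms sleep, consistency-critical paths can poll until a condition holds. A generic sketch (the check is injected; in practice it might compare `index.describeIndexStats()` vector counts before and after the upsert):

```typescript
// Poll an async condition until it returns true or the timeout elapses.
async function waitUntil(
  check: () => Promise<boolean>,
  { timeoutMs = 5_000, intervalMs = 100 } = {},
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await check()) return true;
    await new Promise((r) => setTimeout(r, intervalMs));
  }
  return false; // condition never held within the timeout
}

// Example: a counter standing in for "the new vector is now queryable"
let polls = 0;
const visible = await waitUntil(async () => ++polls >= 3, { intervalMs: 10 });
console.log(visible, polls); // true 3
```

This bounds the worst case (the timeout) while returning as soon as the data is actually visible, instead of always paying a fixed delay.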
Query With Score Threshold
Confidence filtering prevents low-quality results:
const index = pc.Index("documents");
const results = await index.query({
vector: [0.1, 0.2, 0.3, 0.4],
topK: 100, // Fetch many, filter by confidence
includeMetadata: true,
});
const threshold = 0.7;
const filteredResults = results.matches.filter(
(match) => match.score >= threshold,
);
console.log(
`Retrieved ${results.matches.length} results, ${filteredResults.length} above threshold`,
);
// Keep only the high-confidence results
const topResults = filteredResults.slice(0, 10);
Always apply a score threshold to drop low-quality matches. Around 0.7 is a common starting point for cosine similarity; tune it for your metric and data.
Cost Calculation: Storage + Reads + Writes
Understand your bill:
interface PineconeUsage {
storageGB: number;
queriesPerMonth: number;
vectorsUpsertedPerMonth: number;
podCount?: number; // For pod-based
}
function calculateServerlessCost(usage: PineconeUsage): number {
// Query cost: $0.40 per 100K queries
const queryCost = (usage.queriesPerMonth / 100_000) * 0.40;
// Storage cost: $0.10 per GB-month
const storageCost = usage.storageGB * 0.10;
// Upsert cost: $0.10 per 1M vectors
const upsertCost = (usage.vectorsUpsertedPerMonth / 1_000_000) * 0.10;
return queryCost + storageCost + upsertCost;
}
function calculatePodCost(usage: PineconeUsage): number {
// Pod cost: $2.00 per pod-month
const podCost = (usage.podCount || 1) * 2.0;
// Storage cost: $0.10 per GB-month
const storageCost = usage.storageGB * 0.10;
// No per-query cost with pods
return podCost + storageCost;
}
// Example: 500K vectors (~3GB at 1536 dims), 1M queries/month, 100M upserts/month
const usage: PineconeUsage = {
storageGB: 3,
queriesPerMonth: 1_000_000,
vectorsUpsertedPerMonth: 100_000_000,
podCount: 2,
};
const serverlessCost = calculateServerlessCost(usage);
const podCost = calculatePodCost(usage);
console.log(`Serverless: $${serverlessCost.toFixed(2)}/month`);
console.log(`Pod-based (2 pods): $${podCost.toFixed(2)}/month`);
// Typical breakdown:
// - Storage dominates for large, read-light corpora
// - Query cost matters for high-throughput serving
// - Upserts dominate write-heavy pipelines (here: 100M upserts ≈ $10 of $14.30)
Optimize by understanding your bottleneck: queries, storage, or upserts.
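The `storageGB` inputs above come straight from vector count and dimensionality. A rough estimator (assumption: 4 bytes per float32 dimension, ignoring metadata and index overhead):

```typescript
// vectors × dimensions × 4 bytes (float32), converted to GiB
function estimateStorageGB(vectorCount: number, dimensions: number): number {
  return (vectorCount * dimensions * 4) / 1024 ** 3;
}

console.log(estimateStorageGB(1_000_000, 1536).toFixed(2)); // "5.72"
```

So roughly 1M vectors at 1536 dimensions is ~6GB of raw vector data; real usage runs higher once metadata is stored alongside.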
Pinecone Canopy for RAG
Canopy is Pinecone's open-source RAG framework (Python-first). The TypeScript sketch below wires up the same retrieve-then-generate pattern by hand:
import { Pinecone } from "@pinecone-database/pinecone";
import { OpenAI } from "openai";
const pc = new Pinecone({
apiKey: process.env.PINECONE_API_KEY,
});
const openai = new OpenAI({
apiKey: process.env.OPENAI_API_KEY,
});
// Manual RAG pipeline: embed the question, retrieve context, generate an answer
async function ragQuery(userQuestion: string): Promise<string> {
const index = pc.Index("rag-docs");
// Retrieve relevant documents
const queryEmbedding = await openai.embeddings.create({
model: "text-embedding-3-small",
input: userQuestion,
});
const results = await index.query({
vector: queryEmbedding.data[0].embedding,
topK: 3,
includeMetadata: true,
});
// Build context
const context = results.matches
.map((match) => match.metadata?.text || "")
.join("\n");
// Generate response with context
const completion = await openai.chat.completions.create({
model: "gpt-4-turbo",
messages: [
{
role: "system",
content: `You are a helpful assistant. Answer based on the provided context.
Context:
${context}`,
},
{
role: "user",
content: userQuestion,
},
],
});
return completion.choices[0].message.content || "";
}
// Usage
const answer = await ragQuery("What is semantic search?");
console.log(answer);
Whether you adopt Canopy or a hand-rolled pipeline like the one above, abstracting retrieval lets you focus on generation quality.
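One practical detail the pipeline above glosses over: the joined context grows with topK and can blow the model's window. A crude character budget (assumption: roughly 4 characters per token) that drops the lowest-ranked chunks first:

```typescript
// Keep retrieved chunks, in rank order, until the character budget is spent.
function truncateContext(chunks: string[], maxChars: number): string {
  const out: string[] = [];
  let used = 0;
  for (const chunk of chunks) {
    if (used + chunk.length > maxChars) break; // drop lower-ranked chunks first
    out.push(chunk);
    used += chunk.length + 1; // +1 for the joining newline
  }
  return out.join("\n");
}

console.log(truncateContext(["aaaa", "bbbb", "cccc"], 9)); // "aaaa\nbbbb"
```

In ragQuery, `results.matches.map(...).join("\n")` would become `truncateContext(results.matches.map((m) => String(m.metadata?.text || "")), budget)` with a budget sized to your model's context window.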
Checklist
- Decide serverless vs pod-based based on query volume
- Design namespace strategy (per-tenant vs per-environment)
- Index metadata fields for filtering
- Implement batch upsert (<= 100 vectors per batch)
- Set up monitoring for query latency and costs
- Test hybrid search on production queries
- Calculate 12-month cost and set alerts
- Implement score threshold filtering
- Document namespace and metadata schema
- Plan backup and recovery procedures
Conclusion
Pinecone's managed architecture removes operational burden. Focus on smart namespace design for multi-tenancy, strategic metadata filtering, and cost optimization. Understand serverless vs pods trade-offs. Master batching, filtering, and hybrid search. Deploy with confidence at any scale.