Multi-Tenant AI Architecture — Isolating Data, Costs, and Models Per Customer
Introduction
Multi-tenant SaaS systems must isolate customers completely. One tenant's data cannot leak to another. One tenant's costs cannot subsidize another. One tenant's model cannot see another's context. This post covers patterns for building isolated AI systems at scale.
- Data Isolation for AI
- Per-Tenant Model Configuration
- Per-Tenant Cost Tracking and Billing
- Per-Tenant Rate Limits
- Tenant-Specific Fine-Tuned Models
- Preventing Cross-Tenant Data Leakage in RAG
- Per-Tenant Conversation History Isolation
- Tenant-Aware Caching
- Compliance Requirements Per Tenant
- Checklist
- Conclusion
Data Isolation for AI
Traditional multi-tenant isolation (row-level security in databases) isn't enough for AI. Vector databases, embedding models, and prompt context all need isolation.
```typescript
// Vector DB: always filter by tenant
async function searchVectors(tenantId: string, query: string) {
  const embedding = await model.embed(query);
  const results = await vectorDb.search({
    vector: embedding,
    topK: 5,
    filter: {
      tenantId // CRITICAL: filter by tenant
    }
  });
  return results;
}

// RAG context: tag all documents with tenant
async function indexDocument(tenantId: string, document: string) {
  const chunks = document.split('\n\n');
  for (const chunk of chunks) {
    const embedding = await model.embed(chunk);
    await vectorDb.insert({
      embedding,
      content: chunk,
      tenantId, // Tag every chunk
      documentId: uuidv4(),
      metadata: { createdAt: new Date() }
    });
  }
}

// LLM context: include tenant
async function generateResponse(tenantId: string, prompt: string) {
  // Get tenant-scoped RAG context
  const context = await searchVectors(tenantId, prompt);
  const messages = [
    {
      role: 'system',
      content: `You are an AI assistant for ${tenantId}.
Use ONLY the following documents that belong to this tenant:
${context.map(r => r.content).join('\n---\n')}`
    },
    {
      role: 'user',
      content: prompt
    }
  ];
  const response = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages
  });
  return response.choices[0].message.content;
}
```
Tenant isolation in vector searches is critical. Without filters, a user could search and receive context from other tenants. Embedding models don't understand tenancy—you must enforce it at the retrieval layer.
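One way to make the filter hard to forget is to enforce it structurally: expose a retrieval API whose only search method takes a required `tenantId`, so callers physically cannot issue an unscoped query. A minimal in-memory sketch of the idea (`TenantScopedIndex` and `dot` are illustrative names, not part of any vector DB client; a real system would wrap your DB client the same way):

```typescript
interface StoredChunk {
  tenantId: string;
  content: string;
  embedding: number[];
}

class TenantScopedIndex {
  private chunks: StoredChunk[] = [];

  insert(tenantId: string, content: string, embedding: number[]): void {
    this.chunks.push({ tenantId, content, embedding });
  }

  // The only search API requires a tenantId; there is no way
  // to query across tenants from outside this class.
  search(tenantId: string, queryEmbedding: number[], topK = 5): StoredChunk[] {
    return this.chunks
      .filter(c => c.tenantId === tenantId) // enforced here, not by callers
      .map(c => ({ chunk: c, score: dot(c.embedding, queryEmbedding) }))
      .sort((a, b) => b.score - a.score)
      .slice(0, topK)
      .map(r => r.chunk);
  }
}

// Dot-product similarity; real systems typically use cosine similarity
// computed inside the vector DB.
function dot(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}
```

The design choice: isolation lives in the type signature, not in per-call-site discipline, so a forgotten filter becomes a compile error rather than a data leak.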
Per-Tenant Model Configuration
Different tiers might use different models. Enterprise customers get GPT-4o; free tier gets GPT-4o-mini.
```typescript
async function getModelForTenant(tenantId: string) {
  const tenant = await db.tenants.findOne({ id: tenantId });
  const modelConfig = {
    basic: {
      model: 'gpt-4o-mini',
      maxTokens: 500,
      temperature: 0.7
    },
    pro: {
      model: 'gpt-4o',
      maxTokens: 2000,
      temperature: 0.7
    },
    enterprise: {
      model: 'gpt-4o', // Or custom fine-tuned model
      maxTokens: 4000,
      temperature: 0.3, // Lower temp for consistency
      customSystemPrompt: tenant.customSystemPrompt // Tenant-specific instructions
    }
  };
  return modelConfig[tenant.tier];
}

// Use per-tenant model
async function generateResponse(tenantId: string, prompt: string) {
  const config = await getModelForTenant(tenantId);
  const response = await openai.chat.completions.create({
    model: config.model,
    max_tokens: config.maxTokens,
    temperature: config.temperature,
    messages: [
      {
        role: 'system',
        content: config.customSystemPrompt || 'You are a helpful assistant.'
      },
      { role: 'user', content: prompt }
    ]
  });
  return response.choices[0].message.content;
}
```
Per-tenant models enable tiering: free customers get cheaper models, paying customers get better ones.
Per-Tenant Cost Tracking and Billing
Every tenant must know their usage. Don't pool costs.
```typescript
interface TenantCostRecord {
  tenantId: string;
  date: Date;
  llmTokens: number;
  llmCostUsd: number;
  dbQueryCount: number;
  dbCostUsd: number;
  vectorSearchCount: number;
  vectorSearchCostUsd: number;
  totalCostUsd: number;
}

async function recordCost(
  tenantId: string,
  tokens: number,
  costUsd: number,
  operation: string
) {
  const today = new Date().toISOString().split('T')[0];
  await db.tenantCosts.updateOne(
    { tenantId, date: today },
    {
      $inc: {
        llmTokens: tokens,
        llmCostUsd: costUsd,
        totalCostUsd: costUsd
      },
      $push: {
        operations: { type: operation, cost: costUsd, timestamp: new Date() }
      }
    },
    { upsert: true }
  );

  // Check if tenant has exceeded budget
  // (currentMonth, tenantLimits, disableAIFeatures, sendAlert are app-level helpers)
  const tenant = await db.tenants.findOne({ id: tenantId });
  const monthlyCost = await db.tenantCosts.sumByMonth(tenantId, currentMonth());
  if (monthlyCost > tenantLimits[tenant.tier].monthlyLimit) {
    await disableAIFeatures(tenantId);
    await sendAlert(tenantId, `Monthly cost limit exceeded: $${monthlyCost}`);
  }
}

// Tenant billing dashboard
async function getTenantCosts(tenantId: string, period: string) {
  const costs = await db.tenantCosts.find({
    tenantId,
    date: { $gte: startOfPeriod(period) }
  });
  return {
    total: costs.reduce((sum, c) => sum + c.totalCostUsd, 0),
    breakdown: {
      llm: costs.reduce((sum, c) => sum + c.llmCostUsd, 0),
      database: costs.reduce((sum, c) => sum + c.dbCostUsd, 0),
      vectorSearch: costs.reduce((sum, c) => sum + c.vectorSearchCostUsd, 0)
    },
    daily: costs
  };
}
```
Transparent per-tenant costs prevent disputes and enable accurate billing.
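Cost records also need a consistent way to turn token counts into dollars. A sketch with illustrative per-million-token prices (real prices come from your provider's pricing page and belong in configuration, not code):

```typescript
// Illustrative prices in USD per 1M tokens; keep the real table in config.
const PRICES_PER_MILLION_USD: Record<string, { input: number; output: number }> = {
  'gpt-4o': { input: 2.5, output: 10 },
  'gpt-4o-mini': { input: 0.15, output: 0.6 }
};

function computeLlmCostUsd(
  model: string,
  inputTokens: number,
  outputTokens: number
): number {
  const price = PRICES_PER_MILLION_USD[model];
  if (!price) {
    // Fail loudly: an unpriced model would silently bill the tenant $0.
    throw new Error(`No pricing configured for model: ${model}`);
  }
  return (inputTokens * price.input + outputTokens * price.output) / 1_000_000;
}
```

Call this with the `usage` object the API returns for each completion, then pass the result to `recordCost`.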
Per-Tenant Rate Limits
Give each tenant its own token bucket; never share a limit across tenants.
```typescript
class TenantRateLimiter {
  async checkLimit(tenantId: string, tokens: number) {
    const limit = await this.getLimitForTenant(tenantId);
    const raw = await redis.get(`bucket:${tenantId}`);
    const now = Date.now();
    let bucket = raw ? JSON.parse(raw) : null;
    if (!bucket) {
      bucket = {
        capacity: limit.dailyTokens,
        lastRefill: now
      };
    }

    // Refill daily
    const daysSinceRefill = (now - bucket.lastRefill) / (1000 * 60 * 60 * 24);
    if (daysSinceRefill >= 1) {
      bucket.capacity = limit.dailyTokens;
      bucket.lastRefill = now;
    }

    if (bucket.capacity < tokens) {
      return {
        allowed: false,
        retryAfter: Math.ceil((tokens - bucket.capacity) / (limit.dailyTokens / 24)),
        remaining: bucket.capacity
      };
    }

    bucket.capacity -= tokens;
    await redis.set(`bucket:${tenantId}`, JSON.stringify(bucket));
    return { allowed: true, remaining: bucket.capacity };
  }

  async getLimitForTenant(tenantId: string) {
    const tenant = await db.tenants.findOne({ id: tenantId });
    const limits = {
      free: { dailyTokens: 50000, tokensPerMinute: 1000 },
      pro: { dailyTokens: 500000, tokensPerMinute: 10000 },
      enterprise: { dailyTokens: 5000000, tokensPerMinute: 100000 }
    };
    return limits[tenant.tier];
  }
}

// Middleware
app.use(async (req, res, next) => {
  const limiter = new TenantRateLimiter();
  const estimatedTokens = estimateTokens(req.body.prompt);
  const check = await limiter.checkLimit(req.user.tenantId, estimatedTokens);
  if (!check.allowed) {
    return res.status(429).json({
      error: 'Rate limit exceeded',
      remaining: check.remaining,
      retryAfter: check.retryAfter
    });
  }
  next();
});
```
Each tenant gets their own token bucket. Limits scale with tier.
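The refill-and-consume logic itself is simple enough to sketch in memory. One caveat the Redis version above glosses over: the read-check-decrement must be atomic in production (e.g. a Redis Lua script), or two concurrent requests can both pass the check and overspend the bucket. An in-memory sketch (`InMemoryTenantBuckets` is an illustrative name; the injectable clock is there for testing):

```typescript
interface Bucket {
  capacity: number;
  lastRefill: number;
}

class InMemoryTenantBuckets {
  private buckets = new Map<string, Bucket>();

  constructor(
    private dailyTokens: number,
    private now: () => number = Date.now
  ) {}

  tryConsume(tenantId: string, tokens: number): { allowed: boolean; remaining: number } {
    const nowMs = this.now();
    let bucket = this.buckets.get(tenantId);
    // New tenant, or a full day has passed: start from a full bucket
    if (!bucket || nowMs - bucket.lastRefill >= 24 * 60 * 60 * 1000) {
      bucket = { capacity: this.dailyTokens, lastRefill: nowMs };
    }
    if (bucket.capacity < tokens) {
      this.buckets.set(tenantId, bucket);
      return { allowed: false, remaining: bucket.capacity };
    }
    bucket.capacity -= tokens;
    this.buckets.set(tenantId, bucket);
    return { allowed: true, remaining: bucket.capacity };
  }
}
```

Because each tenant's bucket is keyed independently, one tenant exhausting its budget cannot starve another.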
Tenant-Specific Fine-Tuned Models
Premium tenants can have fine-tuned models trained on their data.
```typescript
import { toFile } from 'openai';

async function trainTenantModel(tenantId: string) {
  const tenant = await db.tenants.findOne({ id: tenantId });
  if (tenant.tier !== 'enterprise') {
    throw new Error('Fine-tuned models only for enterprise tier');
  }

  // Collect training data from tenant's conversations
  const trainingData = await db.conversations.find({ tenantId })
    .select(['prompt', 'response', 'rating'])
    .limit(10000);

  // Format as JSONL for OpenAI fine-tuning (one JSON object per line)
  const jsonl = trainingData
    .map(d => JSON.stringify({
      messages: [
        { role: 'user', content: d.prompt },
        { role: 'assistant', content: d.response }
      ]
    }))
    .join('\n');

  // Upload and fine-tune
  const file = await openai.files.create({
    file: await toFile(Buffer.from(jsonl), 'training.jsonl'),
    purpose: 'fine-tune'
  });
  const fineTune = await openai.fineTuning.jobs.create({
    training_file: file.id,
    model: 'gpt-4o-mini',
    suffix: `tenant_${tenantId}`
  });

  // Track the job; the fine-tuned model name only exists once the job succeeds,
  // so record it separately when polling reports completion.
  await db.tenants.updateOne(
    { id: tenantId },
    { fineTuneJobId: fineTune.id, fineTuneStatus: 'training' }
  );
  return fineTune;
}

// Use fine-tuned model for tenant
async function generateResponse(tenantId: string, prompt: string) {
  const tenant = await db.tenants.findOne({ id: tenantId });
  let model = 'gpt-4o';
  if (tenant.fineTunedModel && tenant.fineTuneStatus === 'succeeded') {
    model = tenant.fineTunedModel; // e.g. an ft:gpt-4o-mini:... model name
  }
  const response = await openai.chat.completions.create({
    model,
    messages: [{ role: 'user', content: prompt }]
  });
  return response.choices[0].message.content;
}
```
Fine-tuned models learn tenant-specific language and decision patterns. Enterprise customers benefit from models trained on their exact use case.
Preventing Cross-Tenant Data Leakage in RAG
The most dangerous multi-tenancy bug: one tenant's context leaking to another.
```typescript
// Anti-pattern: queries without tenant filter
const BAD_searchVectors = async (query: string) => {
  const embedding = await model.embed(query);
  return await vectorDb.search({ vector: embedding, topK: 5 });
  // BUG: returns documents from ALL tenants
};

// Pattern: always filter
const GOOD_searchVectors = async (tenantId: string, query: string) => {
  const embedding = await model.embed(query);
  return await vectorDb.search({
    vector: embedding,
    topK: 5,
    filter: { tenantId } // ← Required
  });
};

// Test for data leakage
async function testTenantIsolation() {
  const tenant1Id = 'tenant_1';
  const tenant2Id = 'tenant_2';

  // Add document to tenant 1
  await indexDocument(tenant1Id, 'Tenant 1 secret data');

  // Search as tenant 2
  const results = await searchVectors(tenant2Id, 'secret');

  // Should be empty
  if (results.length > 0) {
    throw new Error("DATA LEAKAGE: Tenant 2 saw Tenant 1's data");
  }
}
```
Run leakage tests in CI/CD. Data leakage is the worst kind of security bug—silent and destructive.
Per-Tenant Conversation History Isolation
Conversations are private to each tenant.
```typescript
// Store conversation with tenant tag
async function createConversation(tenantId: string, userId: string) {
  return await db.conversations.insertOne({
    id: uuidv4(),
    tenantId, // Critical: tag for isolation
    userId,
    messages: [],
    createdAt: new Date(),
    metadata: {}
  });
}

// Retrieve: filter by tenant AND user
async function getConversation(tenantId: string, userId: string, conversationId: string) {
  const conversation = await db.conversations.findOne({
    id: conversationId,
    tenantId, // Verify tenant ownership
    userId // Verify user ownership
  });
  if (!conversation) {
    throw new Error('Conversation not found');
  }
  return conversation;
}

// Add message: verify tenant + user atomically, in the update filter itself
async function addMessage(
  tenantId: string,
  userId: string,
  conversationId: string,
  role: 'user' | 'assistant',
  content: string
) {
  const result = await db.conversations.updateOne(
    { id: conversationId, tenantId, userId }, // ownership check and write in one query
    {
      $push: {
        messages: {
          role,
          content,
          timestamp: new Date()
        }
      }
    }
  );
  if (result.matchedCount === 0) {
    throw new Error('Unauthorized');
  }
}

// List conversations: tenant-scoped
async function listConversations(tenantId: string, userId: string) {
  return await db.conversations.find({
    tenantId,
    userId
  }).sort({ createdAt: -1 });
}
```
Always filter by tenantId AND userId. Never trust client-provided IDs.
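"Never trust client-provided IDs" means the tenantId used in queries must come from the verified session, not from the request body or URL. A sketch of that resolution step (`Session` is a hypothetical shape produced by your auth middleware after verifying a JWT or session cookie):

```typescript
interface Session {
  userId: string;
  tenantId: string;
}

// Resolve the tenant context for a request: the session is the source of
// truth; any client-supplied tenantId is only accepted if it matches.
function resolveTenantContext(
  session: Session,
  requestedTenantId?: string
): { userId: string; tenantId: string } {
  if (requestedTenantId && requestedTenantId !== session.tenantId) {
    throw new Error('Forbidden: tenant mismatch');
  }
  return { userId: session.userId, tenantId: session.tenantId };
}
```

Every handler then calls `getConversation(ctx.tenantId, ctx.userId, …)` with the resolved context, so a forged tenantId in a request can never reach the database layer.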
Tenant-Aware Caching
Caching saves money, but every cache key must include the tenant.
```typescript
async function getCachedResponse(tenantId: string, prompt: string) {
  // Cache key includes tenant
  const cacheKey = `response:${tenantId}:${hash(prompt)}`;
  const cached = await redis.get(cacheKey);
  if (cached) {
    return JSON.parse(cached);
  }
  return null;
}

async function setCachedResponse(tenantId: string, prompt: string, response: string) {
  const cacheKey = `response:${tenantId}:${hash(prompt)}`;
  await redis.setex(cacheKey, 3600, JSON.stringify(response));
}

// Use in response generation
async function generateResponse(tenantId: string, prompt: string) {
  // Check cache
  let response = await getCachedResponse(tenantId, prompt);
  if (!response) {
    // Generate
    const completion = await openai.chat.completions.create({
      model: 'gpt-4o',
      messages: [{ role: 'user', content: prompt }]
    });
    response = completion.choices[0].message.content;
    // Cache the message text, not the full API response object
    await setCachedResponse(tenantId, prompt, response);
  }
  return response;
}
```
Tenant-aware cache keys prevent one tenant from getting another tenant's cached response.
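A detail worth baking into key derivation: include the model (and, in practice, any prompt-template version) alongside the tenant, so a tenant upgraded to a different model or system prompt doesn't keep receiving stale cached answers. A sketch using Node's built-in crypto module (`responseCacheKey` is an illustrative helper, not part of any library):

```typescript
import { createHash } from 'node:crypto';

// Cache key = tenant + hash(model + prompt). Changing either the model
// or the prompt produces a different key, so stale hits are impossible.
function responseCacheKey(tenantId: string, model: string, prompt: string): string {
  const digest = createHash('sha256')
    .update(`${model}\n${prompt}`)
    .digest('hex');
  return `response:${tenantId}:${digest}`;
}
```

Hashing also keeps keys a fixed, Redis-friendly length regardless of prompt size.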
Compliance Requirements Per Tenant
Different tenants may have different compliance requirements.
```typescript
interface TenantCompliance {
  tenantId: string;
  dataResidency?: 'US' | 'EU' | 'APAC';
  encryptionRequired: boolean;
  loggingLevel: 'basic' | 'detailed' | 'full';
  certifications: ('HIPAA' | 'GDPR' | 'SOC2' | 'PCI-DSS')[];
}

async function validateCompliance(tenantId: string, operation: string) {
  const compliance = await db.tenantCompliance.findOne({ tenantId });

  // HIPAA: PHI cannot leave the US
  if (compliance.certifications.includes('HIPAA')) {
    const region = await getRegion();
    if (region !== 'US') {
      throw new Error('HIPAA tenant data outside US');
    }
  }

  // GDPR: log all data access
  if (compliance.certifications.includes('GDPR')) {
    await logDataAccess(tenantId, operation);
  }

  // PCI-DSS: encrypt at rest and in transit
  if (compliance.certifications.includes('PCI-DSS')) {
    // Use TLS + encryption
  }
}

// Route to compliance-aware region
async function executeOperation(tenantId: string, operation: () => Promise<unknown>) {
  const compliance = await db.tenantCompliance.findOne({ tenantId });
  if (compliance.dataResidency === 'EU') {
    // Route through EU region
    return await eu_region.execute(operation);
  }
  if (compliance.dataResidency === 'US') {
    return await us_region.execute(operation);
  }
  return await operation();
}
```
Compliance varies by tenant. HIPAA requires data residency. GDPR requires audit logs. Encode these requirements in the system.
Checklist
- All vector DB queries filter by tenantId
- Every RAG chunk tagged with tenantId at indexing
- LLM system prompts include tenant context
- Per-tenant model configuration (GPT-4o vs GPT-4o-mini)
- Per-tenant cost tracking and budgets
- Per-tenant rate limits by tokens per day
- Fine-tuned models trained on tenant-specific data
- Conversation history filtered by both tenantId and userId
- Cache keys include tenantId
- Data leakage tests in CI/CD (verify no cross-tenant access)
- Compliance requirements encoded per tenant
Conclusion
Multi-tenant AI systems must isolate data at every layer: vector stores, embeddings, conversation history, caching, and model configuration. One leakage is one too many. Test relentlessly for data isolation. Different tenants have different compliance needs—encode these at the architecture level, not as afterthoughts.