Multi-Tenant AI Architecture — Isolating Data, Costs, and Models Per Customer
Introduction
Multi-tenant SaaS systems must isolate customers completely. One tenant's data cannot leak to another. One tenant's costs cannot subsidize another. One tenant's model cannot see another's context. This post covers patterns for building isolated AI systems at scale.
- Data Isolation for AI
- Per-Tenant Model Configuration
- Per-Tenant Cost Tracking and Billing
- Per-Tenant Rate Limits
- Tenant-Specific Fine-Tuned Models
- Preventing Cross-Tenant Data Leakage in RAG
- Per-Tenant Conversation History Isolation
- Tenant-Aware Caching
- Compliance Requirements Per Tenant
- Checklist
- Conclusion
Data Isolation for AI
Traditional multi-tenant isolation (row-level security in databases) isn't enough for AI. Vector databases, embedding models, and prompt context all need isolation.
```typescript
// Vector DB: always filter by tenant
async function searchVectors(tenantId: string, query: string) {
  const embedding = await model.embed(query);
  const results = await vectorDb.search({
    vector: embedding,
    topK: 5,
    filter: {
      tenantId // CRITICAL: filter by tenant
    }
  });
  return results;
}

// RAG context: tag all documents with tenant
async function indexDocument(tenantId: string, document: string) {
  const chunks = document.split('\n\n');
  for (const chunk of chunks) {
    const embedding = await model.embed(chunk);
    await vectorDb.insert({
      embedding,
      content: chunk,
      tenantId, // Tag every chunk
      documentId: uuidv4(),
      metadata: { createdAt: new Date() }
    });
  }
}

// LLM context: include tenant
async function generateResponse(tenantId: string, prompt: string) {
  // Get tenant-scoped RAG context
  const context = await searchVectors(tenantId, prompt);
  const messages = [
    {
      role: 'system',
      content: `You are an AI assistant for ${tenantId}.
Use ONLY the following documents that belong to this tenant:
${context.map(r => r.content).join('\n---\n')}`
    },
    {
      role: 'user',
      content: prompt
    }
  ];
  const response = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages
  });
  return response.choices[0].message.content;
}
```
Tenant isolation in vector searches is critical. Without filters, a user could search and receive context from other tenants. Embedding models don't understand tenancy—you must enforce it at the retrieval layer.
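One way to make the filter hard to forget is to enforce it structurally: expose a retrieval API whose only search method takes a required `tenantId`, so callers physically cannot issue an unscoped query. A minimal in-memory sketch of the idea (`TenantScopedIndex` and `dot` are illustrative names, not part of any vector DB client; a real system would wrap your DB client the same way):

```typescript
interface StoredChunk {
  tenantId: string;
  content: string;
  embedding: number[];
}

class TenantScopedIndex {
  private chunks: StoredChunk[] = [];

  insert(tenantId: string, content: string, embedding: number[]): void {
    this.chunks.push({ tenantId, content, embedding });
  }

  // The only search API requires a tenantId; there is no way
  // to query across tenants from outside this class.
  search(tenantId: string, queryEmbedding: number[], topK = 5): StoredChunk[] {
    return this.chunks
      .filter(c => c.tenantId === tenantId) // enforced here, not by callers
      .map(c => ({ chunk: c, score: dot(c.embedding, queryEmbedding) }))
      .sort((a, b) => b.score - a.score)
      .slice(0, topK)
      .map(r => r.chunk);
  }
}

// Dot-product similarity; real systems typically use cosine similarity
// computed inside the vector DB.
function dot(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}
```

The design choice: isolation lives in the type signature, not in per-call-site discipline, so a forgotten filter becomes a compile error rather than a data leak.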
Per-Tenant Model Configuration
Different tiers might use different models. Enterprise customers get GPT-4o; free tier gets GPT-4o-mini.
```typescript
async function getModelForTenant(tenantId: string) {
  const tenant = await db.tenants.findOne({ id: tenantId });
  const modelConfig = {
    basic: {
      model: 'gpt-4o-mini',
      maxTokens: 500,
      temperature: 0.7
    },
    pro: {
      model: 'gpt-4o',
      maxTokens: 2000,
      temperature: 0.7
    },
    enterprise: {
      model: 'gpt-4o', // Or custom fine-tuned model
      maxTokens: 4000,
      temperature: 0.3, // Lower temp for consistency
      customSystemPrompt: tenant.customSystemPrompt // Tenant-specific instructions
    }
  };
  return modelConfig[tenant.tier];
}

// Use per-tenant model
async function generateResponse(tenantId: string, prompt: string) {
  const config = await getModelForTenant(tenantId);
  const response = await openai.chat.completions.create({
    model: config.model,
    max_tokens: config.maxTokens,
    temperature: config.temperature,
    messages: [
      {
        role: 'system',
        content: config.customSystemPrompt || 'You are a helpful assistant.'
      },
      { role: 'user', content: prompt }
    ]
  });
  return response.choices[0].message.content;
}
```
Per-tenant models enable tiering: free customers get cheaper models, paying customers get better ones.
Per-Tenant Cost Tracking and Billing
Every tenant must know their usage. Don't pool costs.
```typescript
interface TenantCostRecord {
  tenantId: string;
  date: Date;
  llmTokens: number;
  llmCostUsd: number;
  dbQueryCount: number;
  dbCostUsd: number;
  vectorSearchCount: number;
  vectorSearchCostUsd: number;
  totalCostUsd: number;
}

async function recordCost(
  tenantId: string,
  tokens: number,
  costUsd: number,
  operation: string
) {
  const today = new Date().toISOString().split('T')[0];
  await db.tenantCosts.updateOne(
    { tenantId, date: today },
    {
      $inc: {
        llmTokens: tokens,
        llmCostUsd: costUsd,
        totalCostUsd: costUsd
      },
      $push: {
        operations: { type: operation, cost: costUsd, timestamp: new Date() }
      }
    },
    { upsert: true }
  );

  // Check if tenant has exceeded budget
  // (currentMonth, tenantLimits, disableAIFeatures, sendAlert are app-level helpers)
  const tenant = await db.tenants.findOne({ id: tenantId });
  const monthlyCost = await db.tenantCosts.sumByMonth(tenantId, currentMonth());
  if (monthlyCost > tenantLimits[tenant.tier].monthlyLimit) {
    await disableAIFeatures(tenantId);
    await sendAlert(tenantId, `Monthly cost limit exceeded: $${monthlyCost}`);
  }
}

// Tenant billing dashboard
async function getTenantCosts(tenantId: string, period: string) {
  const costs = await db.tenantCosts.find({
    tenantId,
    date: { $gte: startOfPeriod(period) }
  });
  return {
    total: costs.reduce((sum, c) => sum + c.totalCostUsd, 0),
    breakdown: {
      llm: costs.reduce((sum, c) => sum + c.llmCostUsd, 0),
      database: costs.reduce((sum, c) => sum + c.dbCostUsd, 0),
      vectorSearch: costs.reduce((sum, c) => sum + c.vectorSearchCostUsd, 0)
    },
    daily: costs
  };
}
```
Transparent per-tenant costs prevent disputes and enable accurate billing.
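Cost records also need a consistent way to turn token counts into dollars. A sketch with illustrative per-million-token prices (real prices come from your provider's pricing page and belong in configuration, not code):

```typescript
// Illustrative prices in USD per 1M tokens; keep the real table in config.
const PRICES_PER_MILLION_USD: Record<string, { input: number; output: number }> = {
  'gpt-4o': { input: 2.5, output: 10 },
  'gpt-4o-mini': { input: 0.15, output: 0.6 }
};

function computeLlmCostUsd(
  model: string,
  inputTokens: number,
  outputTokens: number
): number {
  const price = PRICES_PER_MILLION_USD[model];
  if (!price) {
    // Fail loudly: an unpriced model would silently bill the tenant $0.
    throw new Error(`No pricing configured for model: ${model}`);
  }
  return (inputTokens * price.input + outputTokens * price.output) / 1_000_000;
}
```

Call this with the `usage` object the API returns for each completion, then pass the result to `recordCost`.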
Per-Tenant Rate Limits
Give each tenant its own token bucket; never share a limit across tenants.
```typescript
class TenantRateLimiter {
  async checkLimit(tenantId: string, tokens: number) {
    const limit = await this.getLimitForTenant(tenantId);
    const raw = await redis.get(`bucket:${tenantId}`);
    const now = Date.now();
    let bucket = raw ? JSON.parse(raw) : null;
    if (!bucket) {
      bucket = {
        capacity: limit.dailyTokens,
        lastRefill: now
      };
    }

    // Refill daily
    const daysSinceRefill = (now - bucket.lastRefill) / (1000 * 60 * 60 * 24);
    if (daysSinceRefill >= 1) {
      bucket.capacity = limit.dailyTokens;
      bucket.lastRefill = now;
    }

    if (bucket.capacity < tokens) {
      return {
        allowed: false,
        retryAfter: Math.ceil((tokens - bucket.capacity) / (limit.dailyTokens / 24)),
        remaining: bucket.capacity
      };
    }

    bucket.capacity -= tokens;
    await redis.set(`bucket:${tenantId}`, JSON.stringify(bucket));
    return { allowed: true, remaining: bucket.capacity };
  }

  async getLimitForTenant(tenantId: string) {
    const tenant = await db.tenants.findOne({ id: tenantId });
    const limits = {
      free: { dailyTokens: 50000, tokensPerMinute: 1000 },
      pro: { dailyTokens: 500000, tokensPerMinute: 10000 },
      enterprise: { dailyTokens: 5000000, tokensPerMinute: 100000 }
    };
    return limits[tenant.tier];
  }
}

// Middleware
app.use(async (req, res, next) => {
  const limiter = new TenantRateLimiter();
  const estimatedTokens = estimateTokens(req.body.prompt);
  const check = await limiter.checkLimit(req.user.tenantId, estimatedTokens);
  if (!check.allowed) {
    return res.status(429).json({
      error: 'Rate limit exceeded',
      remaining: check.remaining,
      retryAfter: check.retryAfter
    });
  }
  next();
});
```
Each tenant gets their own token bucket. Limits scale with tier.
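The refill-and-consume logic itself is simple enough to sketch in memory. One caveat the Redis version above glosses over: the read-check-decrement must be atomic in production (e.g. a Redis Lua script), or two concurrent requests can both pass the check and overspend the bucket. An in-memory sketch (`InMemoryTenantBuckets` is an illustrative name; the injectable clock is there for testing):

```typescript
interface Bucket {
  capacity: number;
  lastRefill: number;
}

class InMemoryTenantBuckets {
  private buckets = new Map<string, Bucket>();

  constructor(
    private dailyTokens: number,
    private now: () => number = Date.now
  ) {}

  tryConsume(tenantId: string, tokens: number): { allowed: boolean; remaining: number } {
    const nowMs = this.now();
    let bucket = this.buckets.get(tenantId);
    // New tenant, or a full day has passed: start from a full bucket
    if (!bucket || nowMs - bucket.lastRefill >= 24 * 60 * 60 * 1000) {
      bucket = { capacity: this.dailyTokens, lastRefill: nowMs };
    }
    if (bucket.capacity < tokens) {
      this.buckets.set(tenantId, bucket);
      return { allowed: false, remaining: bucket.capacity };
    }
    bucket.capacity -= tokens;
    this.buckets.set(tenantId, bucket);
    return { allowed: true, remaining: bucket.capacity };
  }
}
```

Because each tenant's bucket is keyed independently, one tenant exhausting its budget cannot starve another.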
Tenant-Specific Fine-Tuned Models
Premium tenants can have fine-tuned models trained on their data.
```typescript
import { toFile } from 'openai';

async function trainTenantModel(tenantId: string) {
  const tenant = await db.tenants.findOne({ id: tenantId });
  if (tenant.tier !== 'enterprise') {
    throw new Error('Fine-tuned models only for enterprise tier');
  }

  // Collect training data from tenant's conversations
  const trainingData = await db.conversations.find({ tenantId })
    .select(['prompt', 'response', 'rating'])
    .limit(10000);

  // Format as JSONL for OpenAI fine-tuning (one JSON object per line)
  const jsonl = trainingData
    .map(d => JSON.stringify({
      messages: [
        { role: 'user', content: d.prompt },
        { role: 'assistant', content: d.response }
      ]
    }))
    .join('\n');

  // Upload and fine-tune
  const file = await openai.files.create({
    file: await toFile(Buffer.from(jsonl), 'training.jsonl'),
    purpose: 'fine-tune'
  });
  const fineTune = await openai.fineTuning.jobs.create({
    training_file: file.id,
    model: 'gpt-4o-mini',
    suffix: `tenant_${tenantId}`
  });

  // Track the job; the fine-tuned model name only exists once the job succeeds,
  // so record it separately when polling reports completion.
  await db.tenants.updateOne(
    { id: tenantId },
    { fineTuneJobId: fineTune.id, fineTuneStatus: 'training' }
  );
  return fineTune;
}

// Use fine-tuned model for tenant
async function generateResponse(tenantId: string, prompt: string) {
  const tenant = await db.tenants.findOne({ id: tenantId });
  let model = 'gpt-4o';
  if (tenant.fineTunedModel && tenant.fineTuneStatus === 'succeeded') {
    model = tenant.fineTunedModel; // e.g. an ft:gpt-4o-mini:... model name
  }
  const response = await openai.chat.completions.create({
    model,
    messages: [{ role: 'user', content: prompt }]
  });
  return response.choices[0].message.content;
}
```
Fine-tuned models learn tenant-specific language and decision patterns. Enterprise customers benefit from models trained on their exact use case.
Preventing Cross-Tenant Data Leakage in RAG
The most dangerous multi-tenancy bug: one tenant's context leaking to another.
```typescript
// Anti-pattern: queries without tenant filter
const BAD_searchVectors = async (query: string) => {
  const embedding = await model.embed(query);
  return await vectorDb.search({ vector: embedding, topK: 5 });
  // BUG: returns documents from ALL tenants
};

// Pattern: always filter
const GOOD_searchVectors = async (tenantId: string, query: string) => {
  const embedding = await model.embed(query);
  return await vectorDb.search({
    vector: embedding,
    topK: 5,
    filter: { tenantId } // ← Required
  });
};

// Test for data leakage
async function testTenantIsolation() {
  const tenant1Id = 'tenant_1';
  const tenant2Id = 'tenant_2';

  // Add document to tenant 1
  await indexDocument(tenant1Id, 'Tenant 1 secret data');

  // Search as tenant 2
  const results = await searchVectors(tenant2Id, 'secret');

  // Should be empty
  if (results.length > 0) {
    throw new Error("DATA LEAKAGE: Tenant 2 saw Tenant 1's data");
  }
}
```
Run leakage tests in CI/CD. Data leakage is the worst kind of security bug—silent and destructive.
Per-Tenant Conversation History Isolation
Conversations are private to each tenant.
```typescript
// Store conversation with tenant tag
async function createConversation(tenantId: string, userId: string) {
  return await db.conversations.insertOne({
    id: uuidv4(),
    tenantId, // Critical: tag for isolation
    userId,
    messages: [],
    createdAt: new Date(),
    metadata: {}
  });
}

// Retrieve: filter by tenant AND user
async function getConversation(tenantId: string, userId: string, conversationId: string) {
  const conversation = await db.conversations.findOne({
    id: conversationId,
    tenantId, // Verify tenant ownership
    userId // Verify user ownership
  });
  if (!conversation) {
    throw new Error('Conversation not found');
  }
  return conversation;
}

// Add message: verify tenant + user atomically, in the update filter itself
async function addMessage(
  tenantId: string,
  userId: string,
  conversationId: string,
  role: 'user' | 'assistant',
  content: string
) {
  const result = await db.conversations.updateOne(
    { id: conversationId, tenantId, userId }, // ownership check and write in one query
    {
      $push: {
        messages: {
          role,
          content,
          timestamp: new Date()
        }
      }
    }
  );
  if (result.matchedCount === 0) {
    throw new Error('Unauthorized');
  }
}

// List conversations: tenant-scoped
async function listConversations(tenantId: string, userId: string) {
  return await db.conversations.find({
    tenantId,
    userId
  }).sort({ createdAt: -1 });
}
```
Always filter by tenantId AND userId. Never trust client-provided IDs.
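"Never trust client-provided IDs" means the tenantId used in queries must come from the verified session, not from the request body or URL. A sketch of that resolution step (`Session` is a hypothetical shape produced by your auth middleware after verifying a JWT or session cookie):

```typescript
interface Session {
  userId: string;
  tenantId: string;
}

// Resolve the tenant context for a request: the session is the source of
// truth; any client-supplied tenantId is only accepted if it matches.
function resolveTenantContext(
  session: Session,
  requestedTenantId?: string
): { userId: string; tenantId: string } {
  if (requestedTenantId && requestedTenantId !== session.tenantId) {
    throw new Error('Forbidden: tenant mismatch');
  }
  return { userId: session.userId, tenantId: session.tenantId };
}
```

Every handler then calls `getConversation(ctx.tenantId, ctx.userId, …)` with the resolved context, so a forged tenantId in a request can never reach the database layer.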
Tenant-Aware Caching
Caching saves money, but every cache key must include the tenant.
```typescript
async function getCachedResponse(tenantId: string, prompt: string) {
  // Cache key includes tenant
  const cacheKey = `response:${tenantId}:${hash(prompt)}`;
  const cached = await redis.get(cacheKey);
  if (cached) {
    return JSON.parse(cached);
  }
  return null;
}

async function setCachedResponse(tenantId: string, prompt: string, response: string) {
  const cacheKey = `response:${tenantId}:${hash(prompt)}`;
  await redis.setex(cacheKey, 3600, JSON.stringify(response));
}

// Use in response generation
async function generateResponse(tenantId: string, prompt: string) {
  // Check cache
  let response = await getCachedResponse(tenantId, prompt);
  if (!response) {
    // Generate
    const completion = await openai.chat.completions.create({
      model: 'gpt-4o',
      messages: [{ role: 'user', content: prompt }]
    });
    response = completion.choices[0].message.content;
    // Cache the message text, not the full API response object
    await setCachedResponse(tenantId, prompt, response);
  }
  return response;
}
```
Tenant-aware cache keys prevent one tenant from getting another tenant's cached response.
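A detail worth baking into key derivation: include the model (and, in practice, any prompt-template version) alongside the tenant, so a tenant upgraded to a different model or system prompt doesn't keep receiving stale cached answers. A sketch using Node's built-in crypto module (`responseCacheKey` is an illustrative helper, not part of any library):

```typescript
import { createHash } from 'node:crypto';

// Cache key = tenant + hash(model + prompt). Changing either the model
// or the prompt produces a different key, so stale hits are impossible.
function responseCacheKey(tenantId: string, model: string, prompt: string): string {
  const digest = createHash('sha256')
    .update(`${model}\n${prompt}`)
    .digest('hex');
  return `response:${tenantId}:${digest}`;
}
```

Hashing also keeps keys a fixed, Redis-friendly length regardless of prompt size.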
Compliance Requirements Per Tenant
Different tenants may have different compliance requirements.
```typescript
interface TenantCompliance {
  tenantId: string;
  dataResidency?: 'US' | 'EU' | 'APAC';
  encryptionRequired: boolean;
  loggingLevel: 'basic' | 'detailed' | 'full';
  certifications: ('HIPAA' | 'GDPR' | 'SOC2' | 'PCI-DSS')[];
}

async function validateCompliance(tenantId: string, operation: string) {
  const compliance = await db.tenantCompliance.findOne({ tenantId });

  // HIPAA: PHI cannot leave the US
  if (compliance.certifications.includes('HIPAA')) {
    const region = await getRegion();
    if (region !== 'US') {
      throw new Error('HIPAA tenant data outside US');
    }
  }

  // GDPR: log all data access
  if (compliance.certifications.includes('GDPR')) {
    await logDataAccess(tenantId, operation);
  }

  // PCI-DSS: encrypt at rest and in transit
  if (compliance.certifications.includes('PCI-DSS')) {
    // Use TLS + encryption
  }
}

// Route to compliance-aware region
async function executeOperation(tenantId: string, operation: () => Promise<unknown>) {
  const compliance = await db.tenantCompliance.findOne({ tenantId });
  if (compliance.dataResidency === 'EU') {
    // Route through EU region
    return await eu_region.execute(operation);
  }
  if (compliance.dataResidency === 'US') {
    return await us_region.execute(operation);
  }
  return await operation();
}
```
Compliance varies by tenant. HIPAA requires data residency. GDPR requires audit logs. Encode these requirements in the system.
Checklist
- All vector DB queries filter by tenantId
- Every RAG chunk tagged with tenantId at indexing
- LLM system prompts include tenant context
- Per-tenant model configuration (GPT-4o vs GPT-4o-mini)
- Per-tenant cost tracking and budgets
- Per-tenant rate limits by tokens per day
- Fine-tuned models trained on tenant-specific data
- Conversation history filtered by both tenantId and userId
- Cache keys include tenantId
- Data leakage tests in CI/CD (verify no cross-tenant access)
- Compliance requirements encoded per tenant
Conclusion
Multi-tenant AI systems must isolate data at every layer: vector stores, embeddings, conversation history, caching, and model configuration. One leakage is one too many. Test relentlessly for data isolation. Different tenants have different compliance needs—encode these at the architecture level, not as afterthoughts.