AI Feature Flags — Safely Rolling Out LLM Features to Production Users

Introduction

Deploying AI features to production carries unique risks. Unlike traditional features, LLM behavior is probabilistic: the same prompt can generate different outputs. Feature flags provide the control layer needed to safely introduce AI capabilities to production users while retaining the ability to quickly disable or roll back a feature if issues arise.

This post covers battle-tested patterns for feature-flagging AI features, from percentage-based rollouts to shadow mode deployments.

Feature Flags for Model Version Switching

Feature flags enable you to switch between model versions without deploying new code. This is critical when testing a newer model against your current production model, or when rolling out an upgraded variant.

import Anthropic from "@anthropic-ai/sdk";

interface FeatureFlagConfig {
  modelVersion: "claude-3-5-sonnet-20241022" | "claude-3-5-haiku-20241022";
  enabled: boolean;
  rolloutPercentage: number; // 0-100
}

const client = new Anthropic();

async function getModelForUser(userId: string): Promise<string> {
  const flags = await fetchFeatureFlags(userId);

  // Fall back if the flag is off or the user misses the rollout bucket.
  // Note: Math.random() re-rolls on every request, so a user can flip
  // between models; for a sticky per-user experience, use the
  // deterministic hashing shown in the next section.
  if (!flags.enabled || Math.random() * 100 > flags.rolloutPercentage) {
    return "claude-3-5-haiku-20241022"; // Fallback model
  }

  return flags.modelVersion;
}

async function generateAiResponse(
  userId: string,
  userMessage: string
): Promise<string> {
  const model = await getModelForUser(userId);

  const message = await client.messages.create({
    model: model,
    max_tokens: 1024,
    messages: [{ role: "user", content: userMessage }],
  });

  return message.content[0].type === "text" ? message.content[0].text : "";
}

async function fetchFeatureFlags(userId: string): Promise<FeatureFlagConfig> {
  // Stub — fetch from your feature flag service (LaunchDarkly, Unleash, etc.)
  return {
    modelVersion: "claude-3-5-sonnet-20241022",
    enabled: true,
    rolloutPercentage: 50,
  };
}

Percentage-Based Rollout Strategy

Rolling out to a percentage of users lets you catch issues before full deployment. Use deterministic hashing to ensure consistent user experience.

interface RolloutConfig {
  featureName: string;
  percentageEnabled: number; // 0-100
}

function shouldEnableForUser(
  userId: string,
  config: RolloutConfig
): boolean {
  // Use consistent hash to ensure same user always gets same experience
  const hash = hashUserId(userId, config.featureName);
  return (hash % 100) < config.percentageEnabled;
}

function hashUserId(userId: string, featureName: string): number {
  // Deterministic hash function
  const combined = `${userId}:${featureName}`;
  let hash = 0;

  for (let i = 0; i < combined.length; i++) {
    const char = combined.charCodeAt(i);
    hash = (hash << 5) - hash + char;
    hash = hash & hash; // Convert to 32-bit integer
  }

  return Math.abs(hash);
}

// Usage in request handler
app.post("/api/generate", async (req, res) => {
  const { userId, prompt } = req.body;
  const rolloutConfig: RolloutConfig = {
    featureName: "ai-response-generation",
    percentageEnabled: 25, // 25% of users
  };

  if (!shouldEnableForUser(userId, rolloutConfig)) {
    return res.json({ response: "Using fallback response" });
  }

  const response = await generateAiResponse(userId, prompt);
  res.json({ response });
});

User Segment Targeting

Roll out AI features first to beta users, power users, or specific regions by creating segment-based feature flags.

interface UserSegment {
  isBetaUser: boolean;
  tier: "free" | "pro" | "enterprise";
  region: string;
  joinedDaysAgo: number;
}

interface SegmentedRolloutConfig {
  betaUsersOnly: boolean;
  minTier: "free" | "pro" | "enterprise";
  allowedRegions: string[];
  minAccountAgeDays: number;
}

function isUserInSegment(
  user: UserSegment,
  config: SegmentedRolloutConfig
): boolean {
  if (config.betaUsersOnly && !user.isBetaUser) {
    return false;
  }

  const tierRanking = { free: 0, pro: 1, enterprise: 2 };
  const minTierRanking = tierRanking[config.minTier];
  if (tierRanking[user.tier] < minTierRanking) {
    return false;
  }

  if (!config.allowedRegions.includes(user.region)) {
    return false;
  }

  return user.joinedDaysAgo >= config.minAccountAgeDays;
}
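
Segment filters usually compose with the percentage rollout from the previous section: a user sees the AI path only if they pass every gate. A minimal sketch of that composition, with stub gates standing in for isUserInSegment and shouldEnableForUser:

```typescript
// Compose independent boolean gates; the feature is enabled only if
// every gate passes for this user.
type Gate = (userId: string) => boolean;

function composeGates(...gates: Gate[]): Gate {
  return (userId) => gates.every((gate) => gate(userId));
}

// Stub gates for illustration; in practice these would wrap
// isUserInSegment and shouldEnableForUser.
const inBetaSegment: Gate = (userId) => userId.startsWith("beta-");
const inRolloutBucket: Gate = (userId) => userId.length % 2 === 0;

const aiFeatureGate = composeGates(inBetaSegment, inRolloutBucket);
```

Ordering the cheapest gate first keeps the hot path fast, since every() short-circuits on the first failure.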

Kill Switch for AI Features

Implement an immediate kill switch that disables AI features without redeployment. Essential for security incidents or cascading failures.

interface FeatureFlagStore {
  getKillSwitch(featureName: string): Promise<boolean>;
  setKillSwitch(featureName: string, enabled: boolean): Promise<void>;
}

class CachedFeatureFlagStore implements FeatureFlagStore {
  private cache = new Map<string, { value: boolean; expiresAt: number }>();
  private cacheTTL = 5000; // 5 seconds

  async getKillSwitch(featureName: string): Promise<boolean> {
    const cached = this.cache.get(featureName);

    if (cached && cached.expiresAt > Date.now()) {
      return cached.value;
    }

    // Fetch from persistent store
    const value = await this.fetchFromStore(featureName);
    this.cache.set(featureName, {
      value,
      expiresAt: Date.now() + this.cacheTTL,
    });

    return value;
  }

  async setKillSwitch(
    featureName: string,
    enabled: boolean
  ): Promise<void> {
    await this.updateStore(featureName, enabled);
    this.cache.delete(featureName); // Invalidate cache
  }

  private async fetchFromStore(featureName: string): Promise<boolean> {
    // Implementation depends on your store (Redis, database, etc.)
    return true;
  }

  private async updateStore(
    featureName: string,
    enabled: boolean
  ): Promise<void> {
    // Implementation
  }
}
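
In the request path, the switch is checked before every AI call; when tripped, the handler serves a non-AI fallback. A sketch, where fetchAiResponse and fallbackResponse are hypothetical stand-ins for your real model call and rule-based path:

```typescript
interface KillSwitchReader {
  getKillSwitch(featureName: string): Promise<boolean>;
}

// Hypothetical stand-ins for your real AI call and rule-based fallback.
async function fetchAiResponse(prompt: string): Promise<string> {
  return "AI response";
}

function fallbackResponse(prompt: string): string {
  return "Fallback response";
}

async function respond(
  store: KillSwitchReader,
  prompt: string
): Promise<string> {
  // Check on every request; the cached store above keeps this cheap.
  if (await store.getKillSwitch("ai-response-generation")) {
    return fallbackResponse(prompt);
  }
  return fetchAiResponse(prompt);
}
```

The short cache TTL bounds how long a tripped switch can go unnoticed: at most 5 seconds of stale traffic per process.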

Shadow Mode Deployment

Run AI in the background without showing results to users. Collect metrics on quality and performance before enabling.

interface ShadowModeRequest {
  userId: string;
  prompt: string;
  shadowOnly: boolean; // Don't use result in response
}

async function generateWithShadowMode(
  req: ShadowModeRequest
): Promise<string> {
  const startTime = Date.now();

  try {
    const aiResponse = await generateAiResponse(req.userId, req.prompt);
    const latency = Date.now() - startTime;

    // Always log shadow responses for analysis
    await logShadowMetric({
      userId: req.userId,
      prompt: req.prompt,
      response: aiResponse,
      latency,
      timestamp: new Date(),
      status: "success",
    });

    // If shadow mode, don't return the AI response
    if (req.shadowOnly) {
      return "Using fallback response (shadow mode)";
    }

    return aiResponse;
  } catch (error) {
    const latency = Date.now() - startTime;

    await logShadowMetric({
      userId: req.userId,
      prompt: req.prompt,
      response: null,
      latency,
      timestamp: new Date(),
      status: "error",
      error: String(error),
    });

    return "Using fallback response (error in shadow)";
  }
}
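
One caveat with the version above: even in shadow mode the handler awaits the AI call, so model latency is added to every user request. A common refinement is to fire the shadow call in the background and serve the live path immediately. A sketch with generic stand-in functions:

```typescript
// Serve the live (non-AI) response immediately; run the AI call in the
// background purely to collect shadow metrics.
async function respondWithBackgroundShadow(
  prompt: string,
  livePath: (p: string) => string,
  shadowPath: (p: string) => Promise<string>,
  logShadow: (entry: { prompt: string; status: "success" | "error" }) => void
): Promise<string> {
  // Deliberately not awaited: the user never waits on the model.
  void shadowPath(prompt)
    .then(() => logShadow({ prompt, status: "success" }))
    .catch(() => logShadow({ prompt, status: "error" }));

  return livePath(prompt);
}
```

The trade-off is that an unhandled crash in the background promise never surfaces in the request, so the .catch() branch and its metric are the only visibility you get into shadow failures.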

A/B Testing AI vs Rule-Based Fallback

Compare AI-powered responses against rule-based alternatives to measure business impact.

type TreatmentVariant = "ai" | "rule-based" | "control";

interface ABTestConfig {
  testId: string;
  aiPercentage: number;
  ruleBasedPercentage: number;
  controlPercentage: number;
}

function assignVariant(
  userId: string,
  testConfig: ABTestConfig
): TreatmentVariant {
  const hash = hashUserId(userId, testConfig.testId) % 100;

  if (hash < testConfig.aiPercentage) return "ai";
  if (hash < testConfig.aiPercentage + testConfig.ruleBasedPercentage)
    return "rule-based";
  return "control";
}

async function handleUserRequest(
  userId: string,
  query: string
): Promise<{ response: string; variant: TreatmentVariant }> {
  const variant = assignVariant(userId, {
    testId: "ai-response-test",
    aiPercentage: 34,
    ruleBasedPercentage: 33,
    controlPercentage: 33,
  });

  let response: string;

  if (variant === "ai") {
    response = await generateAiResponse(userId, query);
  } else if (variant === "rule-based") {
    response = generateRuleBasedResponse(query);
  } else {
    response = getDefaultResponse(query);
  }

  // Track variant for analysis
  await logEvent({
    userId,
    variant,
    query,
    response,
    timestamp: new Date(),
  });

  return { response, variant };
}

function generateRuleBasedResponse(query: string): string {
  // Implement rule-based logic
  return "Rule-based response";
}

function getDefaultResponse(query: string): string {
  return "Default response";
}
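
To read the test out, aggregate a success metric per variant; which signal counts as "success" (a click, a thumbs-up, a completed task) is product-specific and assumed here. A sketch of a per-variant conversion summary over logged events:

```typescript
interface VariantEvent {
  variant: "ai" | "rule-based" | "control";
  converted: boolean; // hypothetical success signal
}

// Compute the conversion rate for each variant seen in the event log.
function conversionByVariant(events: VariantEvent[]): Record<string, number> {
  const totals: Record<string, { n: number; wins: number }> = {};
  for (const e of events) {
    let t = totals[e.variant];
    if (!t) {
      t = { n: 0, wins: 0 };
      totals[e.variant] = t;
    }
    t.n++;
    if (e.converted) t.wins++;
  }
  const rates: Record<string, number> = {};
  for (const [variant, t] of Object.entries(totals)) {
    rates[variant] = t.wins / t.n;
  }
  return rates;
}
```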

Monitoring AI Feature Adoption

Track adoption metrics to understand rollout success and user engagement with AI features.

interface AdoptionMetrics {
  featureName: string;
  totalRequests: number;
  requestsWithAI: number;
  adoptionPercentage: number;
  averageLatency: number;
  errorRate: number;
  userCount: number;
  uniqueUsersUsingAI: number;
}

class AdoptionMonitor {
  private metrics = new Map<string, AdoptionMetrics>();

  recordRequest(
    featureName: string,
    usedAI: boolean,
    latency: number,
    error?: boolean
  ): void {
    const current =
      this.metrics.get(featureName) || this.initializeMetrics(featureName);

    current.totalRequests++;
    if (usedAI) current.requestsWithAI++;

    // Running average over all requests seen so far
    current.averageLatency =
      (current.averageLatency * (current.totalRequests - 1) + latency) /
      current.totalRequests;

    // Recover the implied error count, then recompute the rate so the
    // denominator stays in sync with totalRequests
    const errorCount =
      current.errorRate * (current.totalRequests - 1) + (error ? 1 : 0);
    current.errorRate = errorCount / current.totalRequests;

    current.adoptionPercentage =
      (current.requestsWithAI / current.totalRequests) * 100;

    // Note: userCount / uniqueUsersUsingAI require a userId parameter and
    // a per-feature set of seen users; omitted here for brevity.

    this.metrics.set(featureName, current);
  }

  getMetrics(featureName: string): AdoptionMetrics | null {
    return this.metrics.get(featureName) || null;
  }

  private initializeMetrics(featureName: string): AdoptionMetrics {
    return {
      featureName,
      totalRequests: 0,
      requestsWithAI: 0,
      adoptionPercentage: 0,
      averageLatency: 0,
      errorRate: 0,
      userCount: 0,
      uniqueUsersUsingAI: 0,
    };
  }
}
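
These metrics only pay off if something watches them. A minimal SLO check that could run on a timer against AdoptionMonitor output; the threshold values are illustrative, not recommendations:

```typescript
interface SloConfig {
  maxErrorRate: number; // e.g. 0.05 for 5%
  maxAvgLatencyMs: number; // e.g. 2000
}

// Returns true when the feature is out of SLO and should raise an alert.
function breachesSlo(
  metrics: { errorRate: number; averageLatency: number },
  slo: SloConfig
): boolean {
  return (
    metrics.errorRate > slo.maxErrorRate ||
    metrics.averageLatency > slo.maxAvgLatencyMs
  );
}
```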

Rollback Procedures

Design rollback procedures to quickly disable AI features if issues arise.

interface RollbackPlan {
  featureName: string;
  rollbackSteps: RollbackStep[];
  estimatedTime: number; // in seconds
}

interface RollbackStep {
  action: "disable-flag" | "activate-fallback" | "notify-team";
  target: string;
  priority: number;
}

class RollbackCoordinator {
  async executeRollback(
    featureName: string,
    reason: string
  ): Promise<void> {
    console.log(`Starting rollback for ${featureName}: ${reason}`);

    // Step 1: Disable feature flag immediately
    await this.disableFeatureFlag(featureName);

    // Step 2: Drain in-flight requests
    await this.waitForInFlightRequests(5000); // 5 second timeout

    // Step 3: Activate fallback
    await this.activateFallback(featureName);

    // Step 4: Notify team
    await this.notifyTeam({
      featureName,
      reason,
      timestamp: new Date(),
      status: "rolled-back",
    });
  }

  private async disableFeatureFlag(featureName: string): Promise<void> {
    // Implementation
  }

  private async waitForInFlightRequests(timeout: number): Promise<void> {
    // Implementation
  }

  private async activateFallback(featureName: string): Promise<void> {
    // Implementation
  }

  private async notifyTeam(notification: object): Promise<void> {
    // Implementation
  }
}
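
Rollbacks can also be triggered automatically from monitoring. A sketch of a one-shot trigger that fires the rollback callback (e.g. executeRollback) the first time the error rate crosses a threshold; the wiring and threshold are illustrative:

```typescript
// One-shot trigger: invokes the rollback callback exactly once, the
// first time the observed error rate exceeds the threshold.
function makeAutoRollbackTrigger(
  threshold: number,
  rollback: (reason: string) => void
): (errorRate: number) => void {
  let fired = false;
  return (errorRate) => {
    if (!fired && errorRate > threshold) {
      fired = true; // guard against repeated rollback executions
      rollback(`error rate ${errorRate} exceeded threshold ${threshold}`);
    }
  };
}
```

The one-shot guard matters: monitoring loops re-evaluate every few seconds, and re-running a multi-step rollback concurrently can leave the system in a mixed state.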

Checklist

  • Implement deterministic hashing for consistent user treatment
  • Add kill switch caching with short TTL (5-10 seconds)
  • Monitor adoption metrics across all AI features
  • Create segment-based targeting for beta users first
  • Design rollback plan before shipping AI features
  • Set up shadow mode for confidence validation
  • Implement A/B test infrastructure for business metrics
  • Alert on error rates exceeding SLO thresholds

Conclusion

Feature flags transform AI deployment from a binary on/off decision into a sophisticated rollout strategy. By combining percentage-based rollouts, user segmentation, shadow mode, and kill switches, you can safely introduce AI capabilities while retaining the ability to respond quickly to issues. Start with shadow mode, move to beta users, then expand by percentage: this graduated approach minimizes risk while gathering the production data needed to validate quality.