The Strangler Fig Pattern — Migrating Legacy Systems Without a Big Bang Rewrite

Introduction

Rewriting legacy systems is dangerous, and big bang rewrites have a long track record of failure: they take longer than planned, and the old system keeps accumulating changes the rewrite must chase. The Strangler Fig pattern is safer: gradually replace functionality piece by piece, routing requests to the new system while keeping the old system running. Eventually the old system handles no traffic and can be turned off.

Strangler Fig Concept

In nature, a strangler fig grows around an existing tree, slowly suffocating it until the original is gone. In software, we grow a new system around the legacy one:

// Phase 1: HTTP facade in front of legacy system
// Client requests → Facade → Decides new or legacy
// Initially, everything goes to legacy

// Phase 2: Gradually route functionality to new system
// /api/orders → New system
// /api/users → Still going to legacy
// /api/reports → Legacy
// ... (incrementally migrate features)

// Phase 3: Legacy system can be retired
// Everything routed to new system
// Legacy turned off

// Architecture:
// Old System (2010, monolithic)
//   ↑
//   |
// Facade (HTTP router, decision logic)
//   ↑
//   |
// New System (2026, modular)

HTTP Facade: Routing New Code First

A facade (proxy, API gateway) sits in front and decides where to route each request:

// src/facade/router.ts
import express from 'express'
import { newSystemClient } from './clients/new-system'
import { legacySystemClient } from './clients/legacy-system'
import { NotFoundError } from './clients/errors'

export const router = express.Router()

// Routing rules: New system first, fallback to legacy
router.get('/api/orders/:id', async (req, res) => {
  try {
    // Try new system
    const order = await newSystemClient.getOrder(req.params.id)
    res.json(order)
  } catch (error) {
    // If new system doesn't have it yet, fall back to legacy
    if (error instanceof NotFoundError) {
      const order = await legacySystemClient.getOrder(req.params.id)
      res.json(order)
    } else {
      throw error
    }
  }
})

router.get('/api/users/:id', async (req, res) => {
  // Users not migrated yet, always use legacy
  const user = await legacySystemClient.getUser(req.params.id)
  res.json(user)
})

router.post('/api/orders', async (req, res) => {
  // Only new system accepts new order creation
  const order = await newSystemClient.createOrder(req.body)
  res.status(201).json(order)
})

// Metadata about migration status
interface MigrationStatus {
  feature: string
  status: 'legacy' | 'parallel' | 'shadow' | 'new' | 'retired'
  legacySystem: string
  newSystem: string
  trafficPercentage: number
}

const migrations: MigrationStatus[] = [
  {
    feature: 'GetOrder',
    status: 'new',
    legacySystem: 'monolith:3000',
    newSystem: 'orders-service:3001',
    trafficPercentage: 100
  },
  {
    feature: 'CreateOrder',
    status: 'new',
    legacySystem: 'monolith:3000',
    newSystem: 'orders-service:3001',
    trafficPercentage: 100
  },
  {
    feature: 'GetUser',
    status: 'legacy',
    legacySystem: 'monolith:3000',
    newSystem: 'users-service:3002',
    trafficPercentage: 0
  }
]

router.get('/api/migrations/status', (req, res) => {
  res.json(migrations)
})

Feature Flag Routing

Before fully trusting a migration, use feature flags to gradually shift traffic:

// src/facade/feature-flags.ts
import { flagService } from './flag-service'

export async function routeToService<T>(
  feature: string,
  userId: string,
  legacyHandler: () => Promise<T>,
  newHandler: () => Promise<T>
): Promise<T> {
  // Check what percentage of users should use new system
  const useNewSystem = await flagService.isEnabled(
    `migrate_${feature}`,
    userId  // Users are bucketed consistently
  )

  if (useNewSystem) {
    try {
      return await newHandler()
    } catch (error) {
      // If new system fails, fall back to legacy
      console.error(`New system failed for ${feature}, falling back`, error)
      return await legacyHandler()
    }
  } else {
    return await legacyHandler()
  }
}

// Usage
router.get('/api/orders/:id', async (req, res) => {
  const order = await routeToService(
    'GetOrder',
    req.user.id,
    () => legacySystemClient.getOrder(req.params.id),
    () => newSystemClient.getOrder(req.params.id)
  )
  res.json(order)
})

// Feature flag config in console or database
// migrate_GetOrder: 10% of users → new system, 90% → legacy
// After a week, confident, increase to 50%
// After another week, 100%
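The "bucketed consistently" property matters: a user must land on the same system on every request, or they will see inconsistent behavior mid-session. One common way a flag service implements this is by hashing the feature and user ID into a stable bucket. A sketch of that idea (not the internals of any particular flag service):

```typescript
// Sketch: deterministic percentage bucketing, as a flag service might do it
import { createHash } from 'crypto'

// Map a (feature, userId) pair to a stable bucket in [0, 100)
export function bucketFor(feature: string, userId: string): number {
  const digest = createHash('sha256').update(`${feature}:${userId}`).digest()
  // First 4 bytes as an unsigned integer, reduced to 0..99
  return digest.readUInt32BE(0) % 100
}

// Route to the new system when the user's bucket falls under the rollout percentage
export function useNewSystem(
  trafficPercentage: number,
  feature: string,
  userId: string
): boolean {
  return bucketFor(feature, userId) < trafficPercentage
}
```

Because the hash is deterministic, raising the percentage from 10 to 50 only moves users into the new system; nobody who was already on it falls back out.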

Dual-Write for Data Migration

For features you're migrating, write to both old and new system simultaneously:

// src/facade/write-operations.ts

router.post('/api/orders', async (req, res) => {
  // Create in new system
  const newOrder = await newSystemClient.createOrder(req.body)

  // Also write to legacy system for continuity
  // (in case we need to roll back)
  await legacySystemClient.createOrder({
    ...req.body,
    externalId: newOrder.id,  // Link them together
    migratedFrom: 'new-system'
  })

  // Return from new system
  res.status(201).json(newOrder)
})

// Alternative: Write to legacy first, then new (safer for rollback).
// Register only ONE of these two handlers — Express invokes the first
// matching handler that ends the response, so the second would never run.
router.post('/api/orders', async (req, res) => {
  // Create in legacy (current source of truth)
  const legacyOrder = await legacySystemClient.createOrder(req.body)

  // Also write to new system asynchronously (fire and forget)
  newSystemClient.createOrder({
    ...legacyOrder,
    originatedFrom: 'legacy'
  }).catch(error => {
    // Log but don't fail - legacy system is source of truth
    console.error('Failed to write to new system', error)
  })

  res.status(201).json(legacyOrder)
})

// Data reconciliation: Periodically check both systems match
export async function reconcileOrders(orderId: string): Promise<boolean> {
  const [legacy, newSys] = await Promise.all([
    legacySystemClient.getOrder(orderId),
    newSystemClient.getOrder(orderId)
  ])

  const match =
    legacy.id === newSys.externalId &&
    legacy.total === newSys.total &&
    legacy.status === newSys.status

  if (!match) {
    console.error(`Orders don't match for ${orderId}`)
    // Alert, investigate, fix
  }

  return match
}
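`reconcileOrders` checks a single order; "periodically" implies sweeping a batch of recently written IDs on a schedule. A sketch of such a sweep, with the ID source and per-order check passed in as functions (both are assumptions, not part of the facade above):

```typescript
// Sketch: periodic reconciliation sweep over recently written orders
export async function reconcileRecent(
  fetchRecentOrderIds: () => Promise<string[]>,  // assumed helper
  reconcile: (id: string) => Promise<boolean>    // e.g. reconcileOrders above
): Promise<{ checked: number; mismatched: string[] }> {
  const ids = await fetchRecentOrderIds()
  const mismatched: string[] = []
  for (const id of ids) {
    // Sequential on purpose: avoid hammering the legacy system
    if (!(await reconcile(id))) mismatched.push(id)
  }
  return { checked: ids.length, mismatched }
}
```

The returned mismatch list is what feeds the "alert, investigate, fix" step; a non-empty list on two consecutive sweeps is a strong signal the dual-write path has a bug.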

Comparison Testing (Shadow Mode)

Send requests to both systems, compare responses without affecting users:

// src/facade/shadow-mode.ts
import { EventEmitter } from 'events'

const comparisonEmitter = new EventEmitter()

interface ComparisonResult {
  feature: string
  userId: string
  request: any
  legacyResponse: any
  newResponse: any
  match: boolean
  differences?: string[]
  timestamp: Date
}

router.get('/api/orders/:id', async (req, res) => {
  // Primary request to legacy (current source of truth)
  const legacyOrder = await legacySystemClient.getOrder(req.params.id)

  // Shadow request to new system (async, don't wait)
  newSystemClient.getOrder(req.params.id)
    .then(newOrder => {
      // Compare responses
      const match =
        legacyOrder.id === newOrder.id &&
        legacyOrder.total === newOrder.total &&
        legacyOrder.status === newOrder.status

      const differences: string[] = []
      if (legacyOrder.total !== newOrder.total) {
        differences.push(`total: ${legacyOrder.total} vs ${newOrder.total}`)
      }
      if (legacyOrder.items.length !== newOrder.items.length) {
        differences.push(`items: ${legacyOrder.items.length} vs ${newOrder.items.length}`)
      }

      // Emit comparison result (goes to analytics/monitoring)
      comparisonEmitter.emit('comparison', {
        feature: 'GetOrder',
        userId: req.user.id,
        request: { orderId: req.params.id },
        legacyResponse: legacyOrder,
        newResponse: newOrder,
        match,
        differences: match ? undefined : differences,
        timestamp: new Date()
      })

      // Log mismatches
      if (!match) {
        console.warn(
          `Shadow mode mismatch for order ${req.params.id}:`,
          differences
        )
      }
    })
    .catch(error => {
      console.error('Shadow request failed (non-critical)', error)
    })

  // Return legacy response (proven to work)
  res.json(legacyOrder)
})

// Dashboard shows comparison results
comparisonEmitter.on('comparison', (result: ComparisonResult) => {
  if (!result.match) {
    // Alert to engineers: new system returns different data
  }
})
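Hand-written field-by-field comparisons like the one above tend to drift out of sync with the real response shape. A small generic comparator keeps the field list in one place (a sketch; the field names are illustrative):

```typescript
// Sketch: generic field comparator for shadow-mode responses
export function diffFields<T extends Record<string, unknown>>(
  legacy: T,
  candidate: T,
  fields: (keyof T & string)[]
): string[] {
  const differences: string[] = []
  for (const field of fields) {
    // JSON-encode so nested values (arrays, objects) compare structurally
    const a = JSON.stringify(legacy[field])
    const b = JSON.stringify(candidate[field])
    if (a !== b) differences.push(`${field}: ${a} vs ${b}`)
  }
  return differences
}
```

With this, the shadow handler reduces to `const differences = diffFields(legacyOrder, newOrder, ['id', 'total', 'status', 'items'])` and `match` is simply `differences.length === 0`.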

Incremental Domain Migration Order

Decide which domain to migrate first. Start with least-coupled features:

// Example: E-commerce system
// Dependency graph:

// User service ← Orders ← Payments ← Shipping ← Fulfillment
//   ↑
//   └── Product catalog ← Inventory

// Migration priority (least to most coupled):
// 1. Product catalog (no dependencies)
// 2. Inventory (depends on catalog)
// 3. User service (needed by orders, but lower domain coupling)
// 4. Orders (depends on users, products, inventory)
// 5. Payments (depends on orders)
// 6. Shipping (depends on orders, payments)
// 7. Fulfillment (depends on shipping)

// Migration timeline:
// Week 1-2: Migrate Product Catalog
//   - Least risky, no dependencies
//   - Prove new system can handle traffic volume
//
// Week 3-4: Migrate Inventory
//   - Depends on catalog (already migrated)
//   - Good test of inter-service communication
//
// Week 5-6: Migrate Users
//   - Core domain, many dependents
//   - Shadow mode before full cutover
//
// Week 7-8: Migrate Orders (read-only first)
//   - Large feature, split into phases
//   - GetOrder → new system first
//   - CreateOrder → dual-write phase
//   - UpdateOrder → last
//
// Week 9-10: Migrate Payments
//   - Critical for revenue, extra testing
//   - Highest risk, most scrutinized
//
// Weeks 11+: Shipping & Fulfillment
//   - By now, infrastructure proven
//   - Less risky
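The priority list above is just a topological order of the dependency graph, which means it can be computed mechanically rather than argued about. A sketch using Kahn's algorithm (the graph encoding as a plain record is an assumption):

```typescript
// Sketch: derive a migration order via topological sort (Kahn's algorithm).
// deps maps each domain to the domains it depends on.
export function migrationOrder(deps: Record<string, string[]>): string[] {
  const remaining = new Map(Object.entries(deps).map(([k, v]) => [k, new Set(v)]))
  const order: string[] = []
  while (remaining.size > 0) {
    // A domain is ready once everything it depends on is already migrated
    const ready = [...remaining.keys()].filter(d =>
      [...remaining.get(d)!].every(dep => order.includes(dep))
    )
    if (ready.length === 0) throw new Error('dependency cycle — break it first')
    for (const d of ready) {
      order.push(d)
      remaining.delete(d)
    }
  }
  return order
}
```

Running this over the e-commerce graph above yields the same "catalog and users first, fulfillment last" ordering; the cycle check also surfaces hidden circular dependencies, which must be broken before a clean incremental migration is possible.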

Rollback Strategy at Each Phase

Every migration step must be reversible:

// src/facade/rollback-strategy.ts
// (metricsClient and alerting are assumed facade-internal clients)
import { flagService } from './flag-service'

interface SystemMetrics {
  errorRate: number   // fraction of requests failing, e.g. 0.001 = 0.1%
  p99Latency: number  // milliseconds
}

interface MigrationPhase {
  name: string
  newSystem: string
  legacySystem: string
  trafficPercentage: number
  rollbackCondition: (metrics: SystemMetrics) => boolean
}

const phases: MigrationPhase[] = [
  {
    name: 'GetOrder - 10% traffic',
    newSystem: 'orders-service',
    legacySystem: 'monolith',
    trafficPercentage: 10,
    // Rollback if error rate exceeds 0.1% or latency > 500ms
    rollbackCondition: (m) => m.errorRate > 0.001 || m.p99Latency > 500
  },
  {
    name: 'GetOrder - 50% traffic',
    newSystem: 'orders-service',
    legacySystem: 'monolith',
    trafficPercentage: 50,
    rollbackCondition: (m) => m.errorRate > 0.01 || m.p99Latency > 300
  },
  {
    name: 'GetOrder - 100% traffic',
    newSystem: 'orders-service',
    legacySystem: 'monolith',
    trafficPercentage: 100,
    rollbackCondition: (m) => m.errorRate > 0.05 || m.p99Latency > 200
  }
]

// Monitoring checks every minute
setInterval(async () => {
  const metrics = await metricsClient.getSystemMetrics()

  for (const phase of phases) {
    if (phase.rollbackCondition(metrics)) {
      console.error(`Rollback triggered for ${phase.name}`)

      // Instantly route back to legacy. The flag is keyed by feature
      // (e.g. migrate_GetOrder), so strip the phase suffix first.
      const feature = phase.name.split(' - ')[0]
      await flagService.disable(`migrate_${feature}`)

      // Send alert
      await alerting.sendAlert({
        severity: 'critical',
        message: `Migrated system failed, rolled back to legacy`,
        phase: phase.name,
        metrics
      })
    }
  }
}, 60000)

// Manual rollback
router.post('/api/rollback/:feature', async (req, res) => {
  const { feature } = req.params
  const { reason } = req.body

  console.log(`Manual rollback of ${feature}: ${reason}`)
  await flagService.disable(`migrate_${feature}`)

  res.json({ success: true, message: `Rolled back ${feature}` })
})

Measuring Migration Progress

Track what's been migrated and what remains:

// src/facade/migration-tracker.ts

export interface MigrationMetrics {
  startDate: Date
  completionDate?: Date
  requestCount: number
  newSystemCount: number
  legacySystemCount: number
  newSystemPercentage: number
  features: FeatureMigration[]
}

export interface FeatureMigration {
  feature: string
  status: 'legacy' | 'in_progress' | 'complete'
  startDate: Date
  completionDate?: Date
  trafficPercentage: number
  requestsHandled: number
  errors: number
}

const migrationTracker = new Map<string, MigrationMetrics>()

router.use((req, res, next) => {
  const originalSend = res.send
  let usedNewSystem = false

  res.send = function (data: any) {
    // Determine which system served this response.
    // Assumes downstream handlers set an X-System response header.
    const responseHeader = res.getHeader('X-System') as string
    if (responseHeader === 'new-system') {
      usedNewSystem = true
    }

    // Track metrics
    const feature = req.path.split('/')[2]  // e.g., 'orders', 'users'
    if (!migrationTracker.has(feature)) {
      migrationTracker.set(feature, {
        startDate: new Date(),
        requestCount: 0,
        newSystemCount: 0,
        legacySystemCount: 0,
        newSystemPercentage: 0,
        features: []
      })
    }

    const metrics = migrationTracker.get(feature)!
    metrics.requestCount++
    if (usedNewSystem) {
      metrics.newSystemCount++
    } else {
      metrics.legacySystemCount++
    }
    metrics.newSystemPercentage = (metrics.newSystemCount / metrics.requestCount) * 100

    return originalSend.call(this, data)
  }

  next()
})

router.get('/api/migrations/progress', (req, res) => {
  const progress = Array.from(migrationTracker.values())
  res.json(progress)
})

// Example output:
// {
//   "startDate": "2025-12-01T00:00:00Z",
//   "requestCount": 1000000,
//   "newSystemCount": 900000,
//   "legacySystemCount": 100000,
//   "newSystemPercentage": 90,
//   "features": [
//     {
//       "feature": "GetOrder",
//       "status": "complete",
//       "startDate": "2025-12-01T00:00:00Z",
//       "completionDate": "2026-01-15T00:00:00Z",
//       "trafficPercentage": 100,
//       "requestsHandled": 500000,
//       "errors": 2
//     }
//   ]
// }

Decommissioning the Legacy System

Once everything is migrated, sunset the old system:

// Retirement checklist:
// 1. All traffic routed to new system
// 2. All data migrated and validated
// 3. All fallback routes removed from facade
// 4. Legacy system can be powered off
// 5. Monitor for issues for 2 weeks
// 6. Archive databases, turn off servers

// Timeline:
// Week 1: Stop accepting new connections to legacy
// (deprecate() is assumed to exist on the legacy client)
legacySystemClient.deprecate()

// Week 2: Monitor error rates in the new system.
// Any spikes here should already have been caught earlier by shadow mode.

// Week 3: Power off legacy
import { spawn } from 'child_process'
spawn('docker', ['stop', 'legacy-system'])

// Week 4: Archive data for compliance/audit
// (s3 and dumpDatabase are assumed helpers for the archive store)
async function archiveLegacyData() {
  const databases = ['orders', 'users', 'products']

  for (const db of databases) {
    // Dump each database to cold storage; retain for 7 years for audit
    await s3.putObject({
      Bucket: 'legacy-system-archive',
      Key: `${db}-${Date.now()}.sql.gz`,
      Body: await dumpDatabase(db)
    })
  }
}

Checklist

  • Facade (proxy/router) in place routing requests
  • Feature flags control traffic percentage to new system
  • Dual-write implemented for data consistency
  • Shadow mode running to catch differences
  • Comparison results monitored for discrepancies
  • Migration order planned: low-coupling features first
  • Rollback strategy defined with automatic triggers
  • Metrics collected: which system handling what percentage
  • Error budgets respected: new system reliability proven before full cutover
  • Legacy system deprecation plan documented

Conclusion

The Strangler Fig pattern is powerful because it is low-risk. You never bet everything on the new system: you prove it works piece by piece, incrementally replacing the old while keeping a fallback at every step. Most legacy migrations fail because they attempt everything at once; this pattern turns one big, risky rewrite into many small, reversible steps.