Refactoring Without Breaking Everything — The Incremental Path Through Legacy Code

Introduction

Dangerous refactors share a common pattern: the scope expands during execution, tests are absent or inadequate, the diff grows until nobody can review it effectively, and the "refactor" becomes a rewrite in progress. Safe refactors share a different pattern: write tests for current behavior before changing anything, extract the smallest useful unit, verify nothing broke, commit, and repeat. The strangler fig pattern describes this at the architectural level — grow the new system alongside the old one, migrate traffic gradually, remove the old system only after the new one is proven.

Why Refactors Go Wrong
Fix 1: Characterization Tests — Capture Behavior Before Changing It
Fix 2: The Strangler Fig Pattern for Large Refactors
Fix 3: The Small Step Method
Fix 4: Safe Database Schema Refactoring
Fix 5: Refactor PR Checklist
Safe Refactoring Checklist
Conclusion

Why Refactors Go Wrong

Refactor failure patterns:

1. The expanding scope refactor
   → "While I'm in here, I should also fix..."
   → PR grows from 50 to 500 lines
   → Can't be reviewed, can't be merged safely
   → Usually abandoned or merged with risk

2. The refactor without tests
   → Behavior changes are invisible
   → Works on the happy path, breaks edge cases
   → Edge cases discovered in production

3. The big bang refactor
   → Rewrites entire module in one PR
   → Hard to review, hard to debug
   → Any rollback reverts everything

4. The rename everything refactor
   → New names throughout the codebase simultaneously
   → Every file touched = merge conflicts everywhere
   → Team productivity drops for a week

Safe refactoring principles:
- Tests first: write tests for current behavior before changing anything
- Small steps: each PR changes one thing
- One concern at a time: rename OR restructure OR move, never all at once
- Green at every commit: each small step passes all tests
- No new features in refactor PRs

Fix 1: Characterization Tests — Capture Behavior Before Changing It

// Before touching legacy code: write tests that capture what it currently does
// These tests don't need to be pretty — they need to be accurate

// Legacy function — nobody knows what it does exactly:
function processOrder(order: any): any {
  if (order.type === 'rush' && order.items.length > 0) {
    const total = order.items.reduce((sum: number, item: any) =>
      sum + (item.price * item.qty * (item.discount || 1)), 0
    )
    return { total: total * 1.25, status: 'rush', fee: total * 0.25 }
  }
  // ... 200 more lines
}

// Characterization tests: capture current behavior (right or wrong)
describe('processOrder - characterization', () => {
  test('rush order with discount: current behavior', () => {
    const result = processOrder({
      type: 'rush',
      items: [{ price: 100, qty: 2, discount: 0.9 }]
    })
    // Run this first, observe output, capture it as the test expectation
    expect(result).toEqual({ total: 225, status: 'rush', fee: 45 })
    // Note: we're not saying this is CORRECT — we're saying this is what it DOES
    // The refactor preserves this behavior; fixing bugs is a separate PR
  })
})

Fix 2: The Strangler Fig Pattern for Large Refactors

// Strangler fig: build new system alongside old one
// Route increasing percentage of traffic to new system
// Remove old system when new is proven

// Example: refactoring the order processing service

// Step 1: Create new implementation alongside old one
// Old: OrderProcessor (legacy, untouchable)
// New: OrderProcessorV2 (clean, tested)

class OrderProcessorV2 {
  async process(order: Order): Promise<OrderResult> {
    // New implementation — clean, tested, observable
    const pricing = await this.pricingService.calculate(order)
    const validation = await this.validator.validate(order)

    if (!validation.isValid) {
      throw new ValidationError(validation.errors)
    }

    return this.fulfillmentService.fulfill(order, pricing)
  }
}

// Step 2: Route traffic gradually using feature flags
async function processOrder(order: Order): Promise<OrderResult> {
  const useV2 = await featureFlags.get('order-processor-v2', {
    rolloutPercentage: 5,  // Start with 5% of orders
    userId: order.customerId,
  })

  if (useV2) {
    try {
      return await orderProcessorV2.process(order)
    } catch (err) {
      // Shadow mode: fall back to V1 on V2 failure
      logger.error({ orderId: order.id, error: err.message }, 'V2 failed, using V1 fallback')
      metrics.increment('order_processor.v2_fallback')
      return await orderProcessorV1.process(order)
    }
  }

  return await orderProcessorV1.process(order)
}

// Step 3: Increase rollout as confidence grows
// 5% → 25% → 50% → 100%
// Remove V1 code only after 100% and stable for 2 weeks

Fix 3: The Small Step Method

// Each refactor PR changes exactly one thing
// No PR should be un-reviewable

// Bad: one PR that renames, restructures, and adds types
// ❌ 500-line PR mixing concerns

// ✅ PR 1: Add TypeScript types (no behavior change)
// Before:
function calculateTotal(items, discount) {
  return items.reduce((sum, item) => sum + item.price, 0) * discount
}

// After PR 1 (types only, same logic):
function calculateTotal(items: OrderItem[], discount: number): number {
  return items.reduce((sum, item) => sum + item.price, 0) * discount
}

// ✅ PR 2: Extract pricing logic (behavior preserved, structure changes)
// Same logic, new home:
function calculateSubtotal(items: OrderItem[]): number {
  return items.reduce((sum, item) => sum + item.price, 0)
}

function calculateTotal(items: OrderItem[], discount: number): number {
  return calculateSubtotal(items) * discount
}

// ✅ PR 3: Fix the actual bug (discount was multiplicative, should be subtractive)
// Now visible and safe to change because it's isolated:
function applyDiscount(subtotal: number, discountPercent: number): number {
  return subtotal * (1 - discountPercent)  // Bug fix: was * discount, now * (1 - percent)
}

// Three small PRs, each reviewable, each safely deployable.
// Nobody had to understand 500 lines to approve any of them.

Fix 4: Safe Database Schema Refactoring

-- Refactoring a database schema safely
-- Same strangler fig principle: expand first, contract later

-- ❌ Dangerous: rename column in one migration
ALTER TABLE orders RENAME COLUMN user_id TO customer_id;
-- All code breaks immediately

-- ✅ Safe: expand-contract (3-step process)

-- Step 1: Add new column, copy data (existing code unaffected)
ALTER TABLE orders ADD COLUMN customer_id UUID;
UPDATE orders SET customer_id = user_id;

-- Backfill trigger for ongoing writes (keep both columns in sync during transition):
CREATE OR REPLACE FUNCTION sync_customer_id()
RETURNS TRIGGER AS $$
BEGIN
  IF TG_OP = 'INSERT' THEN
    NEW.customer_id = NEW.user_id;
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER sync_customer_id_trigger
BEFORE INSERT ON orders
FOR EACH ROW EXECUTE FUNCTION sync_customer_id();

-- Step 2: Deploy new code that uses customer_id
-- Both old code (user_id) and new code (customer_id) work simultaneously

-- Step 3: Remove old column only after all code uses customer_id
-- (in a future deploy, after testing)
ALTER TABLE orders DROP COLUMN user_id;
DROP TRIGGER sync_customer_id_trigger ON orders;

Fix 5: Refactor PR Checklist

# Refactor PR Template

## What Changed
[ ] Renamed variables/functions (no behavior change)
[ ] Extracted function/class (no behavior change)
[ ] Moved code to new location (no behavior change)
[ ] Added types (no behavior change)
[ ] Restructured module (no behavior change)
[ ] Bug fix (behavior change — needs extra review)

## Tests
[ ] Characterization tests written BEFORE this PR
[ ] All existing tests pass
[ ] No new behavior introduced (if behavior changed: separate PR)

## Scope
[ ] This PR changes ONE thing
[ ] Diff is < 200 lines
[ ] Can be deployed independently
[ ] Can be reverted independently

## I have NOT done the following in this PR:
[ ] Added new features
[ ] Fixed unrelated bugs
[ ] Changed behavior to be "more correct"
[ ] Refactored things I wasn't supposed to touch

If any of the "have NOT done" boxes are checked: split into separate PRs.

Safe Refactoring Checklist

✅ Characterization tests written before any changes
✅ Strangler fig for large-scale refactors — new system alongside old
✅ Each PR changes one concern: rename OR restructure OR move
✅ PRs are small enough to be meaningfully reviewed (< 200 lines)
✅ Database schema changes use expand-contract, not in-place rename
✅ Feature flags enable gradual rollout and fast rollback
✅ No behavior changes in refactor PRs — bug fixes are separate PRs

Conclusion

Refactoring safely is more about discipline than technique. The techniques (strangler fig, expand-contract, characterization tests) are well-known — the discipline is resisting the pull to fix everything at once while you're in there. The first PR writes tests. The second PR changes one thing. The third PR changes another. Each is deployable, reviewable, and reversible. The codebase improves incrementally but continuously. The big-bang refactor that's "almost ready" for three months never ships — the series of small, safe steps does.