Cache Invalidation Hell — The Second Hardest Problem in Computer Science

Introduction

"There are only two hard things in computer science: cache invalidation and naming things." The joke persists because cache invalidation genuinely is hard. A cache that's never invalidated serves stale data. A cache that's invalidated too aggressively is just slow extra work with no latency benefit. Getting the balance right — and making sure invalidation actually happens on every write path — is where most cache implementations fail.

The Common Failure Modes

1. Write path A updates the DB and the cache; write path B updates the DB but FORGETS to invalidate the cache. Stale data from path B is served indefinitely.

2. The cache key uses an old ID format while data is updated under a new key. The old key is served forever (no expiry, no invalidation).

3. Invalidation happens before the DB write commits. A concurrent cache miss fetches from the DB while it still holds the old data, and the stale result is re-cached as if it were fresh.

4. Multi-key invalidation: updating a user invalidates user:{id}, but should also invalidate user_list, user_count, and team:{teamId}. Some stale views persist.

5. Race condition: two writers invalidate and re-cache simultaneously. The last cache write wins, but it may not carry the latest data.
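Failure mode 1 is easy to reproduce. A minimal in-memory sketch (hypothetical names, a Map standing in for the DB and the cache) shows how one forgetful write path poisons reads:

```typescript
// In-memory sketch of failure mode 1: path A invalidates, path B forgets.
const fakeDb = new Map<string, string>()
const fakeCache = new Map<string, string>()

function readUser(id: string): string | undefined {
  const hit = fakeCache.get(id)
  if (hit !== undefined) return hit
  const fresh = fakeDb.get(id)
  if (fresh !== undefined) fakeCache.set(id, fresh)
  return fresh
}

function updateViaPathA(id: string, name: string) {
  fakeDb.set(id, name)
  fakeCache.delete(id)   // correct: invalidate after the write
}

function updateViaPathB(id: string, name: string) {
  fakeDb.set(id, name)   // BUG: forgets to invalidate the cache
}

fakeDb.set('u1', 'Alice')
readUser('u1')                 // warms the cache with 'Alice'
updateViaPathA('u1', 'Bob')
console.log(readUser('u1'))    // 'Bob': invalidation worked
updateViaPathB('u1', 'Carol')
console.log(readUser('u1'))    // still 'Bob': stale value served indefinitely
```

With no TTL, nothing ever corrects the cache after path B runs; this is why the strategies below always pair invalidation with an expiry.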

Strategy 1: Cache-Aside with Short TTL (Simplest)

import Redis from 'ioredis'
import { Pool } from 'pg'

class CacheAside {
  constructor(private redis: Redis, private db: Pool) {}

  async get<T>(key: string, fetchFn: () => Promise<T>, ttlSeconds = 60): Promise<T> {
    // Try cache first
    const cached = await this.redis.get(key)
    if (cached) return JSON.parse(cached)

    // Cache miss — fetch from DB
    const data = await fetchFn()

    // Store in cache
    await this.redis.setex(key, ttlSeconds, JSON.stringify(data))

    return data
  }

  async invalidate(key: string): Promise<void> {
    await this.redis.del(key)
  }
}

const cache = new CacheAside(redis, db)

// Read
async function getUser(userId: string) {
  return cache.get(
    `user:${userId}`,
    () => db.query('SELECT * FROM users WHERE id = $1', [userId]).then(r => r.rows[0]),
    300  // 5 minute TTL
  )
}

// Write — invalidate after update
interface UpdateUserDto { name: string }

async function updateUser(userId: string, data: UpdateUserDto) {
  await db.query('UPDATE users SET name = $1 WHERE id = $2', [data.name, userId])
  await cache.invalidate(`user:${userId}`)
  // Next read will be a cache miss → fresh data from DB
}

The TTL acts as a safety net: even if invalidation is missed somehow, the cache will self-heal within 5 minutes.
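One refinement worth considering (an assumption on my part, not part of the strategy above) is adding random jitter to the TTL, so that keys written in the same burst don't all expire in the same instant and miss together:

```typescript
// Hypothetical helper: spread expiries over [base, base + spread) seconds
// so a burst of writes doesn't produce a burst of synchronized misses later.
function jitteredTtl(baseSeconds: number, spreadSeconds = 30): number {
  return baseSeconds + Math.floor(Math.random() * spreadSeconds)
}

// e.g. await this.redis.setex(key, jitteredTtl(300), JSON.stringify(data))
```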

Strategy 2: Tag-Based Invalidation

When one entity change should invalidate many keys:

class TaggedCache {
  constructor(private redis: Redis) {}

  // Store a cache key with tags
  async set(key: string, value: any, tags: string[], ttlSeconds: number): Promise<void> {
    const pipeline = this.redis.pipeline()

    // Store the value
    pipeline.setex(key, ttlSeconds, JSON.stringify(value))

    // Register this key under each tag
    for (const tag of tags) {
      pipeline.sadd(`tag:${tag}`, key)
      pipeline.expire(`tag:${tag}`, ttlSeconds + 60)  // tag lives slightly longer
    }

    await pipeline.exec()
  }

  // Invalidate all keys with a given tag
  async invalidateTag(tag: string): Promise<void> {
    const keys = await this.redis.smembers(`tag:${tag}`)
    if (keys.length === 0) return

    const pipeline = this.redis.pipeline()
    for (const key of keys) pipeline.del(key)
    pipeline.del(`tag:${tag}`)
    await pipeline.exec()
  }
}

const cache = new TaggedCache(redis)

// Cache user data with tags
async function cacheUserProfile(userId: string) {
  const user = await db.query('SELECT * FROM users WHERE id = $1', [userId])

  await cache.set(
    `user:profile:${userId}`,
    user.rows[0],
    [`user:${userId}`, `org:${user.rows[0].org_id}`],  // tags
    300
  )
}

// When user updates — invalidate all user-tagged keys
async function updateUser(userId: string, data: any) {
  await db.query('UPDATE users SET name = $1 WHERE id = $2', [data.name, userId])

  // Invalidates: user:profile:{userId}, user:list, user:settings, etc.
  await cache.invalidateTag(`user:${userId}`)
}

// When org updates — invalidate all keys tagged with that org
async function updateOrg(orgId: string, data: any) {
  await db.query('UPDATE organizations SET name = $1 WHERE id = $2', [data.name, orgId])
  await cache.invalidateTag(`org:${orgId}`)
}
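The tag index above is just a set of keys per tag. An in-memory sketch of the same data layout (hypothetical names, no Redis) makes the invalidation semantics explicit:

```typescript
// key -> cached value, and tag -> set of cache keys registered under that tag
const memStore = new Map<string, unknown>()
const tagIndex = new Map<string, Set<string>>()

function setTagged(key: string, value: unknown, tags: string[]) {
  memStore.set(key, value)
  for (const tag of tags) {
    if (!tagIndex.has(tag)) tagIndex.set(tag, new Set())
    tagIndex.get(tag)!.add(key)
  }
}

function invalidateTag(tag: string) {
  for (const key of tagIndex.get(tag) ?? []) memStore.delete(key)
  tagIndex.delete(tag)
}

setTagged('user:profile:1', { name: 'Ada' }, ['user:1', 'org:9'])
setTagged('user:list', ['1'], ['user:1'])
invalidateTag('user:1')   // clears both keys in one call
```

The Redis version is the same structure, with SADD building the set and SMEMBERS + DEL walking it; the pipeline just batches the round trips.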

Strategy 3: Write-Through Cache

Write to cache AND database together — cache is never stale:

async function writeThrough<T>(
  key: string,
  value: T,
  ttlSeconds: number,
  dbWriteFn: () => Promise<void>
): Promise<void> {
  // Write to DB first
  await dbWriteFn()

  // Then update cache with new value
  await redis.setex(key, ttlSeconds, JSON.stringify(value))
  // Cache is always consistent with DB (no invalidation needed)
}

async function updateUserName(userId: string, name: string) {
  await writeThrough(
    `user:${userId}`,
    { id: userId, name },  // the cached value
    300,
    () => db.query('UPDATE users SET name = $1 WHERE id = $2', [name, userId])
  )
}

The downside: if the DB write succeeds but the cache write fails, the cache still holds the old value and serves it until the TTL expires. Always write the DB first so the source of truth stays correct; on a cache-write failure, a best-effort delete of the key turns "stale" into "cache miss".
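One way to contain that partial-failure window (a sketch, not from the original; the `CacheLike` interface here is a hypothetical stand-in for the Redis client) is to catch the cache-write error and fall back to deleting the key, so readers get a miss instead of the old value:

```typescript
interface CacheLike {
  setex(key: string, ttl: number, value: string): Promise<void>
  del(key: string): Promise<void>
}

async function writeThroughSafe<T>(
  cache: CacheLike,
  key: string,
  value: T,
  ttlSeconds: number,
  dbWriteFn: () => Promise<void>
): Promise<void> {
  await dbWriteFn()   // DB first: it is the source of truth
  try {
    await cache.setex(key, ttlSeconds, JSON.stringify(value))
  } catch {
    // Cache write failed: the key may still hold the OLD value.
    // A best-effort delete downgrades "stale" to "cache miss".
    await cache.del(key).catch(() => {})
  }
}
```

If the delete also fails, the short TTL from Strategy 1 remains the final safety net.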

Strategy 4: Cache Stampede Prevention

When a popular cache key expires, all concurrent requests miss and hammer the DB simultaneously:

const sleep = (ms: number) => new Promise(resolve => setTimeout(resolve, ms))

async function getWithSingleFlight<T>(
  key: string,
  fetchFn: () => Promise<T>,
  ttlSeconds: number
): Promise<T> {
  // Check cache
  const cached = await redis.get(key)
  if (cached) return JSON.parse(cached)

  // Use a Redis lock to ensure only ONE request fetches from the DB
  const lockKey = `lock:${key}`
  const acquired = await redis.set(lockKey, '1', 'EX', 10, 'NX')

  if (!acquired) {
    // Another request is fetching — wait and retry from cache
    await sleep(100)
    const retried = await redis.get(key)
    if (retried) return JSON.parse(retried)
    // If still not cached after the wait, fall through and fetch (rare)
  }

  try {
    const data = await fetchFn()
    await redis.setex(key, ttlSeconds, JSON.stringify(data))
    return data
  } finally {
    // Only the lock holder may release the lock; a request that fell
    // through without acquiring it must not delete someone else's lock
    if (acquired) await redis.del(lockKey)
  }
}
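The same single-flight idea also works in-process, without a distributed lock, by memoizing the in-flight promise (a complementary sketch with hypothetical names; it deduplicates concurrent fetches within one server only, so it pairs with, rather than replaces, the Redis lock):

```typescript
// key -> the promise of the fetch currently in flight for that key
const inFlight = new Map<string, Promise<unknown>>()

async function singleFlight<T>(key: string, fetchFn: () => Promise<T>): Promise<T> {
  const existing = inFlight.get(key)
  if (existing) return existing as Promise<T>   // join the fetch already running

  const p = fetchFn().finally(() => inFlight.delete(key))
  inFlight.set(key, p)
  return p
}
```

Concurrent callers for the same key share one `fetchFn` invocation; once it settles, the entry is removed so the next miss fetches fresh data.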

Cache Invalidation Checklist

  • ✅ Set a TTL on every cache entry — never cache without expiry
  • ✅ Invalidate on every write path — audit all code that modifies data
  • ✅ Use tag-based invalidation when one write affects many cache keys
  • ✅ Write DB first, then cache — partial failures leave cache stale (not DB wrong)
  • ✅ Use single-flight / mutex to prevent cache stampede
  • ✅ Log cache hit/miss rates — drops in hit rate signal invalidation issues
  • ✅ Test invalidation in integration tests, not just unit tests

Conclusion

Cache invalidation fails in two ways: either invalidation happens on some write paths but not others (causing permanent stale data), or cache structure doesn't match what queries need (tag-based invalidation solves this). The most reliable approach is short TTLs as a fallback safety net, explicit invalidation on every write, and tag-based grouping when a single entity change should clear multiple cache shapes. Whatever strategy you choose, make cache invalidation part of every write path code review — it's the part that's easiest to forget and most expensive to get wrong.