Cache Invalidation Hell — The Second Hardest Problem in Computer Science

Introduction

"There are only two hard things in computer science: cache invalidation and naming things." The joke persists because cache invalidation genuinely is hard. A cache that's never invalidated serves stale data. A cache that's invalidated too aggressively is just slow extra work with no latency benefit. Getting the balance right — and making sure invalidation actually happens on every write path — is where most cache implementations fail.

The Common Failure Modes

1. Write path A updates the DB and the cache; write path B updates the DB but FORGETS to invalidate the cache. Stale data from path B is served indefinitely.

2. The cache key uses an old ID format while data is updated under a new key. The old key is served forever (no expiry, no invalidation).

3. Invalidation happens before the DB write commits. A concurrent cache miss fetches from the DB while it still holds the old data, and the stale result is re-cached as if it were fresh.

4. Multi-key invalidation: updating a user invalidates user:{id}, but should also invalidate user_list, user_count, and team:{teamId}. Some stale views persist.

5. Race condition: two writers invalidate and re-cache simultaneously. The last cache write wins, but it may not carry the latest data.
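Failure mode 1 is easy to reproduce. A minimal in-memory sketch (hypothetical names, a Map standing in for the DB and the cache) shows how one forgetful write path poisons reads:

```typescript
// In-memory sketch of failure mode 1: path A invalidates, path B forgets.
const fakeDb = new Map<string, string>()
const fakeCache = new Map<string, string>()

function readUser(id: string): string | undefined {
  const hit = fakeCache.get(id)
  if (hit !== undefined) return hit
  const fresh = fakeDb.get(id)
  if (fresh !== undefined) fakeCache.set(id, fresh)
  return fresh
}

function updateViaPathA(id: string, name: string) {
  fakeDb.set(id, name)
  fakeCache.delete(id)   // correct: invalidate after the write
}

function updateViaPathB(id: string, name: string) {
  fakeDb.set(id, name)   // BUG: forgets to invalidate the cache
}

fakeDb.set('u1', 'Alice')
readUser('u1')                 // warms the cache with 'Alice'
updateViaPathA('u1', 'Bob')
console.log(readUser('u1'))    // 'Bob': invalidation worked
updateViaPathB('u1', 'Carol')
console.log(readUser('u1'))    // still 'Bob': stale value served indefinitely
```

With no TTL, nothing ever corrects the cache after path B runs; this is why the strategies below always pair invalidation with an expiry.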

Strategy 1: Cache-Aside with Short TTL (Simplest)

import Redis from 'ioredis'
import { Pool } from 'pg'

class CacheAside {
  constructor(private redis: Redis, private db: Pool) {}

  async get<T>(key: string, fetchFn: () => Promise<T>, ttlSeconds = 60): Promise<T> {
    // Try cache first
    const cached = await this.redis.get(key)
    if (cached) return JSON.parse(cached)

    // Cache miss — fetch from DB
    const data = await fetchFn()

    // Store in cache
    await this.redis.setex(key, ttlSeconds, JSON.stringify(data))

    return data
  }

  async invalidate(key: string): Promise<void> {
    await this.redis.del(key)
  }
}

const cache = new CacheAside(redis, db)

// Read
async function getUser(userId: string) {
  return cache.get(
    `user:${userId}`,
    () => db.query('SELECT * FROM users WHERE id = $1', [userId]).then(r => r.rows[0]),
    300  // 5 minute TTL
  )
}

// Write — invalidate after update
interface UpdateUserDto { name: string }

async function updateUser(userId: string, data: UpdateUserDto) {
  await db.query('UPDATE users SET name = $1 WHERE id = $2', [data.name, userId])
  await cache.invalidate(`user:${userId}`)
  // Next read will be a cache miss → fresh data from DB
}

The TTL acts as a safety net: even if invalidation is missed somehow, the cache will self-heal within 5 minutes.
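One refinement worth considering (an assumption on my part, not part of the strategy above) is adding random jitter to the TTL, so that keys written in the same burst don't all expire in the same instant and miss together:

```typescript
// Hypothetical helper: spread expiries over [base, base + spread) seconds
// so a burst of writes doesn't produce a burst of synchronized misses later.
function jitteredTtl(baseSeconds: number, spreadSeconds = 30): number {
  return baseSeconds + Math.floor(Math.random() * spreadSeconds)
}

// e.g. await this.redis.setex(key, jitteredTtl(300), JSON.stringify(data))
```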

Strategy 2: Tag-Based Invalidation

When one entity change should invalidate many keys:

class TaggedCache {
  constructor(private redis: Redis) {}

  // Store a cache key with tags
  async set(key: string, value: any, tags: string[], ttlSeconds: number): Promise<void> {
    const pipeline = this.redis.pipeline()

    // Store the value
    pipeline.setex(key, ttlSeconds, JSON.stringify(value))

    // Register this key under each tag
    for (const tag of tags) {
      pipeline.sadd(`tag:${tag}`, key)
      pipeline.expire(`tag:${tag}`, ttlSeconds + 60)  // tag lives slightly longer
    }

    await pipeline.exec()
  }

  // Invalidate all keys with a given tag
  async invalidateTag(tag: string): Promise<void> {
    const keys = await this.redis.smembers(`tag:${tag}`)
    if (keys.length === 0) return

    const pipeline = this.redis.pipeline()
    for (const key of keys) pipeline.del(key)
    pipeline.del(`tag:${tag}`)
    await pipeline.exec()
  }
}

const cache = new TaggedCache(redis)

// Cache user data with tags
async function cacheUserProfile(userId: string) {
  const user = await db.query('SELECT * FROM users WHERE id = $1', [userId])

  await cache.set(
    `user:profile:${userId}`,
    user.rows[0],
    [`user:${userId}`, `org:${user.rows[0].org_id}`],  // tags
    300
  )
}

// When user updates — invalidate all user-tagged keys
async function updateUser(userId: string, data: any) {
  await db.query('UPDATE users SET name = $1 WHERE id = $2', [data.name, userId])

  // Invalidates: user:profile:{userId}, user:list, user:settings, etc.
  await cache.invalidateTag(`user:${userId}`)
}

// When org updates — invalidate all keys tagged with that org
async function updateOrg(orgId: string, data: any) {
  await db.query('UPDATE organizations SET name = $1 WHERE id = $2', [data.name, orgId])
  await cache.invalidateTag(`org:${orgId}`)
}
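The tag index above is just a set of keys per tag. An in-memory sketch of the same data layout (hypothetical names, no Redis) makes the invalidation semantics explicit:

```typescript
// key -> cached value, and tag -> set of cache keys registered under that tag
const memStore = new Map<string, unknown>()
const tagIndex = new Map<string, Set<string>>()

function setTagged(key: string, value: unknown, tags: string[]) {
  memStore.set(key, value)
  for (const tag of tags) {
    if (!tagIndex.has(tag)) tagIndex.set(tag, new Set())
    tagIndex.get(tag)!.add(key)
  }
}

function invalidateTag(tag: string) {
  for (const key of tagIndex.get(tag) ?? []) memStore.delete(key)
  tagIndex.delete(tag)
}

setTagged('user:profile:1', { name: 'Ada' }, ['user:1', 'org:9'])
setTagged('user:list', ['1'], ['user:1'])
invalidateTag('user:1')   // clears both keys in one call
```

The Redis version is the same structure, with SADD building the set and SMEMBERS + DEL walking it; the pipeline just batches the round trips.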

Strategy 3: Write-Through Cache

Write to cache AND database together — cache is never stale:

async function writeThrough<T>(
  key: string,
  value: T,
  ttlSeconds: number,
  dbWriteFn: () => Promise<void>
): Promise<void> {
  // Write to DB first
  await dbWriteFn()

  // Then update cache with new value
  await redis.setex(key, ttlSeconds, JSON.stringify(value))
  // Cache is always consistent with DB (no invalidation needed)
}

async function updateUserName(userId: string, name: string) {
  await writeThrough(
    `user:${userId}`,
    { id: userId, name },  // the cached value
    300,
    () => db.query('UPDATE users SET name = $1 WHERE id = $2', [name, userId])
  )
}

The downside: if the DB write succeeds but the cache write fails, the cache still holds the old value and serves it until the TTL expires. Always write the DB first so the source of truth stays correct; on a cache-write failure, a best-effort delete of the key turns "stale" into "cache miss".
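One way to contain that partial-failure window (a sketch, not from the original; the `CacheLike` interface here is a hypothetical stand-in for the Redis client) is to catch the cache-write error and fall back to deleting the key, so readers get a miss instead of the old value:

```typescript
interface CacheLike {
  setex(key: string, ttl: number, value: string): Promise<void>
  del(key: string): Promise<void>
}

async function writeThroughSafe<T>(
  cache: CacheLike,
  key: string,
  value: T,
  ttlSeconds: number,
  dbWriteFn: () => Promise<void>
): Promise<void> {
  await dbWriteFn()   // DB first: it is the source of truth
  try {
    await cache.setex(key, ttlSeconds, JSON.stringify(value))
  } catch {
    // Cache write failed: the key may still hold the OLD value.
    // A best-effort delete downgrades "stale" to "cache miss".
    await cache.del(key).catch(() => {})
  }
}
```

If the delete also fails, the short TTL from Strategy 1 remains the final safety net.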

Strategy 4: Cache Stampede Prevention

When a popular cache key expires, all concurrent requests miss and hammer the DB simultaneously:

const sleep = (ms: number) => new Promise(resolve => setTimeout(resolve, ms))

async function getWithSingleFlight<T>(
  key: string,
  fetchFn: () => Promise<T>,
  ttlSeconds: number
): Promise<T> {
  // Check cache
  const cached = await redis.get(key)
  if (cached) return JSON.parse(cached)

  // Use a Redis lock to ensure only ONE request fetches from the DB
  const lockKey = `lock:${key}`
  const acquired = await redis.set(lockKey, '1', 'EX', 10, 'NX')

  if (!acquired) {
    // Another request is fetching — wait and retry from cache
    await sleep(100)
    const retried = await redis.get(key)
    if (retried) return JSON.parse(retried)
    // If still not cached after the wait, fall through and fetch (rare)
  }

  try {
    const data = await fetchFn()
    await redis.setex(key, ttlSeconds, JSON.stringify(data))
    return data
  } finally {
    // Only the lock holder may release the lock; a request that fell
    // through without acquiring it must not delete someone else's lock
    if (acquired) await redis.del(lockKey)
  }
}
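The same single-flight idea also works in-process, without a distributed lock, by memoizing the in-flight promise (a complementary sketch with hypothetical names; it deduplicates concurrent fetches within one server only, so it pairs with, rather than replaces, the Redis lock):

```typescript
// key -> the promise of the fetch currently in flight for that key
const inFlight = new Map<string, Promise<unknown>>()

async function singleFlight<T>(key: string, fetchFn: () => Promise<T>): Promise<T> {
  const existing = inFlight.get(key)
  if (existing) return existing as Promise<T>   // join the fetch already running

  const p = fetchFn().finally(() => inFlight.delete(key))
  inFlight.set(key, p)
  return p
}
```

Concurrent callers for the same key share one `fetchFn` invocation; once it settles, the entry is removed so the next miss fetches fresh data.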

Cache Invalidation Checklist

  • ✅ Set a TTL on every cache entry — never cache without expiry
  • ✅ Invalidate on every write path — audit all code that modifies data
  • ✅ Use tag-based invalidation when one write affects many cache keys
  • ✅ Write DB first, then cache — partial failures leave cache stale (not DB wrong)
  • ✅ Use single-flight / mutex to prevent cache stampede
  • ✅ Log cache hit/miss rates — drops in hit rate signal invalidation issues
  • ✅ Test invalidation in integration tests, not just unit tests

Conclusion

Cache invalidation fails in two ways: either invalidation happens on some write paths but not others (causing permanent stale data), or cache structure doesn't match what queries need (tag-based invalidation solves this). The most reliable approach is short TTLs as a fallback safety net, explicit invalidation on every write, and tag-based grouping when a single entity change should clear multiple cache shapes. Whatever strategy you choose, make cache invalidation part of every write path code review — it's the part that's easiest to forget and most expensive to get wrong.