Distributed Locking — Redis Redlock, Database Locks, and When You Actually Need Them

Sanjeev Sharma

Introduction

Distributed locks prevent concurrent access to shared resources across multiple servers. They're seductive—they feel safe—but they're also a common source of subtle bugs. A network partition can leave a lock held forever. A crashed process never releases its lock. This post covers the tools (Redis, PostgreSQL) and when you actually need them. Spoiler: often you don't.

Redis SET NX PX: Simple Distributed Lock

The simplest distributed lock uses Redis SET with NX (only if not exists) and PX (expiration).

import Redis from 'ioredis'; // assumes the ioredis client
import { v4 as uuid } from 'uuid';

const sleep = (ms: number) => new Promise<void>(res => setTimeout(res, ms));

class RedisLock {
  constructor(private redis: Redis, private lockTimeout = 30000) {}

  async acquire(key: string): Promise<string | null> {
    const token = uuid();
    // SET key value PX <ms> NX: set with an expiry, only if the key doesn't exist
    const result = await this.redis.set(
      `lock:${key}`,
      token,
      'PX',
      this.lockTimeout,
      'NX'
    );

    return result === 'OK' ? token : null;
  }

  async release(key: string, token: string): Promise<boolean> {
    // Use Lua script to ensure atomic check-and-delete
    const script = `
      if redis.call("get", KEYS[1]) == ARGV[1] then
        return redis.call("del", KEYS[1])
      else
        return 0
      end
    `;

    const result = await this.redis.eval(script, 1, `lock:${key}`, token);
    return result === 1;
  }

  async withLock<T>(key: string, fn: () => Promise<T>, retries = 3): Promise<T> {
    for (let i = 0; i < retries; i++) {
      const token = await this.acquire(key);
      if (token) {
        try {
          return await fn();
        } finally {
          await this.release(key, token);
        }
      }
      // Exponential backoff
      await sleep(Math.pow(2, i) * 100 + Math.random() * 100);
    }
    throw new Error(`Failed to acquire lock: ${key}`);
  }
}

Pitfalls:

  • If the holder crashes, the lock frees only when the TTL expires (good), but writes the crashed process already issued can still land and race with the next holder
  • Network partition: the client believes it released the lock, but Redis never received the DEL, so the lock blocks everyone else until the TTL expires
  • Process pauses (GC, context switches) can let the lock expire while the critical section is still running, so two clients end up holding it at once
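The pause problem can be partially mitigated by recording when the lock was acquired and refusing to commit results once too much of the TTL has elapsed. A minimal sketch (the helper and its 2-second margin are illustrative, not from this post):

```typescript
// Decide whether enough of the lock's TTL remains to safely finish the
// critical section. acquiredAt and now are local timestamps in ms;
// safetyMargin is a hypothetical buffer for clock drift and process pauses.
function lockStillValid(
  acquiredAt: number,
  lockTimeout: number,
  now: number,
  safetyMargin = 2000
): boolean {
  return now - acquiredAt < lockTimeout - safetyMargin;
}

// A worker would check this before committing its results:
// if (!lockStillValid(acquiredAt, 30000, Date.now())) throw new Error('lock too stale');
```

This narrows the race window but cannot close it; only fencing tokens (covered below) make stale holders harmless.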

Redlock Algorithm

Redlock improves safety by running against several independent Redis instances (typically 5) and treating the lock as held only when a majority of them grant it.

class RedLock {
  private locks: Redis[];
  private lockTimeout = 30000;
  private clockDrift = 200; // Clock drift tolerance (ms)

  constructor(redisInstances: Redis[]) {
    this.locks = redisInstances;
    if (redisInstances.length % 2 === 0) {
      console.warn('An odd number of Redis instances is recommended for Redlock');
    }
  }

  async acquire(key: string, retries = 3): Promise<string | null> {
    const token = uuid();
    const quorum = Math.floor(this.locks.length / 2) + 1;

    for (let attempt = 0; attempt < retries; attempt++) {
      const startTime = Date.now();
      let successCount = 0;

      // Try to acquire the lock on every instance; count only successful sets
      const promises = this.locks.map(lock =>
        lock
          .set(`lock:${key}`, token, 'PX', this.lockTimeout, 'NX')
          .then(result => {
            if (result === 'OK') successCount++; // an NX miss resolves with null, not an error
          })
          .catch(() => {}) // Ignore unreachable instances
      );

      await Promise.all(promises);

      const elapsed = Date.now() - startTime;
      const lockValidityTime = this.lockTimeout - elapsed - this.clockDrift;

      // Did we get a quorum?
      if (successCount >= quorum && lockValidityTime > 0) {
        return token;
      }

      // Release all locks we acquired
      await this.releaseAll(key, token);

      // Backoff before retry
      const backoff = Math.random() * Math.pow(2, attempt) * 100;
      await sleep(backoff);
    }

    return null;
  }

  async release(key: string, token: string): Promise<number> {
    const script = `
      if redis.call("get", KEYS[1]) == ARGV[1] then
        return redis.call("del", KEYS[1])
      else
        return 0
      end
    `;

    // Release from all instances
    const results = await Promise.all(
      this.locks.map(lock => lock.eval(script, 1, `lock:${key}`, token).catch(() => 0))
    );

    return results.reduce((sum, r) => sum + r, 0);
  }

  private async releaseAll(key: string, token: string): Promise<void> {
    const script = `
      if redis.call("get", KEYS[1]) == ARGV[1] then
        return redis.call("del", KEYS[1])
      else
        return 0
      end
    `;

    await Promise.all(
      this.locks.map(lock => lock.eval(script, 1, `lock:${key}`, token).catch(() => {}))
    );
  }

  async withLock<T>(key: string, fn: () => Promise<T>): Promise<T> {
    const token = await this.acquire(key);
    if (!token) {
      throw new Error(`Failed to acquire lock: ${key}`);
    }

    try {
      return await fn();
    } finally {
      await this.release(key, token);
    }
  }
}

Why 5 nodes?

  • Quorum = 3 (majority)
  • Can tolerate 2 failures
  • Even with 1 node down for maintenance, you keep running
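The same arithmetic applies to any cluster size; a quick sketch of the quorum math (helper names are mine):

```typescript
// Majority quorum and fault tolerance for an n-instance Redlock deployment.
function quorum(n: number): number {
  return Math.floor(n / 2) + 1;
}

function tolerableFailures(n: number): number {
  return n - quorum(n);
}
```

Note that `tolerableFailures(4)` and `tolerableFailures(3)` are both 1: an even node count adds cost without adding fault tolerance, which is why the constructor above warns about it.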

Lock Renewal for Long Operations

If your operation takes longer than the lock timeout, renew the lock.

class RenewableLock {
  constructor(private redis: Redis) {}

  async withLock<T>(
    key: string,
    fn: () => Promise<T>,
    lockTimeout = 30000,
    renewInterval = 10000
  ): Promise<T> {
    const token = uuid();
    let isLocked = true;
    let renewalError: Error | null = null;

    const acquire = await this.redis.set(
      `lock:${key}`,
      token,
      'PX',
      lockTimeout,
      'NX'
    );
    if (acquire !== 'OK') {
      throw new Error(`Failed to acquire lock: ${key}`);
    }

    // Start renewal interval
    const renewalHandle = setInterval(async () => {
      try {
        const script = `
          if redis.call("get", KEYS[1]) == ARGV[1] then
            return redis.call("pexpire", KEYS[1], ARGV[2])
          else
            return 0
          end
        `;
        const result = await this.redis.eval(script, 1, `lock:${key}`, token, lockTimeout);
        if (result === 0) {
          renewalError = new Error('Lock lost during renewal');
          isLocked = false;
        }
      } catch (error) {
        renewalError = error as Error;
        isLocked = false;
      }
    }, renewInterval);

    try {
      const result = await fn();

      // Check if lock was lost during operation
      if (!isLocked || renewalError) {
        throw renewalError || new Error('Lock was lost during operation');
      }

      return result;
    } finally {
      clearInterval(renewalHandle);
      await this.release(key, token);
    }
  }

  private async release(key: string, token: string): Promise<void> {
    const script = `
      if redis.call("get", KEYS[1]) == ARGV[1] then
        return redis.call("del", KEYS[1])
      else
        return 0
      end
    `;
    await this.redis.eval(script, 1, `lock:${key}`, token);
  }
}

Fencing Tokens for Safety

After a lock expires, another client acquires it and starts work. The first client (which was paused) resumes and makes changes. Fencing tokens prevent this: include a monotonically increasing token with each operation.

class FencedLock {
  constructor(private redis: Redis) {}

  async acquire(key: string): Promise<{ token: string; fence: number } | null> {
    const token = uuid();
    // INCR gives a monotonically increasing fence; the counter advances even
    // on failed attempts, which preserves monotonicity. Note that in Redis
    // Lua, SET returns a status reply (truthy table) on success and false on
    // an NX miss, so we test truthiness rather than comparing to "OK".
    const script = `
      local fence = redis.call("incr", KEYS[1] .. ":fence")
      local ok = redis.call("set", KEYS[1], ARGV[1], "NX", "PX", ARGV[2])
      if ok then
        return {ARGV[1], fence}
      else
        return nil
      end
    `;

    const result = await this.redis.eval(
      script,
      1,
      `lock:${key}`,
      token,
      30000
    );

    if (result) {
      return { token: result[0], fence: result[1] };
    }
    return null;
  }

  async executeWithFence<T>(
    key: string,
    fn: (fence: number) => Promise<T>
  ): Promise<T> {
    const lock = await this.acquire(key);
    if (!lock) {
      throw new Error(`Failed to acquire lock: ${key}`);
    }

    try {
      // Pass fence token to the operation
      // The operation must use this token to guard state mutations
      return await fn(lock.fence);
    } finally {
      await this.release(key, lock.token);
    }
  }

  private async release(key: string, token: string): Promise<void> {
    const script = `
      if redis.call("get", KEYS[1]) == ARGV[1] then
        return redis.call("del", KEYS[1])
      else
        return 0
      end
    `;
    await this.redis.eval(script, 1, `lock:${key}`, token);
  }
}

// Usage with fenced operations
class Account {
  async transfer(fromAccountId: string, toAccountId: string, amount: number): Promise<void> {
    await lock.executeWithFence(`transfer:${fromAccountId}`, async fence => {
      // Note: these queries must share one transaction for FOR UPDATE to
      // hold the row lock across the read and both updates
      const account = await db.query(
        `SELECT balance, fence FROM accounts WHERE id = $1 FOR UPDATE`,
        [fromAccountId]
      );

      if (account.rows[0].fence >= fence) {
        throw new Error('Lock lost; another process modified this account');
      }

      const newBalance = account.rows[0].balance - amount;
      if (newBalance < 0) {
        throw new Error('Insufficient funds');
      }

      await db.query(
        `UPDATE accounts SET balance = $1, fence = $2 WHERE id = $3`,
        [newBalance, fence, fromAccountId]
      );

      await db.query(
        `UPDATE accounts SET balance = balance + $1 WHERE id = $2`,
        [amount, toAccountId]
      );
    });
  }
}

PostgreSQL Advisory Locks

For resources within a single database, advisory locks are simpler and safer than distributed locks.

class PostgreSQLAdvisoryLock {
  constructor(private pool: Pool) {} // assumes a node-postgres (pg) Pool

  async withLock<T>(
    lockId: number,
    fn: () => Promise<T>,
    shared = false
  ): Promise<T> {
    const conn = await this.pool.connect();

    try {
      // pg_advisory_xact_lock blocks until acquired and releases automatically
      // at transaction end, so it needs an explicit transaction to be useful
      await conn.query('BEGIN');
      const lockFunc = shared ? 'pg_advisory_xact_lock_shared' : 'pg_advisory_xact_lock';
      await conn.query(`SELECT ${lockFunc}($1)`, [lockId]);

      const result = await fn();
      await conn.query('COMMIT'); // Lock releases here
      return result;
    } catch (err) {
      await conn.query('ROLLBACK');
      throw err;
    } finally {
      conn.release();
    }
  }

  async withTryLock<T>(lockId: number, fn: () => Promise<T>): Promise<T | null> {
    const conn = await this.pool.connect();

    try {
      // pg_try_advisory_lock returns immediately; it takes a session-level
      // lock, so it must be unlocked explicitly before the connection goes
      // back to the pool
      const result = await conn.query(`SELECT pg_try_advisory_lock($1)`, [lockId]);
      if (!result.rows[0].pg_try_advisory_lock) {
        return null; // Lock not acquired
      }

      try {
        return await fn();
      } finally {
        await conn.query(`SELECT pg_advisory_unlock($1)`, [lockId]);
      }
    } finally {
      conn.release();
    }
  }
}

SELECT FOR UPDATE SKIP LOCKED for Job Queues

Instead of explicit locks, let the database do the coordination: FOR UPDATE SKIP LOCKED lets many workers claim rows concurrently without blocking on, or double-claiming, each other's jobs.

class JobQueue {
  async dequeueJob(workerId: string): Promise<Job | null> {
    const result = await this.db.query(
      `UPDATE jobs
       SET worker_id = $1, status = 'processing', claimed_at = NOW()
       WHERE id = (
         SELECT id FROM jobs
         WHERE status = 'pending'
         ORDER BY priority DESC, created_at ASC
         FOR UPDATE SKIP LOCKED
         LIMIT 1
       )
       RETURNING id, payload, attempt_count`,
      [workerId]
    );

    return result.rows[0] || null;
  }

  async completeJob(jobId: string): Promise<void> {
    await this.db.query(
      `UPDATE jobs SET status = 'completed', completed_at = NOW() WHERE id = $1`,
      [jobId]
    );
  }

  async failJob(jobId: string, error: string): Promise<void> {
    const result = await this.db.query(
      `SELECT attempt_count, max_attempts FROM jobs WHERE id = $1`,
      [jobId]
    );

    const { attempt_count, max_attempts } = result.rows[0];
    if (attempt_count >= max_attempts) {
      await this.db.query(
        `UPDATE jobs SET status = 'failed', error = $1 WHERE id = $2`,
        [error, jobId]
      );
    } else {
      await this.db.query(
        `UPDATE jobs SET status = 'pending', attempt_count = attempt_count + 1 WHERE id = $1`,
        [jobId]
      );
    }
  }
}

Lock-Free Alternatives: Optimistic Concurrency

Often you don't need locks. Use version numbers and retry on conflict.

class OptimisticConcurrency {
  async updateAccount(accountId: string, updates: { amount: number }): Promise<void> {
    let retries = 0;
    const maxRetries = 5;

    while (retries < maxRetries) {
      // Read current version
      const current = await this.db.query(
        `SELECT version, balance FROM accounts WHERE id = $1`,
        [accountId]
      );

      if (current.rows.length === 0) {
        throw new Error(`Account ${accountId} not found`);
      }

      const { version, balance } = current.rows[0];

      // Apply updates
      const newBalance = balance + updates.amount;

      // Try to update only if version hasn't changed
      const updateResult = await this.db.query(
        `UPDATE accounts SET balance = $1, version = version + 1
         WHERE id = $2 AND version = $3`,
        [newBalance, accountId, version]
      );

      if (updateResult.rowCount === 1) {
        return; // Success
      }

      // Version mismatch; retry
      retries++;
      await sleep(Math.pow(2, retries) * 10 + Math.random() * 10);
    }

    throw new Error(`Failed to update account after ${maxRetries} retries`);
  }
}

When Distributed Locks Are Wrong

Distributed locks add latency and complexity. Consider alternatives:

  1. Caching: Use cache-aside pattern with weak consistency
  2. Partitioning: Ensure no two servers handle the same resource
  3. Optimistic concurrency: Accept conflicts, retry on mismatch
  4. Work stealing: ONE server owns a resource; if it dies, another steals work
  5. Sharding: Route by customer/resource to a single owner
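Partitioning and sharding (items 2 and 5) both come down to deterministic routing: hash the resource key to a single owner so that mutual exclusion falls out of the topology and no lock is needed. A sketch, with an illustrative FNV-1a hash:

```typescript
// Route each resource to exactly one owning node; since no two nodes ever
// own the same resource, concurrent access never arises in the first place.
function ownerFor(resourceId: string, nodes: string[]): string {
  // FNV-1a, kept in the unsigned 32-bit range
  let hash = 2166136261;
  for (let i = 0; i < resourceId.length; i++) {
    hash ^= resourceId.charCodeAt(i);
    hash = Math.imul(hash, 16777619) >>> 0;
  }
  return nodes[hash % nodes.length];
}
```

Plain modulo reshuffles most keys whenever the node list changes; consistent hashing limits that churn, but the stable-routing idea is the same.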

Checklist

  • Understand your locking requirement: are you preventing races or just coordinating?
  • Use database locks for single-database operations
  • Use optimistic concurrency unless strong mutual exclusion is required
  • If using Redis locks, implement token-based release (not just TTL)
  • For long operations, implement renewal or use fencing tokens
  • Test lock failures: crashed processes, network partitions, clock skew
  • Monitor lock contention and timeout rates
  • Avoid nested locks (deadlock risk)

Conclusion

Distributed locks feel like a safety net, but they have holes. For single-database operations, use advisory locks. For cross-service coordination, ask if you really need mutual exclusion—often optimistic concurrency or work partitioning is simpler and faster. If you do use locks, include tokens and test failure scenarios ruthlessly.

Written by Sanjeev Sharma
Full Stack Engineer · E-mopro