Distributed Locking — Redis Redlock, Database Locks, and When You Actually Need Them
Introduction
Distributed locks prevent concurrent access to shared resources across multiple servers. They're seductive—they feel safe—but they're also a common source of subtle bugs. A network partition can leave a lock held forever. A crashed process never releases its lock. This post covers the tools (Redis, PostgreSQL) and when you actually need them. Spoiler: often you don't.
- Redis SET NX PX: Simple Distributed Lock
- Redlock Algorithm
- Lock Renewal for Long Operations
- Fencing Tokens for Safety
- PostgreSQL Advisory Locks
- SELECT FOR UPDATE SKIP LOCKED for Job Queues
- Lock-Free Alternatives: Optimistic Concurrency
- When Distributed Locks Are Wrong
- Checklist
- Conclusion
Redis SET NX PX: Simple Distributed Lock
The simplest distributed lock uses Redis SET with NX (only if not exists) and PX (expiration).
import Redis from 'ioredis'; // assumes ioredis; the node-redis API differs slightly
import { randomUUID as uuid } from 'crypto';

const sleep = (ms: number) => new Promise(resolve => setTimeout(resolve, ms));

class RedisLock {
  constructor(private redis: Redis, private lockTimeout = 30000) {}

  async acquire(key: string): Promise<string | null> {
    const token = uuid();
    const result = await this.redis.set(
      `lock:${key}`,
      token,
      'PX', this.lockTimeout, // expire after lockTimeout milliseconds
      'NX'                    // only set if the key does not already exist
    );
    return result === 'OK' ? token : null;
  }

  async release(key: string, token: string): Promise<boolean> {
    // Use a Lua script to make the check-and-delete atomic;
    // deleting unconditionally could remove another client's lock
    const script = `
      if redis.call("get", KEYS[1]) == ARGV[1] then
        return redis.call("del", KEYS[1])
      else
        return 0
      end
    `;
    const result = await this.redis.eval(script, 1, `lock:${key}`, token);
    return result === 1;
  }

  async withLock<T>(key: string, fn: () => Promise<T>, retries = 3): Promise<T> {
    for (let i = 0; i < retries; i++) {
      const token = await this.acquire(key);
      if (token) {
        try {
          return await fn();
        } finally {
          await this.release(key, token);
        }
      }
      // Exponential backoff with jitter
      await sleep(Math.pow(2, i) * 100 + Math.random() * 100);
    }
    throw new Error(`Failed to acquire lock: ${key}`);
  }
}
Pitfalls:
- If the client crashes, the lock expires after the timeout (good), but the resource stays blocked until the TTL elapses
- Network partition: client thinks it released lock, but Redis didn't receive the message; another client acquires lock
- Process pauses (GC, context switch) can cause lock expiration while code still runs
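The third pitfall is easiest to see with a toy simulation. Below is a minimal in-memory sketch (no real Redis involved; `tryAcquire` and the key name are illustrative) of a client whose pause outlives the TTL:

```typescript
// In-memory stand-in for SET NX PX, with a manually advanced clock
const store = new Map<string, { token: string; expiresAt: number }>();

function tryAcquire(key: string, token: string, ttlMs: number, now: number): boolean {
  const entry = store.get(key);
  if (entry && entry.expiresAt > now) return false; // lock still held
  store.set(key, { token, expiresAt: now + ttlMs });
  return true;
}

let now = 0;
const gotA = tryAcquire('invoice:42', 'A', 100, now); // client A acquires, TTL 100ms
now += 150;                                           // A pauses (GC, swap) past the TTL
const gotB = tryAcquire('invoice:42', 'B', 100, now); // B acquires the expired lock
// Both clients now believe they hold the lock; whatever A writes after its
// pause is unprotected unless something downstream rejects stale holders.
```

This is exactly the hazard that fencing tokens address: the lock alone cannot tell a paused-but-alive holder from a dead one.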
Redlock Algorithm
Redlock improves safety by spreading the lock across several independent Redis instances: you acquire the lock on a majority of them (typically 5 nodes, so a quorum of 3).
class RedLock {
  private locks: Redis[];
  private lockTimeout = 30000;
  private clockDrift = 200; // clock drift tolerance (ms)

  constructor(redisInstances: Redis[]) {
    this.locks = redisInstances;
    if (redisInstances.length % 2 === 0) {
      console.warn('Redlock works best with an odd number of Redis instances');
    }
  }

  async acquire(key: string, retries = 3): Promise<string | null> {
    const token = uuid();
    const quorum = Math.floor(this.locks.length / 2) + 1;

    for (let attempt = 0; attempt < retries; attempt++) {
      const startTime = Date.now();
      let successCount = 0;

      // Try to acquire the lock on every instance
      await Promise.all(
        this.locks.map(lock =>
          lock
            .set(`lock:${key}`, token, 'PX', this.lockTimeout, 'NX')
            .then(result => {
              if (result === 'OK') successCount++; // only count real acquisitions
            })
            .catch(() => {}) // ignore individual instance failures
        )
      );

      const elapsed = Date.now() - startTime;
      const lockValidityTime = this.lockTimeout - elapsed - this.clockDrift;

      // Did we get a quorum before the lock became worthless?
      if (successCount >= quorum && lockValidityTime > 0) {
        return token;
      }

      // Release any locks we did acquire, then back off before retrying
      await this.release(key, token);
      await sleep(Math.random() * Math.pow(2, attempt) * 100);
    }
    return null;
  }

  async release(key: string, token: string): Promise<number> {
    const script = `
      if redis.call("get", KEYS[1]) == ARGV[1] then
        return redis.call("del", KEYS[1])
      else
        return 0
      end
    `;
    // Release from all instances; sum how many actually held our token
    const results = await Promise.all(
      this.locks.map(lock =>
        lock.eval(script, 1, `lock:${key}`, token).catch(() => 0)
      )
    );
    return results.reduce((sum: number, r) => sum + Number(r), 0);
  }

  async withLock<T>(key: string, fn: () => Promise<T>): Promise<T> {
    const token = await this.acquire(key);
    if (!token) {
      throw new Error(`Failed to acquire lock: ${key}`);
    }
    try {
      return await fn();
    } finally {
      await this.release(key, token);
    }
  }
}
Why 5 nodes?
- Quorum = 3 (majority)
- Can tolerate 2 failures
- Even with 1 node down for maintenance, you keep running
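The arithmetic generalizes to any cluster size. A small sketch (helper names are illustrative):

```typescript
// Majority quorum for n Redlock instances, and how many failures it tolerates
function quorum(nodes: number): number {
  return Math.floor(nodes / 2) + 1;
}

function tolerableFailures(nodes: number): number {
  return nodes - quorum(nodes);
}

// 5 nodes: quorum 3, tolerates 2 failures.
// 4 nodes: quorum 3 but tolerates only 1 failure,
// so the extra even node buys no additional fault tolerance.
```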
Lock Renewal for Long Operations
If your operation takes longer than the lock timeout, renew the lock.
class RenewableLock {
  constructor(private redis: Redis) {}

  async withLock<T>(
    key: string,
    fn: () => Promise<T>,
    lockTimeout = 30000,
    renewInterval = 10000
  ): Promise<T> {
    const token = uuid();
    let isLocked = true;
    let renewalError: Error | null = null;

    const acquired = await this.redis.set(
      `lock:${key}`,
      token,
      'PX', lockTimeout,
      'NX'
    );
    if (acquired !== 'OK') {
      throw new Error(`Failed to acquire lock: ${key}`);
    }

    // Periodically extend the TTL while we still hold the lock
    const renewalHandle = setInterval(async () => {
      try {
        const script = `
          if redis.call("get", KEYS[1]) == ARGV[1] then
            return redis.call("pexpire", KEYS[1], ARGV[2])
          else
            return 0
          end
        `;
        const result = await this.redis.eval(script, 1, `lock:${key}`, token, lockTimeout);
        if (result === 0) {
          renewalError = new Error('Lock lost during renewal');
          isLocked = false;
        }
      } catch (error) {
        renewalError = error as Error;
        isLocked = false;
      }
    }, renewInterval);

    try {
      const result = await fn();
      // Was the lock lost while fn ran?
      if (!isLocked || renewalError) {
        throw renewalError || new Error('Lock was lost during operation');
      }
      return result;
    } finally {
      clearInterval(renewalHandle);
      await this.release(key, token);
    }
  }

  private async release(key: string, token: string): Promise<void> {
    const script = `
      if redis.call("get", KEYS[1]) == ARGV[1] then
        return redis.call("del", KEYS[1])
      else
        return 0
      end
    `;
    await this.redis.eval(script, 1, `lock:${key}`, token);
  }
}
Fencing Tokens for Safety
After a lock expires, another client acquires it and starts work. The first client (which was paused) resumes and makes changes. Fencing tokens prevent this: include a monotonically increasing token with each operation.
class FencedLock {
  constructor(private redis: Redis) {}

  async acquire(key: string): Promise<{ token: string; fence: number } | null> {
    const token = uuid();
    // The fence counter is incremented atomically with the lock attempt,
    // so each successful acquisition sees a strictly larger fence value.
    // Note: SET with NX returns a status table on success and false on
    // failure in Lua, so we test truthiness rather than comparing to "OK".
    const script = `
      local fence = redis.call("incr", KEYS[1] .. ":fence")
      local result = redis.call("set", KEYS[1], ARGV[1], "NX", "PX", ARGV[2])
      if result then
        return {ARGV[1], fence}
      else
        return nil
      end
    `;
    const result = (await this.redis.eval(
      script,
      1,
      `lock:${key}`,
      token,
      30000
    )) as [string, number] | null;
    if (result) {
      return { token: result[0], fence: Number(result[1]) };
    }
    return null;
  }

  async executeWithFence<T>(
    key: string,
    fn: (fence: number) => Promise<T>
  ): Promise<T> {
    const lock = await this.acquire(key);
    if (!lock) {
      throw new Error(`Failed to acquire lock: ${key}`);
    }
    try {
      // Pass the fence token to the operation;
      // the operation must use it to guard state mutations
      return await fn(lock.fence);
    } finally {
      await this.release(key, lock.token);
    }
  }

  private async release(key: string, token: string): Promise<void> {
    const script = `
      if redis.call("get", KEYS[1]) == ARGV[1] then
        return redis.call("del", KEYS[1])
      else
        return 0
      end
    `;
    await this.redis.eval(script, 1, `lock:${key}`, token);
  }
}
// Usage with fenced operations (assumes the queries below run inside a
// single database transaction, so FOR UPDATE holds until commit)
class Account {
  async transfer(fromAccountId: string, toAccountId: string, amount: number): Promise<void> {
    await lock.executeWithFence(`transfer:${fromAccountId}`, async fence => {
      const account = await db.query(
        `SELECT balance, fence FROM accounts WHERE id = $1 FOR UPDATE`,
        [fromAccountId]
      );
      // A newer fence in the database means another lock holder got here first
      if (account.rows[0].fence >= fence) {
        throw new Error('Lock lost; another process modified this account');
      }
      const newBalance = account.rows[0].balance - amount;
      if (newBalance < 0) {
        throw new Error('Insufficient funds');
      }
      await db.query(
        `UPDATE accounts SET balance = $1, fence = $2 WHERE id = $3`,
        [newBalance, fence, fromAccountId]
      );
      await db.query(
        `UPDATE accounts SET balance = balance + $1 WHERE id = $2`,
        [amount, toAccountId]
      );
    });
  }
}
PostgreSQL Advisory Locks
For resources within a single database, advisory locks are simpler and safer than distributed locks.
class PostgreSQLAdvisoryLock {
  constructor(private pool: Pool) {}

  async withLock<T>(
    lockId: number,
    fn: () => Promise<T>,
    shared = false
  ): Promise<T> {
    const conn = await this.pool.connect();
    try {
      // Transaction-scoped locks need an open transaction: they release
      // automatically at COMMIT or ROLLBACK. Without BEGIN, each query is
      // its own transaction and the lock would vanish immediately.
      await conn.query('BEGIN');
      const lockFunc = shared ? 'pg_advisory_xact_lock_shared' : 'pg_advisory_xact_lock';
      await conn.query(`SELECT ${lockFunc}($1)`, [lockId]); // blocks until acquired
      const result = await fn();
      await conn.query('COMMIT');
      return result;
    } catch (error) {
      await conn.query('ROLLBACK');
      throw error;
    } finally {
      conn.release();
    }
  }

  async withTryLock<T>(lockId: number, fn: () => Promise<T>): Promise<T | null> {
    const conn = await this.pool.connect();
    try {
      // pg_try_advisory_lock returns immediately instead of blocking
      const result = await conn.query(`SELECT pg_try_advisory_lock($1)`, [lockId]);
      if (!result.rows[0].pg_try_advisory_lock) {
        return null; // lock not acquired
      }
      try {
        return await fn();
      } finally {
        // Session-level locks must be released explicitly, or they
        // follow the connection back into the pool
        await conn.query(`SELECT pg_advisory_unlock($1)`, [lockId]);
      }
    } finally {
      conn.release();
    }
  }
}
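One practical wrinkle: the advisory lock functions take integer keys, not strings. A common workaround is to hash application-level keys into a 32-bit integer client-side. `lockIdFor` below is an illustrative helper, not a library function; note that hash collisions make unrelated keys contend for the same lock.

```typescript
// Hash a string key into a signed 32-bit integer for pg_advisory_lock($1)
function lockIdFor(key: string): number {
  let h = 0;
  for (let i = 0; i < key.length; i++) {
    h = (Math.imul(h, 31) + key.charCodeAt(i)) | 0; // stay within int32 range
  }
  return h;
}

// e.g. advisoryLock.withLock(lockIdFor(`invoice:${invoiceId}`), fn)
```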
SELECT FOR UPDATE SKIP LOCKED for Job Queues
Instead of explicit locks, let the database hand out work atomically: SELECT ... FOR UPDATE SKIP LOCKED claims a row while skipping rows other workers have already locked, so workers never block each other.
class JobQueue {
  constructor(private db: Pool) {}

  async dequeueJob(workerId: string): Promise<Job | null> {
    const result = await this.db.query(
      `UPDATE jobs
       SET worker_id = $1, status = 'processing', claimed_at = NOW()
       WHERE id = (
         SELECT id FROM jobs
         WHERE status = 'pending'
         ORDER BY priority DESC, created_at ASC
         FOR UPDATE SKIP LOCKED
         LIMIT 1
       )
       RETURNING id, payload, attempt_count`,
      [workerId]
    );
    return result.rows[0] || null;
  }

  async completeJob(jobId: string): Promise<void> {
    await this.db.query(
      `UPDATE jobs SET status = 'completed', completed_at = NOW() WHERE id = $1`,
      [jobId]
    );
  }

  async failJob(jobId: string, error: string): Promise<void> {
    // Retry until max_attempts is exhausted, then mark as failed.
    // A single CASE expression keeps the read and the write atomic,
    // avoiding a race between concurrent failure reports.
    await this.db.query(
      `UPDATE jobs
       SET attempt_count = attempt_count + 1,
           status = CASE WHEN attempt_count + 1 >= max_attempts
                         THEN 'failed' ELSE 'pending' END,
           error = $1
       WHERE id = $2`,
      [error, jobId]
    );
  }
}
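For reference, here is an illustrative schema matching the columns these queries assume (types, defaults, and the index name are guesses; adapt them to your needs):

```typescript
// Schema inferred from the queries above; the partial index keeps the
// pending-job scan fast even when the table accumulates history
const createJobsTable = `
  CREATE TABLE jobs (
    id            BIGSERIAL PRIMARY KEY,
    payload       JSONB NOT NULL,
    status        TEXT NOT NULL DEFAULT 'pending',
    priority      INT NOT NULL DEFAULT 0,
    attempt_count INT NOT NULL DEFAULT 0,
    max_attempts  INT NOT NULL DEFAULT 3,
    worker_id     TEXT,
    error         TEXT,
    created_at    TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    claimed_at    TIMESTAMPTZ,
    completed_at  TIMESTAMPTZ
  );
  CREATE INDEX jobs_pending_idx
    ON jobs (priority DESC, created_at)
    WHERE status = 'pending';
`;
```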
Lock-Free Alternatives: Optimistic Concurrency
Often you don't need locks. Use version numbers and retry on conflict.
class OptimisticConcurrency {
  constructor(private db: Pool) {}

  async updateAccount(accountId: string, updates: { amount: number }): Promise<void> {
    const maxRetries = 5;
    for (let retries = 0; retries < maxRetries; retries++) {
      // Read the current version
      const current = await this.db.query(
        `SELECT version, balance FROM accounts WHERE id = $1`,
        [accountId]
      );
      if (current.rows.length === 0) {
        throw new Error(`Account ${accountId} not found`);
      }
      const { version, balance } = current.rows[0];
      const newBalance = balance + updates.amount;

      // Write only if the version is unchanged (compare-and-swap)
      const updateResult = await this.db.query(
        `UPDATE accounts SET balance = $1, version = version + 1
         WHERE id = $2 AND version = $3`,
        [newBalance, accountId, version]
      );
      if (updateResult.rowCount === 1) {
        return; // success
      }
      // Version mismatch: someone else wrote first; back off and retry
      await sleep(Math.pow(2, retries) * 10 + Math.random() * 10);
    }
    throw new Error(`Failed to update account after ${maxRetries} retries`);
  }
}
When Distributed Locks Are Wrong
Distributed locks add latency and complexity. Consider alternatives:
- Caching: Use cache-aside pattern with weak consistency
- Partitioning: Ensure no two servers handle the same resource
- Optimistic concurrency: Accept conflicts, retry on mismatch
- Work stealing: one server owns a resource; if it dies, another steals its work
- Sharding: Route by customer/resource to a single owner
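The partitioning and sharding options reduce to a deterministic routing function shared by all callers. A minimal sketch (`ownerFor` is illustrative; production systems usually prefer consistent hashing so ownership moves minimally when the server list changes):

```typescript
import { createHash } from 'crypto';

// Every caller computes the same owner for a resource, so only that
// server ever touches it and no distributed lock is needed
function ownerFor(resourceId: string, servers: string[]): string {
  const digest = createHash('sha1').update(resourceId).digest();
  return servers[digest.readUInt32BE(0) % servers.length];
}
```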
Checklist
- Understand your locking requirement: are you preventing races or just coordinating?
- Use database locks for single-database operations
- Use optimistic concurrency unless strong mutual exclusion is required
- If using Redis locks, implement token-based release (not just TTL)
- For long operations, implement renewal or use fencing tokens
- Test lock failures: crashed processes, network partitions, clock skew
- Monitor lock contention and timeout rates
- Avoid nested locks (deadlock risk)
Conclusion
Distributed locks feel like a safety net, but they have holes. For single-database operations, use advisory locks. For cross-service coordination, ask if you really need mutual exclusion—often optimistic concurrency or work partitioning is simpler and faster. If you do use locks, include tokens and test failure scenarios ruthlessly.