Distributed Locking — Redis Redlock, Database Locks, and When You Actually Need Them
- Author: Sanjeev Sharma (@webcoderspeed1)
Introduction
Distributed locks prevent concurrent access to shared resources across multiple servers. They're seductive—they feel safe—but they're also a common source of subtle bugs. A network partition can leave a lock held forever. A crashed process never releases its lock. This post covers the tools (Redis, PostgreSQL) and when you actually need them. Spoiler: often you don't.
- Redis SET NX PX: Simple Distributed Lock
- Redlock Algorithm
- Lock Renewal for Long Operations
- Fencing Tokens for Safety
- PostgreSQL Advisory Locks
- SELECT FOR UPDATE SKIP LOCKED for Job Queues
- Lock-Free Alternatives: Optimistic Concurrency
- When Distributed Locks Are Wrong
- Checklist
- Conclusion
Redis SET NX PX: Simple Distributed Lock
The simplest distributed lock uses Redis SET with the NX (only set if the key does not exist) and PX (expiry in milliseconds) options.
class RedisLock {
constructor(private redis: Redis, private lockTimeout = 30000) {}
async acquire(key: string): Promise<string | null> {
const token = uuid();
const result = await this.redis.set(
`lock:${key}`,
token,
'NX', // Only set if not exists
'PX', // Milliseconds
this.lockTimeout
);
return result === 'OK' ? token : null;
}
async release(key: string, token: string): Promise<boolean> {
// Use Lua script to ensure atomic check-and-delete
const script = `
if redis.call("get", KEYS[1]) == ARGV[1] then
return redis.call("del", KEYS[1])
else
return 0
end
`;
const result = await this.redis.eval(script, 1, `lock:${key}`, token);
return result === 1;
}
async withLock<T>(key: string, fn: () => Promise<T>, retries = 3): Promise<T> {
for (let i = 0; i < retries; i++) {
const token = await this.acquire(key);
if (token) {
try {
return await fn();
} finally {
await this.release(key, token);
}
}
// Exponential backoff
await sleep(Math.pow(2, i) * 100 + Math.random() * 100);
}
throw new Error(`Failed to acquire lock: ${key}`);
}
}
Pitfalls:
- If the client crashes, the lock expires after timeout (good), but operations during that timeout can race
- Network partition: a client's release command never reaches Redis, so the lock stays held until its TTL expires, stalling every other client in the meantime
- Process pauses (GC, context switch) can cause lock expiration while code still runs
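The token-checked release above can be demonstrated without a Redis server. The sketch below uses a tiny synchronous in-memory stand-in (the `FakeRedis` class and its method names are invented for this illustration; real Redis is networked and asynchronous) to show why releasing must verify the token:

```typescript
// Minimal in-memory stand-in for the two Redis commands the lock relies on.
class FakeRedis {
  private store = new Map<string, { value: string; expiresAt: number }>();

  // Mirrors SET key value NX PX ttl: set only if absent or expired.
  setNxPx(key: string, value: string, ttlMs: number, now = Date.now()): 'OK' | null {
    const entry = this.store.get(key);
    if (entry && entry.expiresAt > now) return null; // already locked
    this.store.set(key, { value, expiresAt: now + ttlMs });
    return 'OK';
  }

  // Mirrors the Lua check-and-delete: only the token holder may release.
  releaseIfToken(key: string, token: string): boolean {
    const entry = this.store.get(key);
    if (!entry || entry.value !== token) return false; // not our lock
    this.store.delete(key);
    return true;
  }
}

const redis = new FakeRedis();
const first = redis.setNxPx('lock:job', 'token-a', 30_000);       // acquired
const second = redis.setNxPx('lock:job', 'token-b', 30_000);      // blocked
const wrongRelease = redis.releaseIfToken('lock:job', 'token-b'); // no-op
const rightRelease = redis.releaseIfToken('lock:job', 'token-a'); // released
```

The wrong-token release returning false is exactly what the Lua script guards against: a client whose lock already expired must not delete a lock now held by someone else.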
Redlock Algorithm
Redlock uses multiple independent Redis instances (typically 5) to improve safety: the client tries to acquire the same lock on every instance and holds it only if a majority succeed within the lock's validity window.
class RedLock {
private locks: Redis[];
private lockTimeout = 30000;
private clockDrift = 200; // Clock drift tolerance (ms)
constructor(redisInstances: Redis[]) {
this.locks = redisInstances;
if (redisInstances.length % 2 === 0) {
console.warn('An odd number of Redis instances is recommended for Redlock');
}
}
async acquire(key: string, retries = 3): Promise<string | null> {
const token = uuid();
const quorum = Math.floor(this.locks.length / 2) + 1;
for (let attempt = 0; attempt < retries; attempt++) {
const startTime = Date.now();
let successCount = 0;
// Try to acquire lock on all instances
const promises = this.locks.map(lock =>
lock
.set(`lock:${key}`, token, 'NX', 'PX', this.lockTimeout)
.then(result => {
// SET ... NX returns null when the key already exists,
// so only count instances that actually set the key
if (result === 'OK') successCount++;
})
.catch(() => {}) // Ignore individual lock failures
);
await Promise.all(promises);
const elapsed = Date.now() - startTime;
const lockValidityTime = this.lockTimeout - elapsed - this.clockDrift;
// Did we get a quorum?
if (successCount >= quorum && lockValidityTime > 0) {
return token;
}
// Release all locks we acquired
await this.releaseAll(key, token);
// Backoff before retry
const backoff = Math.random() * Math.pow(2, attempt) * 100;
await sleep(backoff);
}
return null;
}
async release(key: string, token: string): Promise<number> {
const script = `
if redis.call("get", KEYS[1]) == ARGV[1] then
return redis.call("del", KEYS[1])
else
return 0
end
`;
// Release from all instances
const results = await Promise.all(
this.locks.map(lock => lock.eval(script, 1, `lock:${key}`, token).catch(() => 0))
);
return results.reduce((sum, r) => sum + r, 0);
}
private async releaseAll(key: string, token: string): Promise<void> {
const script = `
if redis.call("get", KEYS[1]) == ARGV[1] then
return redis.call("del", KEYS[1])
else
return 0
end
`;
await Promise.all(
this.locks.map(lock => lock.eval(script, 1, `lock:${key}`, token).catch(() => {}))
);
}
async withLock<T>(key: string, fn: () => Promise<T>): Promise<T> {
const token = await this.acquire(key);
if (!token) {
throw new Error(`Failed to acquire lock: ${key}`);
}
try {
return await fn();
} finally {
await this.release(key, token);
}
}
}
Why 5 nodes?
- Quorum = 3 (majority)
- Can tolerate 2 failures
- Even with 1 node down for maintenance, you keep running
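The arithmetic behind those numbers is plain majority quorum. A small helper (hypothetical, not part of any Redlock library) makes the trade-off explicit:

```typescript
// Majority quorum for n Redis instances, and how many failures it tolerates.
function quorum(n: number): number {
  return Math.floor(n / 2) + 1;
}

function toleratedFailures(n: number): number {
  return n - quorum(n); // nodes that can be down while a quorum remains
}

// 5 nodes: quorum 3, tolerates 2 failures.
// 4 nodes: quorum 3, but tolerates only 1. The extra even node buys
// nothing, which is why odd cluster sizes are recommended.
```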
Lock Renewal for Long Operations
If your operation takes longer than the lock timeout, renew the lock.
class RenewableLock {
constructor(private redis: Redis) {}
async withLock<T>(
key: string,
fn: () => Promise<T>,
lockTimeout = 30000,
renewInterval = 10000
): Promise<T> {
const token = uuid();
let isLocked = true;
let renewalError: Error | null = null;
const acquire = await this.redis.set(
`lock:${key}`,
token,
'NX',
'PX',
lockTimeout
);
if (acquire !== 'OK') {
throw new Error(`Failed to acquire lock: ${key}`);
}
// Start renewal interval
const renewalHandle = setInterval(async () => {
try {
const script = `
if redis.call("get", KEYS[1]) == ARGV[1] then
return redis.call("pexpire", KEYS[1], ARGV[2])
else
return 0
end
`;
const result = await this.redis.eval(script, 1, `lock:${key}`, token, lockTimeout);
if (result === 0) {
renewalError = new Error('Lock lost during renewal');
isLocked = false;
}
} catch (error) {
renewalError = error as Error;
isLocked = false;
}
}, renewInterval);
try {
const result = await fn();
// Check if lock was lost during operation
if (!isLocked || renewalError) {
throw renewalError || new Error('Lock was lost during operation');
}
return result;
} finally {
clearInterval(renewalHandle);
await this.release(key, token);
}
}
private async release(key: string, token: string): Promise<void> {
const script = `
if redis.call("get", KEYS[1]) == ARGV[1] then
return redis.call("del", KEYS[1])
else
return 0
end
`;
await this.redis.eval(script, 1, `lock:${key}`, token);
}
}
Fencing Tokens for Safety
After a lock expires, another client acquires it and starts work. The first client (which was paused) resumes and makes changes. Fencing tokens prevent this: include a monotonically increasing token with each operation.
class FencedLock {
async acquire(key: string): Promise<{ token: string; fence: number } | null> {
const token = uuid();
// Note: in Lua, redis.call("set", ..., "NX") returns a truthy status reply
// on success and false on failure. The fence counter increments even on a
// failed attempt, which is harmless: tokens only need to be monotonic.
const script = `
local fence = redis.call("incr", KEYS[1] .. ":fence")
local result = redis.call("set", KEYS[1], ARGV[1], "NX", "PX", ARGV[2])
if result then
return {ARGV[1], fence}
else
return nil
end
`;
const result = await this.redis.eval(
script,
1,
`lock:${key}`,
token,
30000
);
if (result) {
return { token: result[0], fence: result[1] };
}
return null;
}
async executeWithFence<T>(
key: string,
fn: (fence: number) => Promise<T>
): Promise<T> {
const lock = await this.acquire(key);
if (!lock) {
throw new Error(`Failed to acquire lock: ${key}`);
}
try {
// Pass fence token to the operation
// The operation must use this token to guard state mutations
return await fn(lock.fence);
} finally {
await this.release(key, lock.token);
}
}
private async release(key: string, token: string): Promise<void> {
const script = `
if redis.call("get", KEYS[1]) == ARGV[1] then
return redis.call("del", KEYS[1])
else
return 0
end
`;
await this.redis.eval(script, 1, `lock:${key}`, token);
}
}
// Usage with fenced operations
class Account {
async transfer(fromAccountId: string, toAccountId: string, amount: number): Promise<void> {
await lock.executeWithFence(`transfer:${fromAccountId}`, async fence => {
const account = await db.query(
`SELECT balance, fence FROM accounts WHERE id = $1 FOR UPDATE`,
[fromAccountId]
);
if (account.rows[0].fence >= fence) {
throw new Error('Lock lost; another process modified this account');
}
const newBalance = account.rows[0].balance - amount;
if (newBalance < 0) {
throw new Error('Insufficient funds');
}
await db.query(
`UPDATE accounts SET balance = $1, fence = $2 WHERE id = $3`,
[newBalance, fence, fromAccountId]
);
await db.query(
`UPDATE accounts SET balance = balance + $1 WHERE id = $2`,
[amount, toAccountId]
);
});
}
}
PostgreSQL Advisory Locks
For resources within a single database, advisory locks are simpler and safer than distributed locks.
class PostgreSQLAdvisoryLock {
async withLock<T>(
lockId: number,
fn: () => Promise<T>,
shared = false
): Promise<T> {
const conn = await this.pool.connect();
try {
// pg_advisory_xact_lock blocks until acquired and auto-releases at
// transaction end, so it must run inside an explicit transaction
const lockFunc = shared ? 'pg_advisory_xact_lock_shared' : 'pg_advisory_xact_lock';
await conn.query('BEGIN');
try {
await conn.query(`SELECT ${lockFunc}($1)`, [lockId]);
const result = await fn();
await conn.query('COMMIT');
return result;
} catch (error) {
await conn.query('ROLLBACK');
throw error;
}
} finally {
conn.release();
}
}
async withTryLock<T>(lockId: number, fn: () => Promise<T>): Promise<T | null> {
const conn = await this.pool.connect();
try {
// Session-level lock: it survives across transactions and must be
// explicitly unlocked, or the pooled connection keeps holding it
const result = await conn.query(`SELECT pg_try_advisory_lock($1) AS locked`, [lockId]);
if (!result.rows[0].locked) {
return null; // Lock not acquired
}
try {
return await fn();
} finally {
await conn.query(`SELECT pg_advisory_unlock($1)`, [lockId]);
}
} finally {
conn.release();
}
}
}
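Advisory locks are keyed by integers, so locking a named resource needs a stable string-to-integer mapping. One common approach, sketched here with a 32-bit FNV-1a hash (the `advisoryLockId` helper is an illustration, not part of any PostgreSQL client library), is to hash the resource name; a collision just means two names occasionally contend for the same lock, never corruption:

```typescript
// Stable string -> signed 32-bit integer, suitable for the int4 overloads
// of pg_advisory_lock and friends. Uses the FNV-1a hash.
function advisoryLockId(name: string): number {
  let hash = 0x811c9dc5; // FNV-1a 32-bit offset basis
  for (let i = 0; i < name.length; i++) {
    hash ^= name.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV-1a 32-bit prime
  }
  return hash | 0; // coerce to signed 32-bit
}

// e.g. withLock(advisoryLockId('invoice:42'), () => regenerateInvoice())
```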
SELECT FOR UPDATE SKIP LOCKED for Job Queues
Instead of explicit distributed locks, let the database hand out work using row-level locks with SKIP LOCKED.
class JobQueue {
async dequeueJob(workerId: string): Promise<Job | null> {
const result = await this.db.query(
`UPDATE jobs
SET worker_id = $1, status = 'processing', claimed_at = NOW()
WHERE id = (
SELECT id FROM jobs
WHERE status = 'pending'
ORDER BY priority DESC, created_at ASC
FOR UPDATE SKIP LOCKED
LIMIT 1
)
RETURNING id, payload, attempt_count`,
[workerId]
);
return result.rows[0] || null;
}
async completeJob(jobId: string): Promise<void> {
await this.db.query(
`UPDATE jobs SET status = 'completed', completed_at = NOW() WHERE id = $1`,
[jobId]
);
}
async failJob(jobId: string, error: string): Promise<void> {
const result = await this.db.query(
`SELECT attempt_count, max_attempts FROM jobs WHERE id = $1`,
[jobId]
);
const { attempt_count, max_attempts } = result.rows[0];
if (attempt_count >= max_attempts) {
await this.db.query(
`UPDATE jobs SET status = 'failed', error = $1 WHERE id = $2`,
[error, jobId]
);
} else {
await this.db.query(
`UPDATE jobs SET status = 'pending', attempt_count = attempt_count + 1 WHERE id = $1`,
[jobId]
);
}
}
}
Lock-Free Alternatives: Optimistic Concurrency
Often you don't need locks. Use version numbers and retry on conflict.
class OptimisticConcurrency {
async updateAccount(accountId: string, updates: any): Promise<void> {
let retries = 0;
const maxRetries = 5;
while (retries < maxRetries) {
// Read current version
const current = await this.db.query(
`SELECT version, balance FROM accounts WHERE id = $1`,
[accountId]
);
if (current.rows.length === 0) {
throw new Error(`Account ${accountId} not found`);
}
const { version, balance } = current.rows[0];
// Apply updates
const newBalance = balance + updates.amount;
// Try to update only if version hasn't changed
const updateResult = await this.db.query(
`UPDATE accounts SET balance = $1, version = version + 1
WHERE id = $2 AND version = $3`,
[newBalance, accountId, version]
);
if (updateResult.rowCount === 1) {
return; // Success
}
// Version mismatch; retry
retries++;
await sleep(Math.pow(2, retries) * 10 + Math.random() * 10);
}
throw new Error(`Failed to update account after ${maxRetries} retries`);
}
}
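The version-check pattern above can be demonstrated without a database. This sketch (the `InMemoryAccounts` class is invented for the illustration) mirrors the conditional UPDATE as an in-memory compare-and-swap and shows a stale write failing while a retry with a fresh read succeeds:

```typescript
// Compare-and-swap on a version column: the core of optimistic concurrency.
interface AccountRow { balance: number; version: number }

class InMemoryAccounts {
  private rows = new Map<string, AccountRow>([['acct-1', { balance: 100, version: 1 }]]);

  read(id: string): AccountRow {
    const row = this.rows.get(id);
    if (!row) throw new Error(`Account ${id} not found`);
    return { ...row }; // snapshot, like a plain SELECT
  }

  // Mirrors: UPDATE ... SET balance=$1, version=version+1
  //          WHERE id=$2 AND version=$3
  compareAndSwap(id: string, newBalance: number, expectedVersion: number): boolean {
    const row = this.rows.get(id);
    if (!row || row.version !== expectedVersion) return false; // stale read: caller retries
    this.rows.set(id, { balance: newBalance, version: row.version + 1 });
    return true;
  }
}

const accounts = new InMemoryAccounts();
const snapshot = accounts.read('acct-1');             // { balance: 100, version: 1 }
// A concurrent writer sneaks in between our read and our write:
accounts.compareAndSwap('acct-1', 90, snapshot.version);
// Our write now fails the version check...
const stale = accounts.compareAndSwap('acct-1', snapshot.balance - 25, snapshot.version);
// ...so we re-read and retry, exactly as the loop above does:
const fresh = accounts.read('acct-1');
const retried = accounts.compareAndSwap('acct-1', fresh.balance - 25, fresh.version);
```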
When Distributed Locks Are Wrong
Distributed locks add latency and complexity. Consider alternatives:
- Caching: Use cache-aside pattern with weak consistency
- Partitioning: Ensure no two servers handle the same resource
- Optimistic concurrency: Accept conflicts, retry on mismatch
- Work stealing: one server owns a resource; if it dies, another takes over its work
- Sharding: Route by customer/resource to a single owner
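Partitioning and sharding both reduce to deterministic routing: if every resource id maps to exactly one owner, no lock is needed because no two servers ever touch the same resource. A minimal sketch, assuming a static server list (the `ownerOf` helper is hypothetical):

```typescript
// Deterministic ownership: each resource id hashes to exactly one server,
// so two workers can never process the same resource concurrently.
function ownerOf(resourceId: string, servers: string[]): string {
  let hash = 0;
  for (let i = 0; i < resourceId.length; i++) {
    hash = (Math.imul(hash, 31) + resourceId.charCodeAt(i)) | 0;
  }
  const index = ((hash % servers.length) + servers.length) % servers.length;
  return servers[index];
}

const servers = ['worker-1', 'worker-2', 'worker-3'];
// Each worker only processes the resources routed to it:
// if (ownerOf(job.customerId, servers) !== myServerId) skip(job);
```

Note that plain modulo routing reshuffles most ownership whenever the server list changes; consistent hashing or an explicit shard map reduces that churn.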
Checklist
- Understand your locking requirement: are you preventing races or just coordinating?
- Use database locks for single-database operations
- Use optimistic concurrency unless strong mutual exclusion is required
- If using Redis locks, implement token-based release (not just TTL)
- For long operations, implement renewal or use fencing tokens
- Test lock failures: crashed processes, network partitions, clock skew
- Monitor lock contention and timeout rates
- Avoid nested locks (deadlock risk)
Conclusion
Distributed locks feel like a safety net, but they have holes. For single-database operations, use advisory locks. For cross-service coordination, ask if you really need mutual exclusion—often optimistic concurrency or work partitioning is simpler and faster. If you do use locks, include tokens and test failure scenarios ruthlessly.