# Cron Job Running Twice — When Your Scheduled Job Has Duplicate Instances

Author: Sanjeev Sharma (@webcoderspeed1)

## Introduction
Node-cron, cron-job.org, Kubernetes CronJobs — all have the same problem when you run multiple instances: every instance runs the job. If your cron sends emails, charges subscriptions, or generates reports, running it 3x causes real damage.
- The Problem
- Fix 1: Distributed Lock with Redis
- Fix 2: Bull Queue for Scheduled Jobs (Recommended)
- Fix 3: Kubernetes CronJob (One Job at a Time)
- Fix 4: Database Advisory Locks
- Fix 5: Leader Election for Master Instance
- Cron Safety Checklist
- Conclusion
## The Problem

Your app is deployed across 3 Kubernetes pods, and each pod runs the same node-cron schedule, `'0 0 * * *'` (daily at midnight):

```text
00:00:00  Pod 1: starts billing job
00:00:00  Pod 2: starts billing job (DUPLICATE!)
00:00:00  Pod 3: starts billing job (DUPLICATE!)
```

All 3 pods query unpaid invoices and see the same 500 invoices. All 3 charge customers (3x charges), and all 3 send receipts (3x emails).
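The duplicate-charge effect needs no infrastructure to reproduce. A minimal sketch in plain TypeScript, with in-memory stand-ins for the database and Stripe (`runBillingJob`, the `Invoice` shape, and the counters are our own illustration, not the app's real code):

```typescript
// Three "pods" fire the same midnight tick against shared state.
type Invoice = { id: number; charges: number; receipts: number }

const invoices: Invoice[] = [
  { id: 1, charges: 0, receipts: 0 },
  { id: 2, charges: 0, receipts: 0 },
]

// Each pod's billing handler: every pod sees the same unpaid invoices
function runBillingJob(pod: string): void {
  console.log(`${pod}: billing ${invoices.length} invoices`)
  for (const invoice of invoices) {
    invoice.charges += 1 // stand-in for stripe.charge(invoice)
    invoice.receipts += 1 // stand-in for emailService.sendReceipt(invoice)
  }
}

// 00:00:00 — all three pods start the job at once
for (const pod of ['pod-1', 'pod-2', 'pod-3']) runBillingJob(pod)

console.log(invoices[0].charges) // 3 — the customer was charged three times
```

Nothing here is broken in any single pod; the bug only exists in the aggregate, which is why it slips through single-instance testing.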
## Fix 1: Distributed Lock with Redis

```typescript
import { Redis } from 'ioredis'
import cron from 'node-cron'

class DistributedCron {
  constructor(
    private redis: Redis,
    private instanceId: string = process.env.HOSTNAME ?? 'unknown'
  ) {}

  schedule(name: string, cronExpression: string, handler: () => Promise<void>) {
    cron.schedule(cronExpression, async () => {
      const lockKey = `cron:lock:${name}`
      const lockValue = this.instanceId
      const lockTTLSeconds = 300 // 5 minutes — must exceed the max job duration

      // Try to acquire the lock — only one instance will succeed
      const acquired = await this.redis.set(
        lockKey,
        lockValue,
        'EX',
        lockTTLSeconds,
        'NX' // only set if the key does not exist
      )

      if (!acquired) {
        console.log(`[Cron] ${name} already running on another instance — skipping`)
        return
      }

      const startTime = Date.now()
      console.log(`[Cron] ${name} started on ${this.instanceId}`)

      try {
        await handler()
        console.log(`[Cron] ${name} completed in ${Date.now() - startTime}ms`)
      } catch (err) {
        // Log only — rethrowing inside a cron callback just produces an
        // unhandled rejection, since nothing upstream catches it
        console.error(`[Cron] ${name} failed:`, err)
      } finally {
        // Release the lock only if we still own it — the check-and-delete
        // must be atomic, hence the Lua script
        await this.redis.eval(
          `if redis.call('get', KEYS[1]) == ARGV[1] then
             return redis.call('del', KEYS[1])
           else
             return 0
           end`,
          1,
          lockKey,
          lockValue
        )
      }
    })
  }
}

// Usage
const distributedCron = new DistributedCron(redis)

distributedCron.schedule('daily-billing', '0 0 * * *', async () => {
  const invoices = await db.invoice.findUnpaid()
  for (const invoice of invoices) {
    await stripe.charge(invoice)
    await emailService.sendReceipt(invoice)
  }
})
```
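The whole fix hinges on `SET ... NX` being atomic: of N concurrent callers, exactly one gets `'OK'` and the rest get `null`. A toy in-memory stand-in (just the NX semantics, not Redis — `setNX` and `store` are our own illustration) makes the race concrete:

```typescript
// In-memory stand-in for: SET key value EX ttl NX
const store = new Map<string, string>()

function setNX(key: string, value: string): 'OK' | null {
  if (store.has(key)) return null // key already exists — caller lost the race
  store.set(key, value)
  return 'OK'
}

// Three pods race for the same lock at midnight
const results = ['pod-1', 'pod-2', 'pod-3'].map((pod) =>
  setNX('cron:lock:daily-billing', pod)
)

console.log(results) // [ 'OK', null, null ] — only pod-1 runs the job
```

Real Redis gives you the same guarantee across machines, because the server executes each command serially.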
## Fix 2: Bull Queue for Scheduled Jobs (Recommended)

```typescript
import Bull from 'bull'
import cron from 'node-cron'

// Bull stores the queue in Redis, so all instances share it —
// and each job is handed to exactly one worker
const billingQueue = new Bull('billing', {
  redis: { host: 'localhost', port: 6379 },
})

// Every instance registers this processor, but Bull ensures
// a given job is processed only once
billingQueue.process('monthly-billing', 1, async (job) => {
  console.log(`Processing billing for month ${job.data.month}`)
  const invoices = await db.invoice.findUnpaidForMonth(job.data.month)
  for (const [i, invoice] of invoices.entries()) {
    await processInvoice(invoice)
    await job.progress(Math.round((i / invoices.length) * 100))
  }
})

// Schedule: every instance's cron fires, but only add the job
// if today's run is not already queued
cron.schedule('0 1 * * *', async () => {
  const existing = await billingQueue.getJobs(['waiting', 'active', 'delayed'])
  const todaysBilling = existing.find(
    (j) =>
      j.name === 'monthly-billing' &&
      new Date(j.data.scheduledAt).toDateString() === new Date().toDateString()
  )

  if (!todaysBilling) {
    await billingQueue.add('monthly-billing', {
      month: new Date().toISOString(),
      scheduledAt: Date.now(),
    })
  }
})
```
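Scanning `getJobs` works, but Bull can also dedupe for you via the `jobId` option: adding a job whose id already exists is a no-op. Derive a deterministic id per calendar day and the cron callback no longer needs the scan (the `dailyJobId` helper below is our own sketch, not part of Bull):

```typescript
// One id per job name per calendar day (UTC)
function dailyJobId(name: string, date: Date): string {
  return `${name}:${date.toISOString().slice(0, 10)}`
}

// Every pod computes the same id at midnight, so only the first add takes effect:
// await billingQueue.add('monthly-billing', data, { jobId: dailyJobId('monthly-billing', new Date()) })

console.log(dailyJobId('monthly-billing', new Date('2024-01-15T00:00:02Z')))
// → 'monthly-billing:2024-01-15'
```

This pushes the deduplication into Redis itself instead of a read-then-write check, which removes the small race window between `getJobs` and `add`.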
## Fix 3: Kubernetes CronJob (One Job at a Time)

Instead of running cron inside your app, use a Kubernetes CronJob. K8s spins up a single pod for each run, so there are no duplicates:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: daily-billing
spec:
  schedule: "0 0 * * *"
  concurrencyPolicy: Forbid # Don't start if the previous run is still active
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      backoffLimit: 2 # Retry failed jobs up to 2 times
      activeDeadlineSeconds: 3600 # Kill the job if it runs > 1 hour
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: billing
              image: my-app:latest
              command: ["node", "scripts/run-billing.js"]
              env:
                - name: JOB_TYPE
                  value: "billing"
```
The job's entrypoint is a standalone script, not a long-running server:

```javascript
// run-billing.js
async function main() {
  console.log('Starting billing job')
  await db.connect()

  const invoices = await db.invoice.findUnpaid()
  console.log(`Processing ${invoices.length} invoices`)

  for (const invoice of invoices) {
    await processInvoice(invoice)
  }

  console.log('Billing job complete')
  process.exit(0) // Exit when done — the pod terminates
}

main().catch((err) => {
  console.error('Billing job failed:', err)
  process.exit(1) // Non-zero exit → Kubernetes marks the job failed → retries
})
```
## Fix 4: Database Advisory Locks

```typescript
// PostgreSQL advisory locks — no Redis required
async function runWithAdvisoryLock(lockId: number, fn: () => Promise<void>) {
  const client = await db.connect()
  let acquired = false
  try {
    // Try to acquire a session-level advisory lock.
    // Returns true if acquired, false if another session holds it.
    const result = await client.query(
      'SELECT pg_try_advisory_lock($1) AS locked',
      [lockId]
    )
    acquired = result.rows[0].locked

    if (!acquired) {
      console.log(`Advisory lock ${lockId} already held — skipping`)
      return
    }

    await fn()
  } finally {
    // The lock is released automatically when the session ends, but with a
    // connection pool the session lives on — so release it explicitly,
    // and only if we actually acquired it
    if (acquired) {
      await client.query('SELECT pg_advisory_unlock($1)', [lockId])
    }
    client.release()
  }
}

// Different jobs get different lock IDs
const LOCK_IDS = {
  DAILY_BILLING: 1001,
  WEEKLY_REPORT: 1002,
  CLEANUP: 1003,
}

cron.schedule('0 0 * * *', () => {
  runWithAdvisoryLock(LOCK_IDS.DAILY_BILLING, async () => {
    await processDailyBilling()
  })
})
```
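Advisory lock ids are plain integers, so a manual `LOCK_IDS` table works, but it becomes a maintenance burden as jobs multiply. A common alternative is hashing the job name into a 32-bit int; the FNV-1a sketch below is our own convention, not a Postgres feature, and like any hash it has a (vanishingly small) collision risk:

```typescript
// Stable 32-bit lock id from a job name (FNV-1a hash),
// coerced to a signed 32-bit value so it fits a Postgres int4 argument
function lockIdFor(name: string): number {
  let h = 0x811c9dc5
  for (let i = 0; i < name.length; i++) {
    h ^= name.charCodeAt(i)
    h = Math.imul(h, 0x01000193)
  }
  return h | 0
}

// Deterministic: the same job name maps to the same lock id on every instance
console.log(lockIdFor('daily-billing') === lockIdFor('daily-billing')) // true

// Usage: runWithAdvisoryLock(lockIdFor('daily-billing'), async () => { ... })
```

Every instance derives the same id from the job name alone, so no shared registry of magic numbers is needed.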
## Fix 5: Leader Election for Master Instance

```typescript
// Elect one instance as leader — only the leader runs cron jobs
class LeaderElection {
  private isLeader = false
  private leaseKey = 'cron:leader'
  private leaseTTL = 30 // seconds
  private renewInterval: NodeJS.Timeout | null = null

  constructor(
    private redis: Redis,
    private instanceId: string = process.env.HOSTNAME ?? 'pod-' + Math.random()
  ) {}

  async start(): Promise<void> {
    await this.tryBecomeLeader()
    // Periodically try to become leader / renew the lease
    // (the interval must be well below leaseTTL)
    this.renewInterval = setInterval(() => this.tryBecomeLeader(), 10_000)
  }

  private async tryBecomeLeader(): Promise<void> {
    if (this.isLeader) {
      // Renew only if we still own the lease — after a long GC pause or
      // event-loop stall, another instance may have taken over
      const owner = await this.redis.get(this.leaseKey)
      if (owner === this.instanceId) {
        await this.redis.expire(this.leaseKey, this.leaseTTL)
      } else {
        this.isLeader = false
        console.log(`[Leader] ${this.instanceId} lost the lease`)
      }
      return
    }

    // Try to acquire the leader lease
    const acquired = await this.redis.set(
      this.leaseKey,
      this.instanceId,
      'EX',
      this.leaseTTL,
      'NX'
    )
    if (acquired) {
      this.isLeader = true
      console.log(`[Leader] ${this.instanceId} is now leader`)
    }
  }

  getIsLeader(): boolean {
    return this.isLeader
  }

  stop(): void {
    if (this.renewInterval) clearInterval(this.renewInterval)
  }
}

const election = new LeaderElection(redis)
await election.start()

// Only the leader schedules and runs cron jobs
cron.schedule('0 0 * * *', async () => {
  if (!election.getIsLeader()) {
    console.log('Not leader — skipping cron')
    return
  }
  await processDailyBilling()
})
```
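One property worth knowing before you pick this fix: how long the cluster can be leaderless after a crash. With the numbers above (30 s lease, 10 s renew tick) the worst case is the full lease TTL plus one renew interval — and a cron tick that fires inside that window is skipped by everyone. A quick sketch of the arithmetic (our own back-of-envelope, assuming the lease was renewed just before the crash):

```typescript
// Worst-case leaderless window after a leader crash:
// the dead leader's lease must expire (up to leaseTTL seconds),
// then some follower must reach its next tryBecomeLeader tick (up to renewTick)
function worstCaseFailoverSeconds(leaseTTL: number, renewTick: number): number {
  return leaseTTL + renewTick
}

console.log(worstCaseFailoverSeconds(30, 10)) // 40 — a midnight tick in that window runs nowhere
```

If missing a tick is unacceptable, pair leader election with a catch-up check on startup, or prefer a queue-based fix where the job persists until someone processes it.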
## Cron Safety Checklist

| Risk | Solution |
|---|---|
| Multiple instances run the same job | Redis distributed lock or leader election |
| Job takes longer than the cron interval | `concurrencyPolicy: Forbid` in K8s, or lock TTL > max job duration |
| Job fails with no retry | Bull queue with retries, or K8s `backoffLimit` |
| Long-running job doesn't finish | `activeDeadlineSeconds`, lock timeout, progress tracking |
| Lock holder crashes and the lock is never released | Lock TTL ensures auto-expiry |
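The lock-TTL rows in the checklist can also be enforced at runtime. A small guard (our own sketch, not part of any fix above) flags a job that has outlived its lock — at that point the lock has expired and a duplicate may already be running:

```typescript
// True when a job has run longer than its lock TTL, i.e. the lock
// has expired and another instance may have acquired it
function lockOverrun(startedAtMs: number, nowMs: number, lockTTLSeconds: number): boolean {
  return (nowMs - startedAtMs) / 1000 > lockTTLSeconds
}

const startedAt = 0
console.log(lockOverrun(startedAt, 200_000, 300)) // false — still inside the 5-minute TTL
console.log(lockOverrun(startedAt, 301_000, 300)) // true — lock expired mid-job
```

Call it inside the job's processing loop and abort (or at minimum alert) when it returns true, rather than silently finishing alongside a duplicate.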
## Conclusion

Running cron in a multi-instance deployment without distributed locking is a guaranteed bug. The simplest fix is a Redis distributed lock: the first instance to succeed at `SET ... NX` runs the job, the others skip. For complex workflows, use a Bull queue to decouple scheduling from execution — any instance can process the job, but only one will at a time. For clean separation, move cron jobs out of the app entirely into Kubernetes CronJobs with `concurrencyPolicy: Forbid`. Whatever approach you choose, the lock TTL must be longer than the maximum expected job duration — otherwise the lock expires while the job is still running, and a second instance can start.