Thundering Herd on Service Restart — The Restart That Kills Your System
By Sanjeev Sharma (@webcoderspeed1)
Introduction
You deploy a critical hotfix. The pod restarts. The health check passes. Traffic flows in. And then — the new instance dies under a crushing wave of requests that had queued up during the 15-second restart window.
This is the Thundering Herd on Service Restart — a self-inflicted DDoS.
- Why It Happens
- Root Causes
- Fix 1: Slow Start / Traffic Ramping in Load Balancer
- Fix 2: Readiness Probe with Warm-Up
- Fix 3: Request Rate Limiting on Startup
- Fix 4: Graceful Shutdown (Drain Before Restart)
- Fix 5: Circuit Breaker at the Client
- Fix 6: Connection Pool Lazy Initialization
- Kubernetes Rolling Deployment Best Practices
- Monitoring Restart Events
- Conclusion
Why It Happens
During a service restart:
t=0s Service goes down
t=0-15s Requests queue at load balancer, retry logic fires, clients reconnect
Queue builds: 10,000 pending requests...
t=15s Service comes back up — HEALTHY
t=15s ALL 10,000 queued requests hit the fresh instance simultaneously
t=15s CPU 100%, memory spike, DB connections exhausted, service crashes again
t=16s Restart loop begins
The service never gets a chance to warm up. It's crushed before it can handle anything.
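The numbers in the timeline fall straight out of arrival rate times downtime. A back-of-the-envelope sketch (the rates are illustrative assumptions, not measurements):

```typescript
// Rough model of the queue that builds while the service is down.
// All inputs are illustrative assumptions.
function queuedRequests(
  arrivalRatePerSec: number, // steady-state traffic into the load balancer
  downtimeSec: number,       // length of the restart window
  retryMultiplier: number    // amplification from client retries (>= 1)
): number {
  return Math.round(arrivalRatePerSec * downtimeSec * retryMultiplier)
}

// ~667 req/s over a 15s restart window ≈ 10,000 requests released at once
console.log(queuedRequests(667, 15, 1))
```

Note the retry multiplier: aggressive client retries can make the released wave several times larger than the traffic that actually arrived during the outage.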
Root Causes
- Long restart window — Cold JVM/Node.js startup takes time
- Client retry storms — Clients using exponential backoff without jitter retry in sync, all hitting the service the moment it comes back
- No request queuing — Load balancer dumps everything at once
- No warm-up period — Service is marked healthy before it's actually ready
- Connection pool pre-fill — DB connection pool initializes hundreds of connections simultaneously on boot
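The retry-storm cause is usually fixed on the client side by adding jitter to exponential backoff, so reconnect attempts spread out instead of landing in one wave. A minimal sketch (the base and cap values are illustrative, and `withRetries` is a hypothetical helper, not a library API):

```typescript
// Exponential backoff with full jitter: each client waits a random delay in
// [0, min(cap, base * 2^attempt)], so retries de-synchronize across clients.
function backoffWithJitter(attempt: number, baseMs = 100, capMs = 30_000): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt)
  return Math.random() * ceiling
}

// Generic retry wrapper around any async call (e.g. an HTTP request)
async function withRetries<T>(fn: () => Promise<T>, maxAttempts = 5): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn()
    } catch (err) {
      if (attempt + 1 >= maxAttempts) throw err
      await new Promise(resolve => setTimeout(resolve, backoffWithJitter(attempt)))
    }
  }
}
```

Full jitter means two clients that failed at the same instant almost never retry at the same instant, which is exactly what breaks up the herd.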
Fix 1: Slow Start / Traffic Ramping in Load Balancer
Don't send 100% of traffic immediately — ramp up:
# Nginx upstream slow_start
# Note: slow_start is an NGINX Plus feature; open-source nginx ignores it
upstream backend {
    server app1:3000 slow_start=30s; # Ramp traffic over 30 seconds
    server app2:3000 slow_start=30s;
}
# Kubernetes — progressive traffic via Argo Rollouts
apiVersion: argoproj.io/v1alpha1
kind: Rollout
spec:
  strategy:
    canary:
      steps:
        - setWeight: 5 # 5% of traffic first
        - pause: { duration: 30s }
        - setWeight: 25
        - pause: { duration: 30s }
        - setWeight: 100
Fix 2: Readiness Probe with Warm-Up
Your health check should only pass once your app is actually ready — connections established, caches pre-warmed:
// Express warm-up before marking as ready
// (db and cache below stand in for your own DB and cache clients)
import express from 'express'

const app = express()
let isReady = false

async function warmUp() {
  console.log('Warming up...')
  // Pre-establish DB connection pool
  await db.connect()
  // Pre-warm critical caches
  await Promise.all([
    cache.prefetch('config:global'),
    cache.prefetch('feature-flags'),
    cache.prefetch('rate-limits'),
  ])
  // Run a test query to ensure DB is responsive
  await db.query('SELECT 1')
  isReady = true
  console.log('Warm-up complete — accepting traffic')
}

// Kubernetes readiness probe — fails until warm-up finishes
app.get('/ready', (req, res) => {
  if (isReady) {
    res.status(200).json({ status: 'ready' })
  } else {
    res.status(503).json({ status: 'warming up' })
  }
})

// Kubernetes liveness probe (separate — just "am I alive?")
app.get('/health', (req, res) => {
  res.status(200).json({ status: 'alive' })
})

app.listen(3000, () => {
  warmUp().catch(err => {
    console.error('Warm-up failed', err)
    process.exit(1) // let the orchestrator restart us
  })
})
# kubernetes deployment
readinessProbe:
  httpGet:
    path: /ready
    port: 3000
  initialDelaySeconds: 10 # Wait 10s before first check
  periodSeconds: 5
  failureThreshold: 6 # 30s to warm up before failing
livenessProbe:
  httpGet:
    path: /health
    port: 3000
  initialDelaySeconds: 30
  periodSeconds: 10
Fix 3: Request Rate Limiting on Startup
Throttle incoming requests during the warm-up window:
import { RateLimiterMemory } from 'rate-limiter-flexible'

let startupLimiter: RateLimiterMemory | null = new RateLimiterMemory({
  points: 50, // Only 50 req/s during warm-up
  duration: 1,
})

// After 60 seconds, remove the startup limiter
setTimeout(() => {
  startupLimiter = null
  console.log('Startup rate limit removed — running at full capacity')
}, 60_000)

app.use(async (req, res, next) => {
  if (!startupLimiter) return next()
  try {
    await startupLimiter.consume(req.ip ?? 'unknown') // req.ip can be undefined
    next()
  } catch {
    res.status(503).json({ error: 'Service starting up, please retry' })
  }
})
Fix 4: Graceful Shutdown (Drain Before Restart)
Don't crash — finish in-flight requests before restarting:
const server = app.listen(3000)
let isShuttingDown = false

// 1. Reject new requests during shutdown
// (register this middleware BEFORE your routes, or it will never run for them)
app.use((req, res, next) => {
  if (isShuttingDown) {
    res.setHeader('Connection', 'close')
    return res.status(503).json({ error: 'Service shutting down' })
  }
  next()
})

process.on('SIGTERM', () => {
  console.log('SIGTERM received — graceful shutdown starting')
  isShuttingDown = true
  // 2. Stop accepting new connections
  server.close(async () => {
    console.log('HTTP server closed')
    // 3. In-flight requests have finished (server.close waits for them)
    // 4. Close DB connections
    await db.end()
    console.log('DB connections closed')
    process.exit(0)
  })
  // 5. Force-kill after 30s if graceful drain stalls
  setTimeout(() => {
    console.error('Graceful shutdown timeout — forcing exit')
    process.exit(1)
  }, 30_000)
})
Fix 5: Circuit Breaker at the Client
If you control the clients, prevent retry storms with a circuit breaker:
import CircuitBreaker from 'opossum'

const options = {
  timeout: 3000,
  errorThresholdPercentage: 50,
  resetTimeout: 30000,
  // Minimum number of requests in the rolling window before the circuit can trip
  volumeThreshold: 10,
}

// callDownstreamService is your function that calls the restarting service
const breaker = new CircuitBreaker(callDownstreamService, options)

// Half-open: opossum lets a trial request through to test recovery
breaker.on('halfOpen', () => console.log('Circuit half-open — testing recovery'))
breaker.on('close', () => console.log('Circuit closed — service healthy'))
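Opossum handles all of this for you, but the half-open mechanics are worth seeing in miniature. A self-contained sketch of the state machine (the thresholds and class are illustrative, not opossum's internals):

```typescript
type CircuitState = 'closed' | 'open' | 'halfOpen'

// Minimal circuit breaker: trips after consecutive failures, lets a single
// probe through after a cooldown, and closes only if the probe succeeds.
class MiniBreaker {
  private state: CircuitState = 'closed'
  private failures = 0
  private openedAt = 0

  constructor(
    private failureThreshold = 5,
    private resetTimeoutMs = 30_000,
    private now: () => number = Date.now
  ) {}

  getState(): CircuitState {
    if (this.state === 'open' && this.now() - this.openedAt >= this.resetTimeoutMs) {
      this.state = 'halfOpen' // cooldown elapsed: allow one probe request
    }
    return this.state
  }

  async fire<T>(fn: () => Promise<T>): Promise<T> {
    const state = this.getState()
    if (state === 'open') throw new Error('circuit open — failing fast')
    try {
      const result = await fn()
      this.failures = 0
      this.state = 'closed' // probe (or normal call) succeeded
      return result
    } catch (err) {
      this.failures++
      if (state === 'halfOpen' || this.failures >= this.failureThreshold) {
        this.state = 'open' // re-open immediately on a failed probe
        this.openedAt = this.now()
      }
      throw err
    }
  }
}
```

The key restart-protection property is in `fire`: while open, calls fail locally without touching the network, so a recovering service sees one probe instead of the whole backlog.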
Fix 6: Connection Pool Lazy Initialization
Spread out the DB connection pool initialization:
import { Pool } from 'pg'

const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  min: 2, // Start with just 2 connections
  max: 20, // Grow to 20 max
  // Connections are created on demand, not all at startup
  idleTimeoutMillis: 30_000,
})

// Pre-create only the minimum connections during warm-up
async function warmUpPool() {
  const warmUpConnections = 2
  const clients = await Promise.all(
    Array.from({ length: warmUpConnections }, () => pool.connect())
  )
  clients.forEach(c => c.release())
  console.log(`Pool pre-warmed with ${warmUpConnections} connections`)
}
Kubernetes Rolling Deployment Best Practices
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1       # Only 1 new pod at a time
      maxUnavailable: 0 # Never take down a pod before the new one is ready
  template:
    spec:
      containers:
        - name: app
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 5"] # Drain before SIGTERM
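One more knob matters alongside preStop: Kubernetes sends SIGKILL once terminationGracePeriodSeconds expires, so it must cover both the preStop sleep and the in-app drain timeout. A sketch assuming the 30s drain from Fix 4:

```yaml
spec:
  template:
    spec:
      # 5s preStop sleep + 30s drain timeout + headroom (the default 30s is too short here)
      terminationGracePeriodSeconds: 40
```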
Monitoring Restart Events
// Track restart-related metrics
const metrics = {
  startTime: Date.now(),
  warmUpCompleted: false,
  uptimeMs: () => Date.now() - metrics.startTime,
  isWarm: () => metrics.warmUpCompleted,
}

// An in-process restart counter resets on every restart, so it can never
// detect a loop. Detect it by uptime instead: a SIGTERM shortly after boot
// means the instance never stabilized. For real alerting, use an external
// signal such as Kubernetes' kube_pod_container_status_restarts_total.
process.on('SIGTERM', () => {
  if (metrics.uptimeMs() < 60_000) {
    logger.alert('Shutting down under 60s after boot — possible thundering herd loop')
  }
})
Conclusion
A thundering herd on restart can trap your service in a crash loop that takes down everything behind it. The fixes work in layers: graceful shutdown ensures clean exits, readiness probes hold traffic until the service is actually warm, slow start ramps load gradually, and startup rate limiting gives the fresh instance room to breathe. Implement them together; each layer covers a failure mode the others miss.