Synchronous Calls Everywhere — When Your Architecture Can't Handle Failure

Author: Sanjeev Sharma (@webcoderspeed1)
Introduction
Synchronous HTTP calls are easy to write and easy to reason about locally. The problem appears at scale: when Service A calls Service B synchronously, A's availability becomes dependent on B's availability. Chain five synchronous calls together and your system's uptime is the product of all five — 99.9% × 99.9% × 99.9% × 99.9% × 99.9% = 99.5%. That's 43 hours of downtime per year from services that are individually 99.9% available.
- The Availability Math
- Identifying Operations That Should Be Async
- Fix 1: Move Non-Critical Operations to a Queue
- Fix 2: Saga Pattern for Multi-Step Transactions
- Fix 3: Bulkhead Pattern — Isolate Failures
- Sync vs Async Decision Tree
- Conclusion
The Availability Math
Single service uptime: 99.9% = 8.76 hours downtime/year
Chain of synchronous services (multiply uptime):
2 services: 99.9% × 99.9% = 99.8% (17.5 hrs downtime/year)
3 services: 99.9%^3 = 99.7% (26.3 hrs)
5 services: 99.9%^5 = 99.5% (43.8 hrs)
10 services: 99.9%^10 = 99.0% (87.6 hrs = 3.6 days!)
This is why synchronous coupling is dangerous — each hop multiplies the failure probability.
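The table above is easy to reproduce. A small sketch (helper names are ours, not from any library) that multiplies per-hop uptime and converts the remainder into downtime hours:

```typescript
// Uptime of a synchronous chain is the product of each hop's uptime;
// downtime is whatever is left of an 8,760-hour year
const HOURS_PER_YEAR = 8760

function chainAvailability(perServiceUptime: number, hops: number): number {
  return perServiceUptime ** hops
}

function downtimeHoursPerYear(uptime: number): number {
  return (1 - uptime) * HOURS_PER_YEAR
}

for (const hops of [1, 2, 3, 5, 10]) {
  const uptime = chainAvailability(0.999, hops)
  console.log(
    `${hops} service(s): ${(uptime * 100).toFixed(2)}% uptime, ` +
      `${downtimeHoursPerYear(uptime).toFixed(1)} hrs downtime/year`,
  )
}
```

Note the compounding is geometric: going from 5 to 10 hops roughly doubles your expected downtime.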
Identifying Operations That Should Be Async
// Ask: "Does the user need this result RIGHT NOW to continue?"
// ✅ Synchronous is appropriate:
// - User authenticates → needs JWT to proceed (synchronous)
// - User queries their balance → needs current value (synchronous)
// - User submits order → needs order ID for confirmation page (synchronous)
// ❌ Synchronous where async would be better:
// - Send confirmation email (user doesn't wait for email delivery)
// - Update search index (user doesn't need search updated now)
// - Generate invoice PDF (user gets email when ready)
// - Notify third-party webhooks (fire and forget)
// - Update analytics (user doesn't care about analytics updating)
// - Sync to CRM (background job, not user-facing)
Fix 1: Move Non-Critical Operations to a Queue
import Queue from 'bull' // Bull exports the Queue class as its default export

const emailQueue = new Queue('emails', process.env.REDIS_URL)
const analyticsQueue = new Queue('analytics', process.env.REDIS_URL)
const webhookQueue = new Queue('webhooks', process.env.REDIS_URL)

// ❌ Before: synchronous chain blocks user response
async function createOrder(data: CreateOrderDto) {
  const order = await db.createOrder(data)
  await emailService.sendConfirmation(order) // 200ms — blocks
  await analyticsService.trackPurchase(order) // 150ms — blocks
  await webhookService.notifyIntegrations(order) // 300ms — blocks
  return order // User waits 650ms+ for non-critical operations
}

// ✅ After: enqueue non-critical work and return immediately
async function createOrder(data: CreateOrderDto) {
  const order = await db.createOrder(data)
  // Enqueueing is fast; the actual work runs later in background workers
  await Promise.all([
    emailQueue.add('order-confirmation', { orderId: order.id }),
    analyticsQueue.add('purchase', { orderId: order.id }),
    webhookQueue.add('order-created', { orderId: order.id }),
  ])
  return order // User gets a response in ~20ms (just the DB write)
}

// Workers handle the async work
emailQueue.process('order-confirmation', async (job) => {
  const order = await db.getOrder(job.data.orderId)
  await emailService.sendConfirmation(order)
})
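A bonus of queueing: failed jobs can be retried automatically. Bull supports this natively via job options such as `attempts` and `backoff` on `queue.add`. For contexts without a queue library, the same exponential-backoff policy can be sketched as a standalone helper (this helper is ours, not part of Bull):

```typescript
// Retry a flaky async operation with exponential backoff: 1s, 2s, 4s, ...
// This is the policy Bull gives workers via { attempts, backoff } job options
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 5,
  baseDelayMs = 1000,
): Promise<T> {
  let lastError: unknown
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn()
    } catch (err) {
      lastError = err
      if (i < attempts - 1) {
        // Double the wait after each failure so a struggling service gets air
        const delay = baseDelayMs * 2 ** i
        await new Promise((resolve) => setTimeout(resolve, delay))
      }
    }
  }
  throw lastError
}
```

With Bull itself you would simply pass `{ attempts: 5, backoff: { type: 'exponential', delay: 1000 } }` as the third argument to `queue.add`.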
Fix 2: Saga Pattern for Multi-Step Transactions
// For complex multi-step operations that need coordination:
// use a saga — a sequence of local transactions linked by events
import { v4 as uuid } from 'uuid'

class CreateOrderSaga {
  async start(data: CreateOrderDto) {
    const sagaId = uuid()
    // Step 1: Create order (compensating action: cancel order)
    const order = await orderService.createPending(data, sagaId)
    // Publish event to start next step
    await eventBus.publish('order.created', { orderId: order.id, sagaId })
    return order // Return immediately — saga continues asynchronously
  }
}

// Each step subscribes to the previous step's event
eventBus.on('order.created', async ({ orderId, sagaId }) => {
  try {
    await inventoryService.reserve(orderId)
    await eventBus.publish('inventory.reserved', { orderId, sagaId })
  } catch {
    // Compensate: cancel the order
    await eventBus.publish('saga.compensate', { sagaId, step: 'inventory' })
  }
})

eventBus.on('inventory.reserved', async ({ orderId, sagaId }) => {
  try {
    await paymentService.charge(orderId)
    await eventBus.publish('payment.completed', { orderId, sagaId })
  } catch {
    await eventBus.publish('saga.compensate', { sagaId, step: 'payment' })
  }
})
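The steps above publish `saga.compensate` but don't show who consumes it. A runnable sketch of a compensation handler follows; the in-memory event bus and the service stubs are stand-ins we invented so the example is self-contained, not the real implementations:

```typescript
import { EventEmitter } from 'node:events'

// In-memory stand-ins so the handler is runnable; in production the bus
// would be Kafka/RabbitMQ/etc. and the services would be real clients
class InMemoryEventBus extends EventEmitter {
  async publish(event: string, payload: unknown) {
    this.emit(event, payload)
  }
}

const eventBus = new InMemoryEventBus()
const log: string[] = []
const inventoryService = {
  release: async (sagaId: string) => log.push(`inventory-released:${sagaId}`),
}
const orderService = {
  cancel: async (sagaId: string) => log.push(`order-cancelled:${sagaId}`),
}

// Compensation handler: undo the steps that already succeeded, newest first
eventBus.on('saga.compensate', async ({ sagaId, step }: { sagaId: string; step: string }) => {
  if (step === 'payment') {
    // Payment failed after inventory was reserved, so release the reservation
    await inventoryService.release(sagaId)
  }
  await orderService.cancel(sagaId) // the pending order is always cancelled
})

await eventBus.publish('saga.compensate', { sagaId: 'saga-1', step: 'payment' })
```

The key design point: compensations run in reverse order of the forward steps, so the system unwinds to a consistent state rather than rolling back a distributed transaction.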
Fix 3: Bulkhead Pattern — Isolate Failures
// Give each downstream service its own bounded pool so one slow service
// can't exhaust the capacity needed to call the others
import { createPool } from 'generic-pool'

// Each service gets its own connection pool with bounded concurrency
const userServicePool = createPool(
  {
    create: async () => new UserServiceClient(),
    destroy: async (c) => c.close(),
  },
  { max: 10, min: 2 }, // max 10 concurrent calls to user-service
)

const paymentServicePool = createPool(
  {
    create: async () => new PaymentServiceClient(),
    destroy: async (c) => c.close(),
  },
  { max: 5, min: 1 }, // payment is more sensitive — limit concurrency
)

// Now payment slowness only ties up the 5 payment-pool slots;
// the 10 user-service slots are unaffected
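If you'd rather not pool client objects at all, the same isolation can be had with a dependency-free counting semaphore. This `Bulkhead` class is our own sketch of the pattern, not a library API:

```typescript
// A counting semaphore that caps how many calls may be in flight
// to one downstream service at a time
class Bulkhead {
  private active = 0
  private waiting: Array<() => void> = []

  constructor(private readonly maxConcurrent: number) {}

  async run<T>(fn: () => Promise<T>): Promise<T> {
    if (this.active >= this.maxConcurrent) {
      // At capacity: park this caller until a slot frees up
      await new Promise<void>((resolve) => this.waiting.push(resolve))
    }
    this.active++
    try {
      return await fn()
    } finally {
      this.active--
      this.waiting.shift()?.() // wake the next parked caller, if any
    }
  }
}

// One bulkhead per downstream: payment slowness can't starve user-service calls
const paymentBulkhead = new Bulkhead(5)
const userBulkhead = new Bulkhead(10)
```

Usage is a one-line wrap: `await paymentBulkhead.run(() => paymentClient.charge(order))`. Excess callers queue inside the bulkhead instead of piling pressure onto the struggling service.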
Sync vs Async Decision Tree
Q: Does the user need the result to continue their task?
   YES → Synchronous call is appropriate
   NO  → Use a queue

Q: Can we accept eventual consistency?
   YES → Async event
   NO  → Synchronous with circuit breaker + fallback

Q: Is this operation critical to the main transaction?
   YES → Include in the same transaction or use a saga
   NO  → Queue it as a side effect
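The "synchronous with circuit breaker + fallback" branch deserves a concrete shape. A minimal dependency-free sketch (class and parameter names are ours; production systems typically reach for a library like opossum):

```typescript
// After `threshold` consecutive failures the circuit opens, and calls
// short-circuit straight to the fallback until cooldownMs has elapsed
class CircuitBreaker {
  private failures = 0
  private openUntil = 0

  constructor(
    private readonly threshold = 3,
    private readonly cooldownMs = 30_000,
  ) {}

  async call<T>(fn: () => Promise<T>, fallback: () => T): Promise<T> {
    if (Date.now() < this.openUntil) return fallback() // circuit is open
    try {
      const result = await fn()
      this.failures = 0 // a success closes the circuit again
      return result
    } catch {
      this.failures++
      if (this.failures >= this.threshold) {
        this.openUntil = Date.now() + this.cooldownMs
      }
      return fallback()
    }
  }
}
```

The fallback is what makes the synchronous path survivable: a cached value or a degraded default keeps the user moving while the downstream recovers, and the open circuit stops you from hammering it in the meantime.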
Conclusion
Synchronous calls are appropriate when the user genuinely needs the result to continue. For everything else — emails, notifications, analytics, webhooks, search indexing, third-party integrations — a queue is better. The user gets a faster response, your system is more resilient (a slow email provider can't slow down checkout), and failed operations can be retried automatically. Start by auditing every synchronous call in your critical path and asking "does the user actually need this right now to continue?" For most side effects, the answer is no.