Synchronous Calls Everywhere — When Your Architecture Can't Handle Failure

Introduction

Synchronous HTTP calls are easy to write and easy to reason about locally. The problem appears at scale: when Service A calls Service B synchronously, A's availability becomes coupled to B's availability. Chain five synchronous calls together and your system's uptime is the product of all five — 99.9% × 99.9% × 99.9% × 99.9% × 99.9% ≈ 99.5%. That's nearly 44 hours of downtime per year from services that are individually 99.9% available.

The Availability Math

Single service uptime: 99.9% = 8.76 hours downtime/year

Chain of synchronous services (multiply uptime):
  2 services: 99.9% × 99.9% = 99.8%  (17.5 hrs downtime/year)
  3 services: 99.9%^3      = 99.7%  (26.3 hrs)
  5 services: 99.9%^5      = 99.5%  (43.8 hrs)
  10 services: 99.9%^10    = 99.0%  (87.2 hrs = 3.6 days!)

This is why synchronous coupling is dangerous — each hop compounds the probability of failure.
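
The table above can be reproduced with a couple of lines of arithmetic (function names are illustrative):

```typescript
// Availability of a chain of synchronous services is the product of the
// individual availabilities; downtime is the complement over a year.
function chainedAvailability(perService: number, hops: number): number {
  return Math.pow(perService, hops)
}

function downtimeHoursPerYear(availability: number): number {
  return (1 - availability) * 365 * 24
}

const a5 = chainedAvailability(0.999, 5)  // ≈ 0.995
console.log(downtimeHoursPerYear(a5))     // ≈ 43.7 hours/year
```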

Identifying Operations That Should Be Async

// Ask: "Does the user need this result RIGHT NOW to continue?"

// ✅ Synchronous is appropriate:
// - User authenticates → needs JWT to proceed (synchronous)
// - User queries their balance → needs current value (synchronous)
// - User submits order → needs order ID for confirmation page (synchronous)

// ❌ Synchronous where async would be better:
// - Send confirmation email (user doesn't wait for email delivery)
// - Update search index (user doesn't need search updated now)
// - Generate invoice PDF (user gets email when ready)
// - Notify third-party webhooks (fire and forget)
// - Update analytics (user doesn't care about analytics updating)
// - Sync to CRM (background job, not user-facing)

Fix 1: Move Non-Critical Operations to a Queue

import Queue from 'bull'  // bull exports the Queue class as its default export

const emailQueue = new Queue('emails', process.env.REDIS_URL!)
const analyticsQueue = new Queue('analytics', process.env.REDIS_URL!)
const webhookQueue = new Queue('webhooks', process.env.REDIS_URL!)

// ❌ Before: synchronous chain blocks user response
async function createOrder(data: CreateOrderDto) {
  const order = await db.createOrder(data)
  await emailService.sendConfirmation(order)      // 200ms — blocks
  await analyticsService.trackPurchase(order)     // 150ms — blocks
  await webhookService.notifyIntegrations(order)  // 300ms — blocks
  return order  // User waits 650ms+ for non-critical operations
}

// ✅ After: queue non-critical work
async function createOrder(data: CreateOrderDto) {
  const order = await db.createOrder(data)

  // Non-critical operations go to queues; we await only the fast
  // enqueue (a Redis write), not the work itself
  await Promise.all([
    emailQueue.add('order-confirmation', { orderId: order.id }),
    analyticsQueue.add('purchase', { orderId: order.id }),
    webhookQueue.add('order-created', { orderId: order.id }),
  ])

  return order  // User gets response in ~20ms (just the DB write)
}

// Workers handle the async work
emailQueue.process('order-confirmation', async (job) => {
  const order = await db.getOrder(job.data.orderId)
  await emailService.sendConfirmation(order)
})
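
A side benefit of queueing is automatic retry. With Bull, retry behavior is a per-job option; the enqueue call in `createOrder` above could pass options like these (the specific values are illustrative):

```typescript
await emailQueue.add(
  'order-confirmation',
  { orderId: order.id },
  {
    attempts: 5,                                    // retry up to 5 times on failure
    backoff: { type: 'exponential', delay: 1000 },  // wait 1s, 2s, 4s, ...
    removeOnComplete: true,                         // keep finished jobs out of Redis
  }
)
```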

Fix 2: Saga Pattern for Multi-Step Transactions

// For complex multi-step operations that need coordination:
// Use a saga — a sequence of local transactions linked by events

class CreateOrderSaga {
  async start(data: CreateOrderDto) {
    const sagaId = uuid()

    // Step 1: Create order (compensating action: cancel order)
    const order = await orderService.createPending(data, sagaId)

    // Publish event to start next step
    await eventBus.publish('order.created', { orderId: order.id, sagaId })
    return order  // Return immediately — saga continues asynchronously
  }
}

// Each step subscribes to the previous step's event
eventBus.on('order.created', async ({ orderId, sagaId }) => {
  try {
    await inventoryService.reserve(orderId)
    await eventBus.publish('inventory.reserved', { orderId, sagaId })
  } catch {
    // Compensate: cancel the order
    await eventBus.publish('saga.compensate', { sagaId, step: 'inventory' })
  }
})

eventBus.on('inventory.reserved', async ({ orderId, sagaId }) => {
  try {
    await paymentService.charge(orderId)
    await eventBus.publish('payment.completed', { orderId, sagaId })
  } catch {
    await eventBus.publish('saga.compensate', { sagaId, step: 'payment' })
  }
})
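
One piece the snippets above leave out is the handler for `saga.compensate`. A minimal self-contained sketch with an in-memory bus and stub services (every name here is illustrative, not a real broker API):

```typescript
type Handler = (payload: any) => Promise<void>

// Tiny in-memory event bus standing in for a real broker
class InMemoryEventBus {
  private handlers = new Map<string, Handler[]>()
  on(event: string, handler: Handler) {
    this.handlers.set(event, [...(this.handlers.get(event) ?? []), handler])
  }
  async publish(event: string, payload: any) {
    for (const handler of this.handlers.get(event) ?? []) await handler(payload)
  }
}

const eventBus = new InMemoryEventBus()
const undone: string[] = []  // records compensating actions for the demo

// Stub services; real ones would undo their own local transactions
const inventoryService = { release: async (_sagaId: string) => { undone.push('inventory') } }
const orderService = { cancelPending: async (_sagaId: string) => { undone.push('order') } }

// Compensation: undo completed steps in reverse order of execution
eventBus.on('saga.compensate', async ({ sagaId, step }) => {
  if (step === 'payment') {
    // Payment failed after inventory was reserved, so release it first
    await inventoryService.release(sagaId)
  }
  // In every failure case the pending order itself is cancelled last
  await orderService.cancelPending(sagaId)
})
```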

Fix 3: Bulkhead Pattern — Isolate Failures

// Give each downstream service its own bounded connection pool so one
// slow dependency can't exhaust the capacity needed to call the others
// (Node has no threads to partition, so we bound concurrency instead)

import { createPool } from 'generic-pool'

// generic-pool takes the factory and the options as separate arguments
const userServicePool = createPool(
  {
    create: async () => new UserServiceClient(),
    destroy: async (client) => client.close(),
  },
  { max: 10, min: 2 }  // at most 10 concurrent calls to user-service
)

const paymentServicePool = createPool(
  {
    create: async () => new PaymentServiceClient(),
    destroy: async (client) => client.close(),
  },
  { max: 5, min: 1 }  // payment is more sensitive — limit concurrency harder
)

// Now payment slowness ties up at most 5 payment slots;
// the 10 user-service slots are unaffected
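
If you'd rather not pull in a pooling library, the same isolation can be hand-rolled as a counting semaphore that caps in-flight calls per downstream (an illustrative sketch, not the generic-pool API):

```typescript
// Caps concurrent calls to one downstream; excess callers wait in line
class Bulkhead {
  private active = 0
  private waiters: Array<() => void> = []
  constructor(private readonly limit: number) {}

  async run<T>(fn: () => Promise<T>): Promise<T> {
    while (this.active >= this.limit) {
      // Park until a slot frees up, then re-check the limit
      await new Promise<void>((resolve) => this.waiters.push(resolve))
    }
    this.active++
    try {
      return await fn()
    } finally {
      this.active--
      this.waiters.shift()?.()  // wake one waiting caller
    }
  }
}

// Payment gets its own small bulkhead: slow payments can't
// consume capacity reserved for other downstreams
const paymentBulkhead = new Bulkhead(5)
// usage: paymentBulkhead.run(() => paymentClient.charge(order))
```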

Sync vs Async Decision Tree

Q: Does the user need the result to continue their task?
  YES → Synchronous call is appropriate
  NO  → Use a queue

Q: Can we accept eventual consistency?
  YES → Async event
  NO  → Synchronous with circuit breaker + fallback

Q: Is this operation critical to the main transaction?
  YES → Include in same transaction or use saga
  NO  → Queue it as a side effect
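
The "circuit breaker + fallback" branch above can be sketched in a few lines (hand-rolled for illustration; libraries like opossum provide a production-grade version):

```typescript
// After `threshold` consecutive failures the circuit opens and calls
// go straight to the fallback until `resetMs` has passed.
class CircuitBreaker<T> {
  private failures = 0
  private openedAt = 0
  constructor(
    private readonly fn: () => Promise<T>,
    private readonly fallback: () => T,
    private readonly threshold = 5,
    private readonly resetMs = 30_000,
  ) {}

  async call(): Promise<T> {
    const open =
      this.failures >= this.threshold &&
      Date.now() - this.openedAt < this.resetMs
    if (open) return this.fallback()  // fail fast, don't hit the sick service
    try {
      const result = await this.fn()
      this.failures = 0  // success closes the circuit
      return result
    } catch {
      if (++this.failures >= this.threshold) this.openedAt = Date.now()
      return this.fallback()
    }
  }
}
```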

Conclusion

Synchronous calls are appropriate when the user genuinely needs the result to continue. For everything else — emails, notifications, analytics, webhooks, search indexing, third-party integrations — a queue is better. The user gets a faster response, your system is more resilient (a slow email provider can't slow down checkout), and failed operations can be retried automatically. Start by auditing every synchronous call in your critical path and asking "does the user actually need this right now to continue?" For most side effects, the answer is no.