Durable Execution With Temporal — Replacing Fragile Job Queues and Cron Jobs
By Sanjeev Sharma (@webcoderspeed1)
Introduction
Job queues are fragile. Cron jobs fail silently. A worker that dies mid-task can leave work lost, duplicated, or half-finished. Temporal addresses this with durable execution: workflows that survive failures, automatic retry policies, saga pattern support for distributed transactions, and human-in-the-loop workflows that span weeks. This post explores where Temporal shines and when simpler solutions suffice.
- What Makes Execution Durable
- Temporal Workflow vs Activity
- Automatic Retry with Exponential Backoff
- Long-Running Workflows (Days/Weeks)
- Saga Pattern for Distributed Transactions
- Signal and Query
- Temporal vs BullMQ vs Inngest
- Production Temporal Deployment Checklist
- Conclusion
What Makes Execution Durable
Durability means that when something fails, execution can resume without losing state. Traditional job queues persist the job but not its progress:
Traditional queue (fragile):
// BullMQ job
const job = await queue.add('process_payment', {
  orderId: '123',
  amount: 100,
});
// If the worker crashes after step 2:
// - The job survives in Redis, but its progress does not
// - A stalled job is retried from step 1, so the card may be charged twice
// - Which steps already ran is unknown unless you track it yourself
// - Idempotency and recovery are your responsibility
async function processPayment(data) {
  const order = await db.getOrder(data.orderId); // Step 1
  const charge = await stripe.charge(order); // Step 2 (crash right after this)
  await db.updateOrder(order.id, 'charged'); // Step 3
  await email.send(order.email); // Step 4
}
Temporal (durable):
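// Temporal Workflow
import { proxyActivities } from '@temporalio/workflow';
import type * as activityImpls from './activities';

// Activities are called through a proxy; a timeout is required
const activities = proxyActivities<typeof activityImpls>({
  startToCloseTimeout: '1 minute',
});

export async function paymentWorkflow(orderId: string, amount: number) {
  // The result of every completed activity is recorded in the event history
  const order = await activities.fetchOrder(orderId); // Step 1
  const charge = await activities.chargeCard(order, amount); // Step 2 (crash right after this)
  await activities.updateOrder(order.id, 'charged'); // Step 3
  await activities.sendEmail(order.email); // Step 4
}
// If the worker crashes after step 2:
// - Another worker replays the workflow code against the event history
// - Steps 1-2 return their recorded results; the activities are not re-run
// - Execution continues at step 3
// - No manual intervention
// - Failed activities are retried with exponential backoff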
Why durable > fragile:
Scenario              | Traditional Queue        | Temporal
──────────────────────┼──────────────────────────┼──────────────────────────────────
Worker crash mid-job  | Restarts from step 1     | Resumes after last recorded step
Network partition     | Lost or duplicated work  | Retried automatically
Long-running ops (7d) | Timeout/loss             | Persisted state
Partial failure       | Inconsistent             | Compensated (saga)
Human approval needed | Manual resume            | Built-in signals
──────────────────────┴──────────────────────────┴──────────────────────────────────
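To make the replay idea concrete, here is a minimal, self-contained sketch of a durable step runner. It is a toy model and not Temporal's implementation: `DurableContext`, `EventLog`, and `paymentFlow` are hypothetical names, and an in-memory `Map` stands in for the persisted event history.

```typescript
// Toy model of durable execution: the result of each completed step is
// recorded, and a restarted run reads it back instead of re-executing.
type EventLog = Map<string, unknown>;

class DurableContext {
  public executed: string[] = [];
  constructor(private eventLog: EventLog) {}

  // Run `fn` only if no result for `stepId` has been recorded yet.
  step<T>(stepId: string, fn: () => T): T {
    if (this.eventLog.has(stepId)) {
      return this.eventLog.get(stepId) as T; // replay: reuse recorded result
    }
    const result = fn();
    this.eventLog.set(stepId, result); // record before moving on
    this.executed.push(stepId);
    return result;
  }
}

function paymentFlow(ctx: DurableContext, crashAfterCharge: boolean): string {
  const order = ctx.step('fetchOrder', () => ({ id: '123', amount: 100 }));
  const chargeId = ctx.step('chargeCard', () => `ch_${order.id}`);
  if (crashAfterCharge) throw new Error('worker crashed');
  ctx.step('updateOrder', () => true);
  return chargeId;
}

const eventLog: EventLog = new Map();

// First run crashes after the charge has been recorded.
const firstRun = new DurableContext(eventLog);
let crashed = false;
try {
  paymentFlow(firstRun, true);
} catch {
  crashed = true;
}

// Second run replays against the same log: only the missing step executes.
const secondRun = new DurableContext(eventLog);
const chargeId = paymentFlow(secondRun, false);
```

In this sketch the second run executes only `updateOrder`, because the first two steps already have recorded results. That is the property that keeps a card from being charged twice after a crash.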
Temporal Workflow vs Activity
Temporal separates orchestration (workflow) from execution (activity):
Workflow: Orchestration (durable)
// Workflows must be deterministic
// - Same inputs and history always produce the same sequence of activity calls
// - No direct I/O and no reads of external mutable state
// - Replayed from the event history to rebuild state
// (`activities` is a proxyActivities() handle)
export async function paymentWorkflow(
  orderId: string,
  amount: number,
) {
  // Workflow: purely orchestrates activities
  const order = await activities.fetchOrder(orderId);
  if (!order) {
    return { success: false, reason: 'Order not found' };
  }
  // Must be deterministic
  // ✗ const rows = await db.query(...); // I/O belongs in an activity
  // ✗ const env = process.env.SECRET; // Node APIs are not available in the workflow sandbox
  // Note: the TypeScript SDK sandbox swaps Math.random() and Date for
  // deterministic versions; other SDKs provide their own workflow-safe APIs
  try {
    const charge = await activities.chargeCard(order, amount);
    if (!charge.success) {
      await activities.sendFailureEmail(order.email);
      return { success: false, reason: 'Charge failed' };
    }
    await activities.updateOrderStatus(orderId, 'charged');
    await activities.sendSuccessEmail(order.email);
    return { success: true, chargeId: charge.id };
  } catch (error) {
    // Reached only after the activity's retries are exhausted
    await activities.logError(error);
    throw error;
  }
}
Activity: Execution (retryable)
// Activities are where side effects happen
// - Can fail and be retried
// - Network calls, database writes, external APIs
// - Not replayed (executed once per attempt)
import { ApplicationFailure } from '@temporalio/common';

const activities = {
  async fetchOrder(orderId: string) {
    return await db.query(`SELECT * FROM orders WHERE id = ?`, [orderId]);
  },
  async chargeCard(order: Order, amount: number) {
    // How often a timeout is retried is set by the retry policy, not here
    try {
      return await stripe.charges.create({
        amount,
        currency: 'usd',
        customer: order.customerId,
      });
    } catch (error: any) {
      if (error.code === 'timeout') {
        throw error; // Temporal retries automatically
      }
      // Mark the failure as non-retryable so Temporal stops retrying
      throw ApplicationFailure.nonRetryable('Card declined', 'CardDeclined');
    }
  },
  async updateOrderStatus(orderId: string, status: string) {
    return await db.query(`UPDATE orders SET status = ? WHERE id = ?`, [
      status,
      orderId,
    ]);
  },
  async sendEmail(email: string) {
    return await sendgrid.send({
      to: email,
      subject: 'Payment confirmed',
    });
  },
};
Key difference:
Workflow               | Activity
───────────────────────┼──────────────────────────────────
Deterministic logic    | Actual side effects
Replayed from history  | Not replayed; retried on failure
No I/O directly        | Network/DB calls
Orchestrates flow      | Does work
Cannot fail silently   | Failures retried
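The determinism requirement is easier to see with a toy replay check. The sketch below is not Temporal code: it models a workflow as a function that returns the ordered list of activity names it would schedule, and `replayMatches` is a hypothetical helper that compares a replay against what was recorded.

```typescript
// A workflow is modelled as a function returning the activities it schedules.
// `rand` stands in for any unseeded source of randomness.
type Workflow = (rand: () => number) => string[];

// Replay succeeds only if the replayed commands match the recorded ones.
function replayMatches(
  recorded: string[],
  workflow: Workflow,
  rand: () => number,
): boolean {
  const replayed = workflow(rand);
  return (
    replayed.length === recorded.length &&
    replayed.every((activity, i) => activity === recorded[i])
  );
}

// Deterministic: the branch depends only on the workflow input.
const deterministic = (amount: number): Workflow => () =>
  amount > 1000 ? ['fetchOrder', 'manualReview'] : ['fetchOrder', 'chargeCard'];

// Non-deterministic: the branch depends on a value drawn inside the workflow.
const nonDeterministic: Workflow = (rand) =>
  rand() < 0.5 ? ['fetchOrder', 'chargeCard'] : ['fetchOrder', 'manualReview'];

// Same input, different random draws: the deterministic workflow still matches.
const recordedA = deterministic(50)(() => 0.1);
const okA = replayMatches(recordedA, deterministic(50), () => 0.9);

// Original run drew 0.1, replay draws 0.9: the recorded history no longer fits.
const recordedB = nonDeterministic(() => 0.1);
const okB = replayMatches(recordedB, nonDeterministic, () => 0.9);
```

Here `okA` is `true` and `okB` is `false`. A mismatch of this kind is the situation that makes recorded history unusable for rebuilding state.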
Automatic Retry with Exponential Backoff
Temporal handles retries automatically:
import { proxyActivities } from '@temporalio/workflow';
import type { RetryPolicy } from '@temporalio/common';
import type * as activityImpls from './activities';

// Define retry policy
const retryPolicy: RetryPolicy = {
  initialInterval: '1s',
  maximumInterval: '10m',
  backoffCoefficient: 2.0,
  maximumAttempts: 10,
  nonRetryableErrorTypes: ['NonRetryableError'],
};

// Activity proxies carry the retry policy
const activities = proxyActivities<typeof activityImpls>({
  startToCloseTimeout: '30s',
  retry: retryPolicy,
});

// Card errors should fail fast instead of being retried
const { chargeCard } = proxyActivities<typeof activityImpls>({
  startToCloseTimeout: '30s',
  retry: {
    ...retryPolicy,
    nonRetryableErrorTypes: ['CardDeclined', 'InsufficientFunds'],
  },
});

export async function paymentWorkflow(orderId: string, amount: number) {
  const order = await activities.fetchOrder(orderId);
  // Retried automatically on failure:
  // Attempt 1: runs immediately
  // Attempt 2: wait 1s, retry
  // Attempt 3: wait 2s, retry
  // Attempt 4: wait 4s, retry
  // Attempt 5: wait 8s, retry
  // ...doubling each time, never waiting longer than maximumInterval (10m)
  // ...up to 10 total attempts, so the longest wait here is 256s
  const charge = await chargeCard(order, amount);
  await activities.updateOrderStatus(orderId, 'charged');
}
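The wait times implied by a policy like the one above can be worked out with a few lines of arithmetic. This is a sketch of the exponential backoff formula only, `min(initial × coefficient^n, maximum)`. `BackoffPolicy` and `retryDelays` are hypothetical names, and the intervals are in milliseconds rather than Temporal's duration strings.

```typescript
interface BackoffPolicy {
  initialIntervalMs: number;
  backoffCoefficient: number;
  maximumIntervalMs: number;
  maximumAttempts: number;
}

// Returns the wait before each retry. With N total attempts there are
// at most N - 1 waits, because the first attempt runs immediately.
function retryDelays(policy: BackoffPolicy): number[] {
  const delays: number[] = [];
  for (let retry = 0; retry < policy.maximumAttempts - 1; retry++) {
    const uncapped =
      policy.initialIntervalMs * Math.pow(policy.backoffCoefficient, retry);
    delays.push(Math.min(uncapped, policy.maximumIntervalMs));
  }
  return delays;
}

// 1s initial, doubling, capped at 10 minutes, 10 attempts in total.
const delays = retryDelays({
  initialIntervalMs: 1_000,
  backoffCoefficient: 2,
  maximumIntervalMs: 600_000,
  maximumAttempts: 10,
});

// Same policy with a 10s cap: the cap applies from the fifth wait onward.
const cappedDelays = retryDelays({
  initialIntervalMs: 1_000,
  backoffCoefficient: 2,
  maximumIntervalMs: 10_000,
  maximumAttempts: 10,
});
```

With a 10 minute cap and 10 attempts the cap is never reached: the waits run from 1s to 256s and total 511s. Lowering the cap to 10s flattens every wait after 8s to 10s.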
Comparison to BullMQ:
// BullMQ: retry options are passed per job (or once via defaultJobOptions)
const job = await queue.add(
  'charge',
  { orderId, amount },
  {
    attempts: 5,
    backoff: {
      type: 'exponential',
      delay: 2000, // Initial delay in ms
    },
  },
);
// Temporal: one typed policy object; activities get a default policy if none is set
const retryPolicy = {
  initialInterval: '2s',
  backoffCoefficient: 2,
  maximumAttempts: 5,
};
// Applied consistently to every activity called through the same proxy
Long-Running Workflows (Days/Weeks)
Temporal was designed for workflows spanning days, weeks, or months:
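Multi-week approval workflow:
// Loan approval: spans weeks
import { condition, defineSignal, proxyActivities, setHandler } from '@temporalio/workflow';
import type * as activityImpls from './activities';

const activities = proxyActivities<typeof activityImpls>({
  startToCloseTimeout: '7 days', // illustrative; size it to your slowest activity
});

// Signal definition, shared with whoever sends the approval
export const approveSignal = defineSignal<[boolean, string]>('approve');

export async function loanApprovalWorkflow(
  applicationId: string,
  amount: number,
) {
  // Step 1: Auto-approve small amounts
  if (amount < 5000) {
    await activities.approveLoan(applicationId);
    return { approved: true, reason: 'auto-approved' };
  }
  // Step 2: Request manual review (human approval)
  let approved: boolean | undefined;
  setHandler(approveSignal, (decision, _reason) => {
    approved = decision;
  });
  // condition() resolves to false if no decision arrives within 14 days
  const decided = await condition(() => approved !== undefined, '14d');
  if (!decided || !approved) {
    await activities.denyLoan(applicationId);
    return { approved: false, reason: decided ? 'Denied by reviewer' : 'Review timed out' };
  }
  // Step 3: Verify document (this takes days)
  const verification = await activities.verifyDocuments(applicationId);
  // Step 4: Final approval
  await activities.approveLoan(applicationId, verification);
  // Step 5: Disburse funds
  await activities.disburseAmount(applicationId, amount);
  return { approved: true, disbursed: true };
}
// If a worker crashes:
// - Workflow state is persisted in the event history
// - Another worker resumes from the step where it left off
// - No data loss; the 14-day timer keeps running on the server
// - Can run for months (very long histories can be trimmed with continue-as-new)
Human-in-the-loop signal handling:
// Signal: external input during workflow execution
import { Client } from '@temporalio/client';
import { approveSignal } from './workflows'; // path is an assumption

const client = new Client(); // connects to localhost:7233 by default
// External system sends the signal
const handle = client.workflow.getHandle(workflowId);
await handle.signal(approveSignal, true, 'Verified');
// Workflow resumes from condition() in loanApprovalWorkflow
// No polling loop in your code; the signal is recorded in the workflow history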
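The wait-then-resume behaviour described above can be modelled without any SDK. The following is a toy, callback-based sketch and not how Temporal implements signals: `ToyWorkflow`, `Waiter`, and `LoanState` are hypothetical names, and the continuation runs synchronously for simplicity.

```typescript
type Waiter<S> = { predicate: (s: S) => boolean; resume: () => void };

class ToyWorkflow<S> {
  private waiters: Waiter<S>[] = [];
  constructor(public readonly state: S) {}

  // Register a continuation that runs once `predicate` holds.
  condition(predicate: (s: S) => boolean, resume: () => void): void {
    if (predicate(this.state)) {
      resume();
    } else {
      this.waiters.push({ predicate, resume });
    }
  }

  // Deliver a signal: update state, then wake every waiter that is now unblocked.
  signal(update: (s: S) => void): void {
    update(this.state);
    const ready = this.waiters.filter((w) => w.predicate(this.state));
    this.waiters = this.waiters.filter((w) => !w.predicate(this.state));
    ready.forEach((w) => w.resume());
  }
}

interface LoanState {
  approved?: boolean;
  outcome?: string;
}

const loan = new ToyWorkflow<LoanState>({});

// The "workflow" parks here until a reviewer has decided.
loan.condition(
  (s) => s.approved !== undefined,
  () => {
    loan.state.outcome = loan.state.approved ? 'disburse' : 'deny';
  },
);

const outcomeBeforeSignal = loan.state.outcome; // still waiting
loan.signal((s) => {
  s.approved = true;
});
const outcomeAfterSignal = loan.state.outcome; // continuation has run
```

Nothing polls in this model: the waiting code is re-evaluated only when a signal changes state, which is the shape of the `condition()` plus signal pairing used in the loan workflow.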
Saga Pattern for Distributed Transactions
Sagas coordinate changes across multiple services:
// Booking workflow: reserve flight, hotel, rental car
export async function bookingWorkflow(
  userId: string,
  flightId: string,
  hotelId: string,
  carId: string,
) {
  const compensations: (() => Promise<void>)[] = [];
  try {
    // Step 1: Reserve flight
    const flightReservation = await activities.reserveFlight(flightId, userId);
    compensations.push(() => activities.cancelFlight(flightReservation.id));
    // Step 2: Reserve hotel
    const hotelReservation = await activities.reserveHotel(hotelId, userId);
    compensations.push(() => activities.cancelHotel(hotelReservation.id));
    // Step 3: Reserve car (might fail)
    const carReservation = await activities.reserveCar(carId, userId);
    compensations.push(() => activities.cancelCar(carReservation.id));
    // All succeeded; charge customer
    await activities.chargeCard(userId, 1500); // $1500 total
    return {
      success: true,
      reservations: {
        flight: flightReservation,
        hotel: hotelReservation,
        car: carReservation,
      },
    };
  } catch (error) {
    // One activity failed; execute compensations in reverse order
    console.log(`Booking failed: ${error}. Rolling back...`);
    for (let i = compensations.length - 1; i >= 0; i--) {
      try {
        await compensations[i]();
      } catch (compensationError) {
        console.error(`Compensation failed: ${compensationError}`);
        // Log for manual intervention
        await activities.logCompensationFailure(userId, compensationError);
      }
    }
    throw error;
  }
}
// Properties:
// - Either every step succeeds or completed steps are compensated
//   (eventually consistent, not an atomic transaction)
// - No orphaned reservations once compensations have run
// - Automatic rollback on any step failure
// - Each compensation is an activity, so it is retried under its retry
//   policy before the catch block logs it for manual follow-up
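The rollback order is the part of a saga that is easiest to get wrong, so here is a small self-contained sketch that isolates it. It is synchronous for brevity and uses hypothetical names (`SagaStep`, `runSaga`); in a real workflow each action and compensation would be an awaited activity.

```typescript
interface SagaStep {
  name: string;
  action: () => void;
  compensate: () => void;
}

// Runs steps in order. On the first failure, undoes the completed steps in
// reverse order and reports failure. Returns true only if every step succeeds.
function runSaga(steps: SagaStep[], log: string[]): boolean {
  const completed: SagaStep[] = [];
  try {
    for (const step of steps) {
      step.action();
      log.push(`done:${step.name}`);
      completed.push(step);
    }
    return true;
  } catch {
    for (const step of completed.reverse()) {
      try {
        step.compensate();
        log.push(`undo:${step.name}`);
      } catch {
        // A failed compensation is recorded, and the rest still run.
        log.push(`undo-failed:${step.name}`);
      }
    }
    return false;
  }
}

const sagaLog: string[] = [];
const booked = runSaga(
  [
    { name: 'flight', action: () => {}, compensate: () => {} },
    { name: 'hotel', action: () => {}, compensate: () => {} },
    {
      name: 'car',
      action: () => {
        throw new Error('no cars available');
      },
      compensate: () => {},
    },
  ],
  sagaLog,
);
```

After this run `booked` is `false` and `sagaLog` reads `done:flight`, `done:hotel`, `undo:hotel`, `undo:flight`. The car step never completed, so it has nothing to undo.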
Signal and Query
Workflows are interactive:
Signal: Send data to workflow
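import { condition, defineSignal, proxyActivities, setHandler } from '@temporalio/workflow';
import type * as activityImpls from './activities';

const activities = proxyActivities<typeof activityImpls>({
  startToCloseTimeout: '10m',
});

export const pauseSignal = defineSignal('pause');
export const resumeSignal = defineSignal('resume');
export const cancelSignal = defineSignal('cancel');

export async function dataProcessingWorkflow(
  datasetId: string,
) {
  const state = { paused: false, cancelled: false, completedBatches: 0 };
  setHandler(pauseSignal, () => void (state.paused = true));
  setHandler(resumeSignal, () => void (state.paused = false));
  setHandler(cancelSignal, () => void (state.cancelled = true));
  for (let batch = 0; batch < 100; batch++) {
    // Block while paused; a cancel signal also wakes the workflow
    await condition(() => !state.paused || state.cancelled);
    if (state.cancelled) {
      return { cancelled: true, processedBatches: batch };
    }
    await activities.processBatch(datasetId, batch);
    state.completedBatches = batch + 1;
  }
  return { completed: true, processedBatches: 100 };
}
// External control (Client comes from '@temporalio/client')
const client = new Client();
const handle = client.workflow.getHandle(workflowId);
await handle.signal(pauseSignal);
// ... do something ...
await handle.signal(resumeSignal);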
Query: Read workflow state
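import { defineQuery, setHandler } from '@temporalio/workflow';

export const statusQuery = defineQuery<{
  state: string;
  completedBatches: number;
}>('status');

// Inside the workflow function, where `state` is in scope
// (`completedBatches` is a counter the batch loop is assumed to update):
setHandler(statusQuery, () => ({
  state: state.paused ? 'paused' : 'running',
  completedBatches: state.completedBatches,
}));

// External system queries status
const status = await client.workflow.getHandle(workflowId).query(statusQuery);
console.log(`Workflow progress: ${status.completedBatches}/100`);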
Temporal vs BullMQ vs Inngest
Temporal: Durable workflows
Strengths:
- Long-running workflows (days/weeks/months)
- Complex logic, saga pattern
- Guaranteed no message loss
- Self-healing on failures
- Scalable (millions of workflows)
Weaknesses:
- Learning curve (determinism)
- Infrastructure: Temporal server required (self-hosted or Temporal Cloud)
- Overkill for simple jobs
- Not ideal for latency-sensitive jobs (every step round-trips through the Temporal server)
Use for: Payment processing, approvals, data pipelines
BullMQ: Simple job queue
import { Queue, Worker } from 'bullmq';

const queue = new Queue('tasks', { connection: redis });
await queue.add('send-email', {
  to: 'user@example.com',
  subject: 'Hello',
});
// Handler: BullMQ processes jobs with a Worker (queue.process is the older Bull API)
const worker = new Worker(
  'tasks',
  async (job) => {
    if (job.name === 'send-email') {
      await sendEmail(job.data.to, job.data.subject);
    }
  },
  { connection: redis },
);
// Strengths: Simple, Redis-backed, good for simple jobs
// Weaknesses: No durable per-step state, durability depends on a single Redis deployment
// Use for: Background jobs, notifications, simple async work
Inngest: Serverless durable execution
// Inngest: Managed Temporal alternative
const inngest = new Inngest({ id: 'my-app' });
export const sendWelcomeEmail = inngest.createFunction(
  { id: 'send-welcome-email' },
  { event: 'user.created' },
  async ({ event, step }) => {
    const user = await step.run('fetch-user', async () => {
      return await db.users.get(event.data.userId);
    });
    await step.run('send-email', async () => {
      await email.send(user.email, 'Welcome!');
    });
    return { sent: true };
  },
);
// Strengths: Managed, no infrastructure, simple API
// Weaknesses: Vendor lock-in, pricing per execution
// Use for: Serverless, no ops expertise, small-to-medium scale
Comparison table:
Metric            | Temporal              | BullMQ                  | Inngest
──────────────────┼───────────────────────┼─────────────────────────┼────────────────
Long workflows    | ✓ (unlimited)         | ✗ (hours max)           | ✓ (unlimited)
Durability        | ✓ (guaranteed)        | Job-level only (Redis)  | ✓ (guaranteed)
Infrastructure    | Self-hosted or Cloud  | Redis                   | Managed
Complexity        | High                  | Low                     | Low
Cost at scale     | Low                   | Medium                  | High
Saga pattern      | ✓ Built-in            | ✗ Manual                | ✓ Built-in
Human approval    | ✓ Signals             | ✗ Custom                | ✓ Built-in
──────────────────┴───────────────────────┴─────────────────────────┴────────────────
Production Temporal Deployment Checklist
# deployment/temporal-readiness.yaml
infrastructure:
- "✓ Temporal server deployed (3+ replicas)"
- "✓ Persistent storage (PostgreSQL or Cassandra)"
- "✓ Worker processes scalable"
- "✓ Temporal Web UI accessible"
application:
- "✓ Workflows deterministic (no randomness)"
- "✓ Activities properly scoped"
- "✓ Retry policies defined"
- "✓ Error handling comprehensive"
- "✓ Signals and queries tested"
monitoring:
- "✓ Workflow execution metrics collected"
- "✓ Failed activities alerted"
- "✓ Stuck workflows detected"
- "✓ Latency percentiles tracked (P50/P99)"
operations:
- "✓ Runbook for stuck workflows"
- "✓ Replay testing documented"
- "✓ Worker scaling policy"
- "✓ Disaster recovery tested"
Conclusion
Temporal addresses problems that have plagued background job systems for decades. Long-running workflows survive worker failures. Automatic retry policies handle transient errors. Saga patterns coordinate multi-service changes through compensation. Signals enable human-in-the-loop workflows. The learning curve (determinism, replay semantics) is real, but the payoff is substantial: completed steps are not lost, partial failures are compensated instead of left as orphaned state, and manual repair becomes the exception. For simple background jobs, BullMQ or Inngest suffice. For complex, long-running, or mission-critical workflows, Temporal's durability guarantees are invaluable.