Temporal.io — Durable Workflows That Survive Server Crashes and Network Failures
Author: Sanjeev Sharma (@webcoderspeed1)
Introduction
Temporal.io is a workflow orchestration engine that lets workflows run to completion even when your servers crash, your database goes down, or your network flakes. Message queues such as BullMQ and SQS leave retry logic, state tracking, and recovery to your application code. Temporal instead records every workflow step in an event history and deterministically replays that history to rebuild state after a failure. This post covers Temporal's core concepts, production patterns, and how to architect scalable workflow systems.
- Temporal Core Concepts
- Activity Retries and Timeouts
- Workflow Signals and Queries
- Deterministic Workflow Constraints
- Child Workflows for Orchestration
- Worker Setup in Node.js
- Testing Workflows with TestWorkflowEnvironment
- Temporal vs BullMQ vs SQS Decision Matrix
- Checklist
- Conclusion
Temporal Core Concepts
Temporal's architecture separates concerns into three roles: Workflows (durable state machines), Activities (side effects), and Workers (execution engines). Understanding this separation is crucial.
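A Workflow is a deterministic state machine. Its code must be replay-safe because Temporal rebuilds workflow state by replaying the recorded event history. In the TypeScript SDK, workflow code runs in a sandbox that replaces Math.random(), Date, and setTimeout() with deterministic versions; in other SDKs, use the SDK's workflow utilities instead of the native calls.
An Activity is where side effects happen: API calls, database writes, file uploads. Activities are retried automatically according to their retry policy, but they are not replayed: once an activity completes, its result is recorded in history and reused on replay. Because a retry can run an activity more than once, activities should be idempotent.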
A Task Queue is a work distribution channel. Workers poll task queues and execute workflows or activities. Multiple workers can process the same queue for scalability.
// workflow.ts - Deterministic state machine
import { proxyActivities, defineQuery, setHandler, sleep } from '@temporalio/workflow';
import type * as activities from './activities';

const { fetchUserData, sendWelcomeEmail, chargeCard } = proxyActivities<typeof activities>({
  startToCloseTimeout: '5 minutes',
});

export interface OnboardingInput {
  userId: string;
  email: string;
}

// Query: read current workflow state
export const getWorkflowStatus = defineQuery<string>('getStatus');

export async function onboardingWorkflow(input: OnboardingInput): Promise<void> {
  // Register the handler inside the workflow so it reports this execution's state
  let status = 'fetching-user';
  setHandler(getWorkflowStatus, () => status);

  // Step 1: Fetch user data (activity with built-in retry)
  const userData = await fetchUserData(input.userId);

  // Step 2: Wait deterministically (a durable timer that survives worker restarts)
  status = 'waiting';
  await sleep('2 hours');

  // Step 3: Charge card (can fail and be retried transparently)
  // Assumption: the user record carries the amount to charge as `amountDue`
  status = 'charging';
  await chargeCard(userData.paymentId, userData.amountDue);

  // Step 4: Send email
  status = 'sending-email';
  await sendWelcomeEmail(input.email, userData.firstName);
  status = 'completed';
}
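To make the replay model concrete, here is a toy, SDK-free sketch of how recorded activity results let a workflow function be re-run after a crash without repeating side effects. The names `History`, `ActivityRunner`, and `runWithHistory` are hypothetical and not part of Temporal's API; the real server records a much richer event history.

```typescript
// Toy model of replay: NOT Temporal's implementation, only the core idea.
type History = unknown[]; // recorded activity results, in call order
type ActivityRunner = <R>(fn: () => R) => R;

// Runs `workflow` against `history`. Activity calls whose results are already
// recorded return the recorded value; new calls execute and are appended.
function runWithHistory<T>(history: History, workflow: (activity: ActivityRunner) => T): T {
  let cursor = 0;
  const activity: ActivityRunner = <R>(fn: () => R): R => {
    if (cursor < history.length) {
      return history[cursor++] as R; // replay: reuse the recorded result
    }
    const result = fn(); // first execution: perform the side effect
    history.push(result);
    cursor++;
    return result;
  };
  return workflow(activity);
}

let chargeCount = 0; // counts real side effects
const history: History = [];

const orderWorkflow = (activity: ActivityRunner): string => {
  const user = activity(() => 'user-123');
  return activity(() => {
    chargeCount++;
    return `tx-for-${user}`;
  });
};

const firstRun = runWithHistory(history, orderWorkflow);
// Simulate a worker restart: same history, fresh execution of the function.
const replayed = runWithHistory(history, orderWorkflow);
console.log(firstRun === replayed, chargeCount); // true 1
```

The second run walks the same code path but never re-executes the charge, which is why workflow code has to take the same path every time it runs.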
Activity Retries and Timeouts
Activities fail. Temporal's retry policy ensures transient failures don't kill your workflow. You configure:
- initialInterval: delay before the first retry (e.g., 1 second)
- maximumInterval: cap on the exponential backoff (e.g., 1 minute)
- backoffCoefficient: multiplier applied to the delay after each retry (2.0 is standard)
- maximumAttempts: total number of attempts, including the first, before giving up
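How these four settings interact is easiest to see by computing the resulting delays. The sketch below is plain arithmetic, not SDK code; `RetryPolicySketch` and `retryDelaysMs` are illustrative names, and the millisecond fields are an assumption made to keep the example self-contained.

```typescript
// Illustrative only: mirrors the four retry settings using millisecond numbers.
interface RetryPolicySketch {
  initialIntervalMs: number;
  backoffCoefficient: number;
  maximumIntervalMs: number;
  maximumAttempts: number; // total attempts, including the first
}

// Returns the wait before each retry (before attempt 2, 3, ...).
function retryDelaysMs(policy: RetryPolicySketch): number[] {
  const delays: number[] = [];
  for (let retry = 0; retry < policy.maximumAttempts - 1; retry++) {
    const uncapped = policy.initialIntervalMs * Math.pow(policy.backoffCoefficient, retry);
    delays.push(Math.min(uncapped, policy.maximumIntervalMs));
  }
  return delays;
}

const delays = retryDelaysMs({
  initialIntervalMs: 1000,
  backoffCoefficient: 2,
  maximumIntervalMs: 60000,
  maximumAttempts: 8,
});
console.log(delays); // [1000, 2000, 4000, 8000, 16000, 32000, 60000]
```

With eight attempts there are seven waits, and the last one is capped at one minute instead of growing to 64 seconds.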
// activities.ts - Real production activity
import axios from 'axios';
import { Context } from '@temporalio/activity';
import { ApplicationFailure } from '@temporalio/common';

interface PaymentGatewayResponse {
  transactionId: string;
  status: 'success' | 'failed';
}

export async function chargeCard(paymentId: string, amount: number): Promise<PaymentGatewayResponse> {
  // Build a key that is identical on every retry of this activity.
  // A timestamp-based key would change per attempt and defeat deduplication.
  const { workflowExecution, activityId } = Context.current().info;
  const idempotencyKey = `charge-${workflowExecution.workflowId}-${activityId}`;

  try {
    const response = await axios.post<PaymentGatewayResponse>(
      'https://payment-gateway.example.com/charge',
      { paymentId, amount, idempotencyKey },
      { timeout: 10000 },
    );
    return response.data;
  } catch (error) {
    // 409 means the gateway already saw this key, so retrying cannot help
    if (axios.isAxiosError(error) && error.response?.status === 409) {
      throw ApplicationFailure.nonRetryable('Duplicate charge attempt');
    }
    // Everything else (ECONNRESET, ETIMEDOUT, 5xx) is rethrown so Temporal
    // retries according to the activity's retry policy
    throw error;
  }
}

export async function fetchUserData(userId: string) {
  const response = await axios.get(`https://api.example.com/users/${userId}`, {
    timeout: 5000,
  });
  return response.data;
}

export async function sendWelcomeEmail(email: string, firstName: string): Promise<void> {
  // Email service call with idempotent semantics
  await axios.post('https://email-service.example.com/send', {
    to: email,
    template: 'welcome',
    context: { firstName },
    deduplicationId: `welcome-${email}`,
  });
}
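Retries only stay safe when every attempt sends the same idempotency key. A minimal sketch with a hypothetical in-memory gateway (`FakeGateway` is illustrative, not a real payment API) shows how a stable key collapses repeated attempts into one charge, while a key that changes per attempt does not:

```typescript
// Hypothetical in-memory gateway that deduplicates on the idempotency key.
class FakeGateway {
  private readonly charges = new Map<string, string>();

  charge(idempotencyKey: string): string {
    const existing = this.charges.get(idempotencyKey);
    if (existing !== undefined) {
      return existing; // same key: return the original transaction
    }
    const transactionId = `tx-${this.charges.size + 1}`;
    this.charges.set(idempotencyKey, transactionId);
    return transactionId;
  }

  get chargeCount(): number {
    return this.charges.size;
  }
}

const gateway = new FakeGateway();

// A stable key derived from the business operation, identical on every retry.
const stableKey = 'charge-pm_123-order-42';
const attempt1 = gateway.charge(stableKey);
const attempt2 = gateway.charge(stableKey); // simulated retry
console.log(attempt1 === attempt2, gateway.chargeCount); // true 1

// Keys that differ per attempt (for example, ones built from a timestamp)
// look like separate operations, so each attempt creates a new charge.
gateway.charge('charge-pm_123-attempt-1');
gateway.charge('charge-pm_123-attempt-2');
console.log(gateway.chargeCount); // 3
```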
Workflow Signals and Queries
Signals deliver external events to a running workflow: user cancelled subscription, admin paused process, payment received. The workflow reacts the next time its code yields, so a signal can change the course of an execution that is waiting on a timer.
Queries let external systems read workflow state without modifying it. Use queries for dashboards and status pages.
// workflow-with-signals.ts
import { proxyActivities, defineSignal, defineQuery, setHandler, condition } from '@temporalio/workflow';
import type * as activities from './activities';

const { sendRefund, logEvent } = proxyActivities<typeof activities>({
  startToCloseTimeout: '10 minutes',
});

export interface SubscriptionWorkflowInput {
  subscriptionId: string;
  monthlyPrice: number;
}

// Signal: pause subscription
export const pauseSubscription = defineSignal<[string]>('pauseSubscription');
// Query: get current status
export const getSubscriptionStatus = defineQuery<{ active: boolean; reason: string }>('getStatus');

export async function subscriptionWorkflow(input: SubscriptionWorkflowInput): Promise<void> {
  // State lives inside the workflow function so each execution has its own copy
  let subscriptionActive = true;
  let cancellationReason = '';
  let billingCycleCount = 0;

  // Register handlers inside the workflow so they close over this execution's state
  setHandler(pauseSubscription, (reason: string) => {
    subscriptionActive = false;
    cancellationReason = reason;
  });
  setHandler(getSubscriptionStatus, () => ({
    active: subscriptionActive,
    reason: cancellationReason,
  }));

  while (subscriptionActive) {
    // Wait up to one month, but wake immediately if the pause signal arrives
    const paused = await condition(() => !subscriptionActive, '30 days');
    if (!paused) {
      billingCycleCount++;
      await logEvent('billing-cycle-started', { cycleNumber: billingCycleCount });
    }
  }

  // Subscription paused: issue the refund (this example refunds the full monthly price)
  await sendRefund(input.subscriptionId, input.monthlyPrice);
  await logEvent('subscription-cancelled', { reason: cancellationReason });
}
Deterministic Workflow Constraints
Workflows replay from history. This means:
- No side effects in workflow code: Only call activities or use Temporal utilities
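- No external time: use sleep() from the SDK for delays rather than wall-clock scheduling. In the TypeScript SDK, the workflow sandbox also replaces Date and setTimeout() with deterministic versions
- No unseeded randomness: the TypeScript SDK sandbox swaps Math.random() for a deterministic, seeded version and exports uuid4(); other SDKs provide their own helper, such as workflow.random() in Python
- No mutable external state: workflows can't depend on changing globals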
Violating these causes non-deterministic replay errors. The fix: move code to an activity.
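// WRONG: I/O inside workflow code
export async function badWorkflow(userId: string) {
  // DON'T DO THIS: the HTTP response is not recorded in history,
  // so a replay cannot reproduce it
  const response = await fetch(`https://api.example.com/users/${userId}`);
}
// Note: Math.random() and new Date() break replay in SDKs without a sandbox.
// The TypeScript SDK swaps in deterministic versions, so there they replay safely.

// CORRECT: Deterministic workflow
import { proxyActivities, sleep } from '@temporalio/workflow';
import type * as activities from './activities';

const { fetchUserData } = proxyActivities<typeof activities>({
  startToCloseTimeout: '1 minute',
});

export async function goodWorkflow(userId: string) {
  const user = await fetchUserData(userId); // I/O runs in an activity; its result is recorded
  await sleep('1 hour'); // Deterministic timer
  // All external calls are activities
}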
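A non-deterministic replay error boils down to a mismatch between what the workflow did originally and what its code does when re-run. The following toy check illustrates that comparison; it is a sketch of the idea, not the SDK's implementation, and `Command`, `assertDeterministic`, and `emitCommands` are hypothetical names.

```typescript
// Toy check: compare the commands a workflow emits during replay against
// the commands recorded in history from the original execution.
type Command = string;

function assertDeterministic(recorded: Command[], replayed: Command[]): void {
  for (let i = 0; i < replayed.length; i++) {
    // Replay may run past the end of history; only recorded steps must match.
    if (i < recorded.length && recorded[i] !== replayed[i]) {
      throw new Error(
        `Non-deterministic replay at step ${i}: expected "${recorded[i]}", got "${replayed[i]}"`,
      );
    }
  }
}

// A workflow whose control flow depends on an injected coin flip.
function emitCommands(coinFlip: () => boolean): Command[] {
  const commands: Command[] = ['scheduleActivity:fetchUserData'];
  commands.push(coinFlip() ? 'startTimer:2h' : 'scheduleActivity:chargeCard');
  return commands;
}

const recorded = emitCommands(() => true); // original execution
let replayError = '';
try {
  // The replay takes the other branch, as it would with unseeded randomness.
  assertDeterministic(recorded, emitCommands(() => false));
} catch (err) {
  replayError = (err as Error).message;
}
console.log(replayError);
```

Moving the coin flip into an activity would record its result once, so every replay would see the same value and take the same branch.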
Child Workflows for Orchestration
For complex workflows with branches or parallel work, spawn child workflows. Each child is independently durable and can be monitored.
// parent-workflow.ts
import { executeChild } from '@temporalio/workflow';
import { sendEmailWorkflow, processPaymentWorkflow } from './child-workflows';

export interface LargeOrderWorkflowInput {
  orderId: string;
  items: Array<{ productId: string; quantity: number }>;
  customerId: string;
}

export async function largeOrderWorkflow(input: LargeOrderWorkflowInput) {
  // Parallel child workflows using Promise.all
  const [paymentResult, emailResult] = await Promise.all([
    executeChild(processPaymentWorkflow, {
      args: [input.orderId, input.customerId],
      workflowId: `payment-${input.orderId}`,
    }),
    executeChild(sendEmailWorkflow, {
      args: [input.customerId, `Order ${input.orderId} confirmed`],
      workflowId: `email-${input.orderId}`,
    }),
  ]);
  return { paymentResult, emailResult };
}
Worker Setup in Node.js
Workers are the execution runtime. They connect to Temporal Server, poll task queues, and execute code.
// worker.ts - Production setup
import { Worker, NativeConnection } from '@temporalio/worker';
import * as activities from './activities';

async function runWorker() {
  const connection = await NativeConnection.connect({
    address: process.env.TEMPORAL_ADDRESS || 'localhost:7233',
  });
  try {
    const worker = await Worker.create({
      connection,
      namespace: process.env.TEMPORAL_NAMESPACE || 'default',
      taskQueue: 'onboarding-queue',
      // Workflows are loaded by path so the SDK can bundle them for its sandbox
      workflowsPath: require.resolve('./workflows'),
      // Activities run in the normal Node.js environment and are passed as an object
      activities,
      maxActivitiesPerSecond: 100,
      maxConcurrentActivityTaskExecutions: 10,
      maxConcurrentWorkflowTaskExecutions: 40,
    });
    console.log('Worker listening on task queue: onboarding-queue');
    await worker.run();
  } finally {
    // Release the connection once the worker has stopped
    await connection.close();
  }
}

runWorker().catch((err) => {
  console.error('Worker failed:', err);
  process.exit(1);
});
Testing Workflows with TestWorkflowEnvironment
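Test workflows without deploying a Temporal cluster: the test environment starts an ephemeral local test server. Its time-skipping mode fast-forwards timers, so a workflow that sleeps for hours still completes in a fast, deterministic test.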
// workflow.test.ts
import { TestWorkflowEnvironment } from '@temporalio/testing';
import { Worker } from '@temporalio/worker';
import { onboardingWorkflow } from './workflow';

describe('onboardingWorkflow', () => {
  let testEnv: TestWorkflowEnvironment;

  beforeAll(async () => {
    // Time-skipping server: the workflow's 2-hour sleep is fast-forwarded
    testEnv = await TestWorkflowEnvironment.createTimeSkipping();
  });

  afterAll(async () => {
    await testEnv?.teardown();
  });

  test('completes onboarding successfully', async () => {
    const { client, nativeConnection } = testEnv;
    const worker = await Worker.create({
      connection: nativeConnection,
      taskQueue: 'test-queue',
      workflowsPath: require.resolve('./workflow'),
      // Mock activities so the test never touches real services
      activities: {
        fetchUserData: async () => ({
          userId: '123',
          firstName: 'John',
          paymentId: 'pm_123',
        }),
        chargeCard: async () => ({ transactionId: 'tx_123', status: 'success' }),
        sendWelcomeEmail: async () => {},
      },
    });

    // runUntil starts the worker, awaits the promise, then shuts the worker down
    const result = await worker.runUntil(
      client.workflow.execute(onboardingWorkflow, {
        args: [{ userId: '123', email: 'john@example.com' }],
        taskQueue: 'test-queue',
        workflowId: 'test-workflow-1',
      }),
    );
    expect(result).toBeUndefined();
  });
});
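The reason a workflow with a multi-hour timer can be tested in seconds is that the test clock is virtual. The sketch below illustrates the idea with a toy clock; `FakeClock` is a hypothetical helper written for this example and says nothing about how Temporal's test server is implemented.

```typescript
// Toy virtual clock: illustrates time skipping, not Temporal's test server.
class FakeClock {
  private nowMs = 0;
  private timers: Array<{ dueMs: number; callback: () => void }> = [];

  // Register a callback to fire after delayMs of virtual time.
  schedule(delayMs: number, callback: () => void): void {
    this.timers.push({ dueMs: this.nowMs + delayMs, callback });
  }

  // Jump virtual time forward, firing due timers in chronological order.
  advance(ms: number): void {
    const target = this.nowMs + ms;
    for (;;) {
      // Re-sort each round so timers scheduled by callbacks are honoured.
      this.timers.sort((a, b) => a.dueMs - b.dueMs);
      const next = this.timers[0];
      if (next === undefined || next.dueMs > target) break;
      this.timers.shift();
      this.nowMs = next.dueMs;
      next.callback();
    }
    this.nowMs = target;
  }

  get now(): number {
    return this.nowMs;
  }
}

const clock = new FakeClock();
const steps: string[] = ['fetchUserData'];
const TWO_HOURS_MS = 2 * 60 * 60 * 1000;

// Stand-in for the workflow's durable timer followed by its remaining steps.
clock.schedule(TWO_HOURS_MS, () => steps.push('chargeCard', 'sendWelcomeEmail'));
clock.advance(TWO_HOURS_MS); // returns immediately; no real waiting
console.log(steps, clock.now);
```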
Temporal vs BullMQ vs SQS Decision Matrix
| Feature | Temporal | BullMQ | SQS |
|---|---|---|---|
| Workflow orchestration | Yes | Basic chains | No |
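| Durability | Event history in Temporal's persistence store | Redis | AWS managed |
| Scalability | Millions of workflows | Depends on Redis | Nearly unlimited throughput (standard queues) |
| Long-running jobs | Yes (months) | Hours | Hours (visibility timeout up to 12 hours) |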
| Deterministic replay | Yes | No | No |
| Price | Self-hosted or Temporal Cloud | Redis cost | Per request |
| Operational complexity | Temporal cluster | Redis cluster | AWS managed |
Use Temporal for complex multi-step workflows spanning days/weeks. Use BullMQ for job queues with moderate complexity. Use SQS for simple message passing.
Checklist
- Configure activity retry policies and timeouts
- Use activities for all side effects
- Keep workflow code deterministic: no direct I/O, and no Math.random() or Date.now() in SDKs that lack a deterministic sandbox
- Implement query handlers for monitoring
- Handle signals for pause/cancel operations
- Set up worker autoscaling based on task queue depth
- Use child workflows for parallel orchestration
- Write tests using TestWorkflowEnvironment
- Monitor workflow execution with Temporal UI
- Plan for Temporal Server HA and persistence
Conclusion
Temporal.io trades infrastructure complexity for application simplicity. Your code becomes a simple state machine; Temporal handles retries, durability, and recovery. Start with a single worker, test with TestWorkflowEnvironment, and scale by adding workers to task queues. For mission-critical workflows—subscriptions, onboarding, payments—Temporal eliminates entire classes of bugs that plague queue-based systems.