Read Replicas in Production — Offloading Reads, Handling Lag, and Routing Queries

Introduction

Read replicas are your first scaling tool: they offload reads from the primary without sharding. Yet many teams deploy replicas and see zero improvement because they mishandle replication lag or hit write bottlenecks. This post covers production replica patterns, lag detection, and knowing when replicas aren't the answer.

Replication Lag Detection and Monitoring

Replicas lag behind the primary. Detect this lag to avoid serving stale data:

-- Query on REPLICA to measure lag
SELECT
  now() - pg_last_xact_replay_timestamp() AS replay_lag,
  pg_wal_lsn_diff(pg_last_wal_receive_lsn(),
                  pg_last_wal_replay_lsn()) AS replay_backlog_bytes;

-- Expected output:
-- replay_lag:           00:00:00.456 (456ms behind the primary)
-- replay_backlog_bytes: 16384 (WAL received but not yet replayed)
-- Caveat: replay_lag keeps growing while the primary is idle; check the byte backlog too

-- On PRIMARY: monitor replica progress
SELECT
  application_name,
  client_addr,
  state,
  write_lag,
  flush_lag,
  replay_lag,
  sync_state
FROM pg_stat_replication;

-- Typical output:
-- write_lag:   00:00:00.012 (time until the standby has written the WAL)
-- flush_lag:   00:00:00.045 (time until the standby has flushed the WAL to disk)
-- replay_lag:  00:00:00.234 (time until the standby has replayed the WAL and made it visible to queries)

-- Set up continuous monitoring
CREATE TABLE replica_lag_history (
  measured_at TIMESTAMPTZ DEFAULT NOW(),
  replica_name TEXT,
  lag_ms BIGINT
);

-- Background job (e.g., cron every 10 seconds)
INSERT INTO replica_lag_history (replica_name, lag_ms)
SELECT
  application_name,
  EXTRACT(EPOCH FROM replay_lag) * 1000
FROM pg_stat_replication;

Monitor lag continuously. Alert when any replica exceeds your SLA (typically 1-5 seconds).
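The history table above feeds alerting. A minimal sketch of the threshold check, in the application layer; the names `LagSample` and `lagAlerts` are illustrative, not from any library:

```typescript
// A lag sample mirrors one row of replica_lag_history
interface LagSample {
  replicaName: string;
  lagMs: number;
}

// Return one alert string per replica whose lag exceeds the SLA
function lagAlerts(samples: LagSample[], slaMs: number): string[] {
  return samples
    .filter(s => s.lagMs > slaMs)
    .map(s => `ALERT: ${s.replicaName} lag ${s.lagMs}ms exceeds SLA ${slaMs}ms`);
}

// Example: with a 5-second SLA, only replica-2 trips an alert
const alerts = lagAlerts(
  [
    { replicaName: 'replica-1', lagMs: 120 },
    { replicaName: 'replica-2', lagMs: 7200 },
  ],
  5000
);
console.log(alerts);
```

Wire this to whatever collects the samples (cron job, metrics scraper) and page only after the threshold holds for a few consecutive samples, so transient spikes don't wake anyone.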

Read-After-Write Consistency Problem

Users expect to see their own writes immediately. Replicas break this guarantee:

// PROBLEMATIC: User writes to primary, reads from replica
async function updateUserProfile(userId: string, newName: string) {
  // 1. Write to primary
  await primaryDb.query(
    'UPDATE users SET name = $1 WHERE id = $2',
    [newName, userId]
  );

  // 2. Immediately read from replica (might not have replicated yet)
  const user = await replicaDb.query(
    'SELECT * FROM users WHERE id = $1',
    [userId]
  );

  // user.name might still be old value if replica lagging
  return user.rows[0];
}

// SOLUTION 1: Pin to primary after write
async function updateUserProfile(userId: string, newName: string) {
  await primaryDb.query(
    'UPDATE users SET name = $1 WHERE id = $2',
    [newName, userId]
  );

  // Read from PRIMARY, not replica
  const user = await primaryDb.query(
    'SELECT * FROM users WHERE id = $1',
    [userId]
  );

  return user.rows[0];
}

// SOLUTION 2: Session-based pinning (application-level; not a built-in Postgres feature)
async function updateUserProfile(userId: string, newName: string, session: Session) {
  await primaryDb.query(
    'UPDATE users SET name = $1 WHERE id = $2',
    [newName, userId]
  );

  // Mark session to use primary for next 30 seconds
  session.pinnedToPrimaryUntil = Date.now() + 30 * 1000;

  // Subsequent queries from this session use primary
  const db = isSessionPinned(session) ? primaryDb : replicaDb;
  const user = await db.query(
    'SELECT * FROM users WHERE id = $1',
    [userId]
  );

  return user.rows[0];
}

// SOLUTION 3: Write-through cache
// Always read from cache after write, cache is always fresh
async function updateUserProfile(userId: string, newName: string) {
  // Write to primary
  const result = await primaryDb.query(
    'UPDATE users SET name = $1 WHERE id = $2 RETURNING *',
    [newName, userId]
  );

  // Update cache immediately
  const user = result.rows[0];
  await cache.set(`user:${userId}`, user, 3600);

  return user;
}

Choose pinning or write-through cache. Don't read from replicas immediately after writes.
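SOLUTION 2 assumes pinning helpers without defining them. A minimal sketch, with the `Session` shape and the 30-second window as illustrative choices:

```typescript
// Session carries an epoch-ms deadline; before it, reads go to the primary
interface Session {
  pinnedToPrimaryUntil?: number;
}

const PIN_WINDOW_MS = 30 * 1000;

// Call after every write on this session
function pinToPrimary(session: Session, now: number = Date.now()): void {
  session.pinnedToPrimaryUntil = now + PIN_WINDOW_MS;
}

// Call before routing a read; true means "use the primary"
function isSessionPinned(session: Session, now: number = Date.now()): boolean {
  return (session.pinnedToPrimaryUntil ?? 0) > now;
}

// Example
const session: Session = {};
pinToPrimary(session, 1_000_000);
console.log(isSessionPinned(session, 1_000_000 + 10_000)); // true: within the 30s window
console.log(isSessionPinned(session, 1_000_000 + 60_000)); // false: window expired
```

Pick the window to comfortably exceed your worst observed replication lag; too short and users see stale reads, too long and you forfeit replica offload after every write.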

Replica Routing Middleware

Route reads to replicas, writes to primary, transparently:

import { Pool } from 'pg';

class ReplicaRouter {
  primaryPool: Pool;
  replicaPools: Pool[];
  currentReplicaIndex: number = 0;
  sessionPins: Map<string, number> = new Map();

  constructor(primaryUrl: string, replicaUrls: string[]) {
    this.primaryPool = new Pool({ connectionString: primaryUrl });
    this.replicaPools = replicaUrls.map(
      url => new Pool({ connectionString: url })
    );
  }

  private selectReplica(): Pool {
    // Round-robin across replicas
    const pool = this.replicaPools[this.currentReplicaIndex];
    this.currentReplicaIndex = (this.currentReplicaIndex + 1) % this.replicaPools.length;
    return pool;
  }

  async query(
    sql: string,
    params: any[] = [],
    options: { sessionId?: string; forceWrite?: boolean } = {}
  ) {
    // Naive write detection (heuristic: misses data-modifying CTEs and SELECT ... FOR UPDATE)
    const isWrite = /^\s*(INSERT|UPDATE|DELETE|CREATE|ALTER|DROP)/i.test(sql);
    const isSessionPinned = !!options.sessionId &&
      (this.sessionPins.get(options.sessionId) ?? 0) > Date.now();

    if (isWrite || options.forceWrite || isSessionPinned) {
      // Route writes and pinned reads to primary
      const client = await this.primaryPool.connect();
      try {
        // Pin session to primary for 30 seconds after write
        if (isWrite && options.sessionId) {
          this.sessionPins.set(options.sessionId, Date.now() + 30 * 1000);
        }
        return await client.query(sql, params);
      } finally {
        client.release();
      }
    } else {
      // Route read-only queries to a replica
      const client = await this.selectReplica().connect();
      try {
        return await client.query(sql, params);
      } catch (err) {
        // Fall back to the primary if the replica is down
        console.warn('Replica failed, falling back to primary', err);
        const primaryClient = await this.primaryPool.connect();
        try {
          return await primaryClient.query(sql, params);
        } finally {
          primaryClient.release();
        }
      } finally {
        // Release the replica client even when its query failed
        client.release();
      }
    }
  }
}

// Usage
const router = new ReplicaRouter(
  'postgresql://primary.db',
  [
    'postgresql://replica-1.db',
    'postgresql://replica-2.db',
    'postgresql://replica-3.db'
  ]
);

// Transparent routing
const sessionId = 'user-session-123';

// Write routes to primary and pins the session
await router.query(
  'UPDATE users SET name = $1 WHERE id = $2',
  ['Alice', 1],
  { sessionId, forceWrite: true }
);

// Read from replica (or primary if session pinned)
const result = await router.query(
  'SELECT * FROM users WHERE id = $1',
  [1],
  { sessionId }
);

Middleware makes replica routing transparent to application code.
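One caveat: the router's regex misses statements that must run on the primary, such as data-modifying CTEs (`WITH ... DELETE`) and locking reads (`SELECT ... FOR UPDATE`). A slightly more defensive classifier — still a heuristic sketch, and `isWriteQuery` is an illustrative name, not a library function:

```typescript
// Classify a SQL statement as "must run on the primary"
function isWriteQuery(sql: string): boolean {
  const s = sql.trim().toUpperCase();
  // Plain DML and DDL
  if (/^(INSERT|UPDATE|DELETE|CREATE|ALTER|DROP|TRUNCATE|MERGE)\b/.test(s)) return true;
  // Data-modifying CTEs: WITH ... INSERT/UPDATE/DELETE
  if (s.startsWith('WITH') && /\b(INSERT|UPDATE|DELETE)\b/.test(s)) return true;
  // Locking reads take row locks and must hit the primary
  if (/\bFOR\s+(NO\s+KEY\s+)?UPDATE\b|\bFOR\s+(KEY\s+)?SHARE\b/.test(s)) return true;
  return false;
}

console.log(isWriteQuery('SELECT * FROM users'));                                      // false
console.log(isWriteQuery('WITH d AS (DELETE FROM jobs RETURNING *) SELECT * FROM d')); // true
console.log(isWriteQuery('SELECT * FROM jobs WHERE id = 1 FOR UPDATE'));               // true
```

String-matching SQL is inherently approximate (it can't see inside functions with side effects); if routing correctness matters more than convenience, make callers pass an explicit read/write flag instead.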

Replica Monitoring and Health Checks

Monitor replica health continuously:

#!/bin/bash

# Monitor replication lag across all replicas
check_replica_lag() {
  local replicas=("replica-1.db" "replica-2.db" "replica-3.db")
  local lag_threshold_ms=5000

  for replica in "${replicas[@]}"; do
    # -At: unaligned, tuples-only output (no padding to confuse bc)
    lag_ms=$(psql -h "$replica" -At -c "
      SELECT COALESCE(EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp())) * 1000, 0) AS lag_ms;
    ")

    if (( $(echo "$lag_ms > $lag_threshold_ms" | bc -l) )); then
      echo "ALERT: $replica replication lag ${lag_ms}ms exceeds threshold"
      # Notify ops team
    fi
  done
}

# Monitor replica connection count
check_replica_connections() {
  local replicas=("replica-1.db" "replica-2.db" "replica-3.db")
  local max_connections=500

  for replica in "${replicas[@]}"; do
    connections=$(psql -h "$replica" -At -c "SELECT count(*) FROM pg_stat_activity;")

    if (( connections > max_connections )); then
      echo "ALERT: $replica connection count $connections exceeds $max_connections"
    fi
  done
}

# Readiness check for load balancer
readiness_probe() {
  psql -h localhost -c "SELECT 1" > /dev/null 2>&1
  exit $?
}

# Liveness check (replica can accept connections)
liveness_probe() {
  timeout 5 psql -h localhost -c "SELECT 1" > /dev/null 2>&1
  exit $?
}

# Add to Kubernetes
# livenessProbe:
#   exec:
#     command:
#       - /bin/sh
#       - -c
#       - ./liveness_probe.sh
#   initialDelaySeconds: 30
#   periodSeconds: 10

When Replicas Don't Help (Write-Heavy Workloads)

Read replicas scale reads, not writes. Recognize when you need something else:

-- Analyze read vs write activity per table
SELECT
  schemaname, relname,
  seq_scan + idx_scan AS total_scans,
  n_tup_ins + n_tup_upd + n_tup_del AS total_writes,
  ROUND(100.0 * (n_tup_ins + n_tup_upd + n_tup_del) /
    NULLIF(seq_scan + idx_scan, 0), 1) AS writes_per_100_scans
FROM pg_stat_user_tables
WHERE (seq_scan + idx_scan) > 0
ORDER BY (n_tup_ins + n_tup_upd + n_tup_del) DESC;

-- Caveat: scans and tuple counts aren't the same unit; treat this as a rough heuristic
-- If writes_per_100_scans > 50, replicas won't help much
-- Example output:
-- orders: 1M scans, 500K writes → 50 writes per 100 scans (a third of all operations are writes)
-- → Replicas absorb the reads, but the primary is still saturated by writes

-- Solutions for write-heavy workloads:
-- 1. Write de-duplication: batch writes, reduce frequency
-- 2. Partitioning: split data across multiple tables by shard key
-- 3. Citus: distributed database for horizontal write scaling
-- 4. Message queue: defer non-critical writes (analytics, logging)

-- Example: batch analytics writes to avoid saturating the primary
WITH batched AS (
  SELECT
    user_id,
    event_type,
    COUNT(*) AS event_count,
    MAX(created_at) AS latest_time
  FROM events_temp
  WHERE created_at > NOW() - INTERVAL '1 minute'
  GROUP BY user_id, event_type
)
INSERT INTO events_summary (user_id, event_type, count, updated_at)
SELECT user_id, event_type, event_count, latest_time FROM batched
ON CONFLICT (user_id, event_type)
DO UPDATE SET
  count = events_summary.count + EXCLUDED.count,
  updated_at = NOW();
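The checklist rule of thumb — replicas pay off when reads are roughly 70%+ of operations — can be captured in a quick helper. A sketch; `readFraction` and `replicasWillHelp` are illustrative names, and the 0.7 cutoff is the heuristic from this post, not a universal constant:

```typescript
// Fraction of total operations that are reads
function readFraction(reads: number, writes: number): number {
  return reads / (reads + writes);
}

// Heuristic: replicas meaningfully offload the primary when reads dominate
function replicasWillHelp(reads: number, writes: number): boolean {
  return readFraction(reads, writes) >= 0.7;
}

// The "orders" example above: 1M reads, 500K writes → reads are only ~67% of operations
console.log(replicasWillHelp(1_000_000, 500_000)); // false: too write-heavy
console.log(replicasWillHelp(9_000_000, 1_000_000)); // true: 90% reads
```

Feed it the per-table counters from `pg_stat_user_tables` to decide, table by table, whether replicas or a write-scaling strategy is the right lever.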

ANALYZE and Replica Statistics

Hot standbys are read-only, so ANALYZE cannot run on a replica — it fails with "cannot execute ANALYZE during recovery". Run it on the primary; the statistics it writes to pg_statistic ship to every replica through WAL:

-- Run ANALYZE on the PRIMARY; updated statistics replicate to all replicas
ANALYZE;

-- ANALYZE specific tables
ANALYZE users;
ANALYZE orders;

-- Verify statistics are current (on the primary)
SELECT
  schemaname, relname,
  last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY last_analyze DESC NULLS LAST;

-- Note: VACUUM must likewise run on the primary;
-- replicas replay its effects from WAL

Keeping autovacuum and autoanalyze healthy on the primary keeps replica query plans current with no replica-side work.

Read Replicas Checklist

  • Replicas in separate availability zones/regions
  • Replication lag monitored (alert if > 5 seconds)
  • Session pinning implemented after writes
  • Replica routing middleware deployed
  • Fallback to primary if replica fails
  • Read/write ratio analyzed (replicas help if >70% reads)
  • Autovacuum/autoanalyze healthy on the primary (statistics replicate to replicas)
  • Replica connection limits monitored
  • Write-heavy queries identified (sharding alternative)
  • Health checks in load balancer/application

Conclusion

Read replicas are your first scaling lever. They excel when your bottleneck is read throughput, not write throughput. Handle replication lag carefully through session pinning or write-through caching to maintain read-after-write consistency. Monitor lag continuously and implement middleware to route reads/writes transparently. If your workload is write-heavy (>50% writes), replicas won't solve your problem—you'll need partitioning, Citus, or message queues instead.