Read/Write Splitting in Production — Scaling Reads Without Sharding

Introduction

Read/write splitting distributes database load by routing writes to a primary and reads to replicas. This simple pattern can scale read-heavy workloads 5-10x without the complexity of sharding. However, replication lag creates consistency challenges: users see stale data if they read from a lagging replica immediately after writing. This guide covers production read/write splitting: routing strategies, lag detection, consistency patterns, and when NOT to use read replicas.

Primary vs Replica Routing Logic

Reading from replicas distributes load. Writing to primary ensures consistency.

// Node.js: Custom routing logic with pg pool
import pg from 'pg';

// Primary pool (for writes, strongly consistent)
const primaryPool = new pg.Pool({
  host: 'primary.db.example.com',
  port: 5432,
  database: 'myapp',
  max: 10,
  idleTimeoutMillis: 30000,
});

// Replica pools (for reads, potential staleness)
const replicaPools = [
  new pg.Pool({
    host: 'replica-1.db.example.com',
    port: 5432,
    database: 'myapp',
    max: 10,
  }),
  new pg.Pool({
    host: 'replica-2.db.example.com',
    port: 5432,
    database: 'myapp',
    max: 10,
  }),
];

let replicaIndex = 0;

// Load balance: round-robin replica selection
function getReplicaPool() {
  const pool = replicaPools[replicaIndex % replicaPools.length];
  replicaIndex++;
  return pool;
}

// Execute on primary (for writes)
async function executeWrite(query: string, params: any[]) {
  const client = await primaryPool.connect();
  try {
    return await client.query(query, params);
  } finally {
    client.release();
  }
}

// Execute on replica (for reads)
async function executeRead(query: string, params: any[]) {
  const replicaPool = getReplicaPool();
  try {
    const client = await replicaPool.connect();
    try {
      return await client.query(query, params);
    } finally {
      client.release();
    }
  } catch (err) {
    // Connect or query failed: fall back to primary (slower but correct)
    console.warn(`Replica failed: ${err.message}, falling back to primary`);
    return executeWrite(query, params);
  }
}

// High-level API
async function getUser(userId: string) {
  const result = await executeRead(
    'SELECT * FROM users WHERE id = $1',
    [userId]
  );
  return result.rows[0];
}

async function updateUser(userId: string, data: Record<string, any>) {
  return executeWrite(
    'UPDATE users SET name = $1, email = $2 WHERE id = $3',
    [data.name, data.email, userId]
  );
}
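
One rule the helpers above leave implicit: a SELECT issued inside a write transaction must still run on the primary, because a replica can never see the transaction's own uncommitted changes. The routing decision can be factored into a tiny, testable function (the name routeStatement is illustrative):

```typescript
type Target = 'primary' | 'replica';

// Decide where a statement should run. SELECTs inside an open
// write transaction must still hit the primary: replicas never
// see the transaction's own uncommitted changes.
function routeStatement(isWrite: boolean, inTransaction: boolean): Target {
  if (isWrite || inTransaction) return 'primary';
  return 'replica';
}
```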

Replication Lag Monitoring and Lag-Aware Routing

Replication lag (delay between primary commit and replica apply) causes staleness. Monitor and route around it.

-- Check replication lag (run on the replica)
-- Caveat: if the primary is idle, this value keeps growing even
-- though the replica is fully caught up.
SELECT NOW() - pg_last_xact_replay_timestamp() AS replication_lag;

-- Byte-based check (also on the replica): how far received WAL
-- is ahead of what has been replayed
SELECT pg_wal_lsn_diff(
  pg_last_wal_receive_lsn(),
  pg_last_wal_replay_lsn()
) AS replay_bytes_behind;

-- Monitor on primary (WAL sender stats)
SELECT
  pid,
  usename,
  application_name,
  client_addr,
  write_lag,
  flush_lag,
  replay_lag,
  state
FROM pg_stat_replication;

-- Typical lag ranges:
-- < 10ms: Excellent (nearly synchronous)
-- 10-100ms: Good (acceptable for most read scenarios)
-- 100ms-1s: Poor (avoid critical read-after-write)
-- > 1s: Degraded (check replica health)
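
For dashboards or routing code, the bands above can be captured in a small helper. The thresholds come straight from the table; they are starting points to tune per workload, not universal constants:

```typescript
type LagBand = 'excellent' | 'good' | 'poor' | 'degraded';

// Classify a measured replication lag (in milliseconds) into the
// health bands from the table above.
function classifyLag(lagMs: number): LagBand {
  if (lagMs < 10) return 'excellent';
  if (lagMs < 100) return 'good';
  if (lagMs <= 1000) return 'poor';
  return 'degraded';
}
```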

// Node.js: Monitor lag and route intelligently
class ReplicaLagMonitor {
  private replicaLag = new Map<string, number>();  // replica name → lag ms
  private updateInterval = 10000;  // Check every 10 seconds

  // Pools keyed by replica name/host so lag can be tracked per replica
  constructor(private replicas: Map<string, pg.Pool>) {
    for (const name of replicas.keys()) this.replicaLag.set(name, 0);
    this.startMonitoring();
  }

  private async startMonitoring() {
    setInterval(async () => {
      for (const [replica, pool] of this.replicas.entries()) {
        try {
          const result = await pool.query(
            `SELECT EXTRACT(EPOCH FROM (NOW() - pg_last_xact_replay_timestamp())) AS lag_seconds`
          );

          const lagMs = (result.rows[0].lag_seconds || 0) * 1000;
          this.replicaLag.set(replica, lagMs);
        } catch (err) {
          // Replica query failed, mark as unhealthy
          this.replicaLag.set(replica, Infinity);
        }
      }

      // Log lag summary
      const maxLag = Math.max(...this.replicaLag.values());
      console.log(`Replication lag: ${maxLag.toFixed(0)}ms`);
    }, this.updateInterval);
  }

  // Get replica with lowest lag
  getHealthiestReplica() {
    let bestReplica = null;
    let bestLag = Infinity;

    for (const [replica, lag] of this.replicaLag.entries()) {
      if (lag < bestLag && lag < 1000) {  // Avoid replicas with > 1s lag
        bestReplica = replica;
        bestLag = lag;
      }
    }

    return bestReplica || null;  // null = no healthy replica
  }

  getLagForReplica(replica: string): number {
    return this.replicaLag.get(replica) || Infinity;
  }
}

// Lag-aware read routing
async function executeReadWithLagAwareness(
  query: string,
  params: any[],
  maxAcceptableLag = 100  // 100ms
) {
  const replica = lagMonitor.getHealthiestReplica();

  if (!replica || lagMonitor.getLagForReplica(replica) > maxAcceptableLag) {
    // Replica lag too high, read from primary
    console.log(`Replica lag too high, using primary`);
    return executeWrite(query, params);
  }

  return executeRead(query, params);
}

Sticky Sessions for Read-After-Write Consistency

After writing to the primary, an immediate read from a replica may return stale data. Sticky sessions avoid this by pinning each session to the primary for a short window after it writes. (The in-memory session map below assumes a single app instance; multi-instance deployments should keep the affinity state in the cookie itself or in a shared store such as Redis.)

// Express middleware: Sticky sessions for read-after-write
import express from 'express';
import cookieParser from 'cookie-parser';
import { v4 as uuidv4 } from 'uuid';

const app = express();
app.use(express.json());
app.use(cookieParser());  // populates req.cookies, used below

interface UserSession {
  sessionId: string;
  userId: string;
  lastWriteTime: number;
  primaryUntil: number;
}

const sessions = new Map<string, UserSession>();

// Middleware: Track writes and set session affinity
app.use(async (req, res, next) => {
  let sessionId = req.cookies?.sessionId;

  if (!sessionId) {
    sessionId = uuidv4();
    res.cookie('sessionId', sessionId, { maxAge: 3600000 });
  }

  const session = sessions.get(sessionId) || {
    sessionId,
    userId: null,
    lastWriteTime: 0,
    primaryUntil: 0,
  };

  // Attach routing decision to request
  req.useReplica = session.primaryUntil < Date.now();
  req.sessionId = sessionId;

  next();
});

// POST /users/:id/profile (write)
app.post('/users/:id/profile', async (req, res) => {
  const userId = req.params.id;

  // Update primary
  await executeWrite(
    'UPDATE users SET name = $1 WHERE id = $2',
    [req.body.name, userId]
  );

  // Force primary reads briefly after the write (replication lag safety margin)
  const session = sessions.get(req.sessionId);
  sessions.set(req.sessionId, {
    ...session,
    sessionId: req.sessionId,
    userId,
    lastWriteTime: Date.now(),
    primaryUntil: Date.now() + 100,  // 100ms window; should exceed typical lag
  });

  res.json({ success: true });
});

// GET /users/:id (read)
app.get('/users/:id', async (req, res) => {
  const userId = req.params.id;

  // Use primary if within sticky window, otherwise use replica
  // (executeWrite simply means "run on the primary" here)
  const result = req.useReplica
    ? await executeRead('SELECT * FROM users WHERE id = $1', [userId])
    : await executeWrite('SELECT * FROM users WHERE id = $1', [userId]);

  res.json(result.rows[0]);
});

// Session cleanup
setInterval(() => {
  const now = Date.now();
  for (const [sessionId, session] of sessions.entries()) {
    if (session.primaryUntil < now - 600000) {  // Remove old sessions
      sessions.delete(sessionId);
    }
  }
}, 300000);  // Clean up every 5 minutes

Connection Pool Per Endpoint

Separate pools for primary and replicas isolate connection limits.

// Production connection pooling strategy
import pg from 'pg';
// Note: PgBouncer is a standalone proxy configured below, not an npm package

// Option 1: Direct pool per endpoint (simpler for small deployments)
const pools = {
  primary: new pg.Pool({
    host: 'primary.example.com',
    max: 20,
    idleTimeoutMillis: 30000,
    connectionTimeoutMillis: 5000,
  }),
  replica1: new pg.Pool({
    host: 'replica-1.example.com',
    max: 50,  // More connections (lower write load)
    idleTimeoutMillis: 30000,
  }),
  replica2: new pg.Pool({
    host: 'replica-2.example.com',
    max: 50,
    idleTimeoutMillis: 30000,
  }),
};

// Option 2: PgBouncer pooler (better for large deployments)
// config: pgbouncer.ini
/*
[databases]
myapp_primary = host=primary.example.com port=5432
myapp_replica1 = host=replica-1.example.com port=5432
myapp_replica2 = host=replica-2.example.com port=5432

[pgbouncer]
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 25
reserve_pool_size = 5
*/

// Connect to PgBouncer instead of database directly
const primaryPool = new pg.Pool({
  host: 'localhost',
  port: 6432,  // PgBouncer port
  database: 'myapp_primary',
  max: 20,
});

const replicaPools = [
  new pg.Pool({
    host: 'localhost',
    port: 6432,
    database: 'myapp_replica1',
    max: 50,
  }),
];

// Monitor pool stats
async function getPoolStats() {
  const stats = {
    primary: {
      totalCount: pools.primary.totalCount,
      idleCount: pools.primary.idleCount,
      waitingCount: pools.primary.waitingCount,
    },
    replicas: [pools.replica1, pools.replica2].map(pool => ({
      totalCount: pool.totalCount,
      idleCount: pool.idleCount,
      waitingCount: pool.waitingCount,
    })),
  };

  console.log('Pool stats:', stats);

  // Alert if approaching limits
  if (pools.primary.idleCount < 5) {
    console.warn('Primary pool low on idle connections');
  }
}

setInterval(getPoolStats, 60000);
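
The raw counts above are easier to alert on as a single saturation ratio: active connections over the configured max. A small helper (the name and thresholds are illustrative):

```typescript
// Fraction of the pool's configured max currently checked out.
// 1.0 means every allowed connection is busy; alert well before that
// (e.g. at 0.8), since waitingCount grows once saturation hits 1.0.
function poolSaturation(totalCount: number, idleCount: number, max: number): number {
  if (max <= 0) return 0;
  return (totalCount - idleCount) / max;
}
```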

Prisma/Drizzle Read Replica Configuration

ORM configuration for read/write splitting.

// Prisma: Read replica configuration via the official
// @prisma/extension-read-replicas client extension
// (the datasource block in schema.prisma has no replica-URL field)
import { PrismaClient } from '@prisma/client';
import { readReplicas } from '@prisma/extension-read-replicas';

const prisma = new PrismaClient().$extends(
  readReplicas({
    url: [
      process.env.DATABASE_URL_REPLICA_1!,
      process.env.DATABASE_URL_REPLICA_2!,
    ],
  })
);

// Read operations are routed to a replica
const user = await prisma.user.findUnique({
  where: { id: userId },
});

// Force the primary for a strongly consistent read
const freshUser = await prisma.$primary().user.findUnique({
  where: { id: userId },
});

// Drizzle: Manual read replica routing
import { drizzle } from 'drizzle-orm/postgres-js';
import { eq } from 'drizzle-orm';
import postgres from 'postgres';
import { users } from './schema';  // your table definitions

const primaryDb = postgres(process.env.DATABASE_URL_PRIMARY);
const replicaDb = postgres(process.env.DATABASE_URL_REPLICA_1);

const primaryDrizzle = drizzle(primaryDb);
const replicaDrizzle = drizzle(replicaDb);

// Insert (primary)
async function createUser(data: any) {
  return primaryDrizzle
    .insert(users)
    .values(data)
    .returning();
}

// Read (replica)
async function getUser(userId: string) {
  return replicaDrizzle
    .select()
    .from(users)
    .where(eq(users.id, userId));
}

// Custom routing layer above Drizzle
type Db = ReturnType<typeof drizzle>;

class DatabaseRouter {
  constructor(
    private primaryDb: Db,
    private replicaDbs: Db[],
    private lagMonitor: ReplicaLagMonitor
  ) {}

  async read<T>(query: (db: Db) => Promise<T>) {
    // Route to a replica with acceptable lag
    const replica = this.lagMonitor.getHealthiestReplica();
    if (replica && this.lagMonitor.getLagForReplica(replica) < 100) {
      try {
        return await query(this.replicaDbs[0]);  // simplified: first replica
      } catch (err) {
        console.warn('Replica read failed, falling back to primary');
      }
    }

    return query(this.primaryDb);
  }

  async write<T>(query: (db: Db) => Promise<T>) {
    return query(this.primaryDb);
  }
}

Detecting Replication Lag in Application Code

Monitor lag and alert on degradation.

// Application-level lag detection and alerting
import { EventEmitter } from 'events';

class ReplicationLagDetector extends EventEmitter {
  private lagHistory: number[] = [];
  private readonly maxHistorySize = 60;  // Keep 60 samples
  private readonly checkInterval = 10000;  // Check every 10 seconds
  private readonly lagThresholds = {
    warning: 100,   // 100ms
    critical: 1000, // 1 second
  };

  constructor(private replicaPool: pg.Pool) {
    super();
    this.startMonitoring();
  }

  private async startMonitoring() {
    setInterval(async () => {
      try {
        const result = await this.replicaPool.query(
          `SELECT EXTRACT(EPOCH FROM (NOW() - pg_last_xact_replay_timestamp())) * 1000 AS lag_ms`
        );

        const lagMs = parseFloat(result.rows[0].lag_ms) || 0;
        this.lagHistory.push(lagMs);

        if (this.lagHistory.length > this.maxHistorySize) {
          this.lagHistory.shift();
        }

        // Emit events based on lag level
        if (lagMs > this.lagThresholds.critical) {
          this.emit('critical', { lag: lagMs, timestamp: new Date() });
        } else if (lagMs > this.lagThresholds.warning) {
          this.emit('warning', { lag: lagMs, timestamp: new Date() });
        }
      } catch (err) {
        this.emit('error', err);
      }
    }, this.checkInterval);
  }

  getAverageLag(): number {
    if (this.lagHistory.length === 0) return 0;
    return this.lagHistory.reduce((a, b) => a + b, 0) / this.lagHistory.length;
  }

  getMaxLag(): number {
    return Math.max(...this.lagHistory, 0);
  }

  shouldUseReplica(maxAcceptableLag = 100): boolean {
    return this.getMaxLag() < maxAcceptableLag;
  }
}

// Setup monitoring (one detector per replica pool)
const lagDetector = new ReplicationLagDetector(replicaPools[0]);

lagDetector.on('warning', ({ lag }) => {
  console.warn(`Replication lag warning: ${lag.toFixed(0)}ms`);
});

lagDetector.on('critical', ({ lag }) => {
  console.error(`Replication lag CRITICAL: ${lag.toFixed(0)}ms`);
  // Alert ops team (slackAlert is a placeholder for your alerting hook)
  slackAlert(`DB replication lag critical: ${lag.toFixed(0)}ms`);
  // Consider fallback to primary only
});
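
Averages hide exactly the lag spikes that break read-after-write reads, so alerting on a high percentile of the sample window is more honest than getAverageLag(). A nearest-rank percentile helper that could extend the detector (the function name is illustrative):

```typescript
// Nearest-rank percentile over a sample window, e.g. p = 95.
// Returns 0 for an empty window, matching getAverageLag() above.
function lagPercentile(samples: number[], p: number): number {
  if (samples.length === 0) return 0;
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.min(sorted.length, Math.max(1, rank)) - 1];
}
```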

Circuit Breaker for Replica Failures

Automatically stop using unhealthy replicas.

// Circuit breaker pattern for replica routing
enum CircuitState {
  CLOSED = 'CLOSED',    // Normal operation
  OPEN = 'OPEN',        // Failing, stop trying
  HALF_OPEN = 'HALF_OPEN',  // Test if recovered
}

class ReplicaCircuitBreaker {
  private state = CircuitState.CLOSED;
  private failureCount = 0;
  private lastFailureTime = 0;
  private readonly failureThreshold = 5;
  private readonly resetTimeout = 30000;  // 30 seconds

  async execute<T>(
    fn: () => Promise<T>,
    fallback: () => Promise<T>
  ): Promise<T> {
    switch (this.state) {
      case CircuitState.CLOSED:
        try {
          const result = await fn();
          this.failureCount = 0;
          return result;
        } catch (err) {
          this.failureCount++;
          this.lastFailureTime = Date.now();

          if (this.failureCount >= this.failureThreshold) {
            this.state = CircuitState.OPEN;
            console.warn('Circuit breaker OPEN: replica failing');
          }

          return fallback();
        }

      case CircuitState.OPEN:
        // Check if reset timeout elapsed
        if (Date.now() - this.lastFailureTime > this.resetTimeout) {
          this.state = CircuitState.HALF_OPEN;
          console.log('Circuit breaker HALF_OPEN: testing replica recovery');
        }

        return fallback();

      case CircuitState.HALF_OPEN:
        try {
          const result = await fn();
          this.state = CircuitState.CLOSED;
          this.failureCount = 0;
          console.log('Circuit breaker CLOSED: replica recovered');
          return result;
        } catch (err) {
          this.state = CircuitState.OPEN;
          this.lastFailureTime = Date.now();
          return fallback();
        }
    }
  }

  getState(): CircuitState {
    return this.state;
  }
}

// Usage
const replicaBreaker = new ReplicaCircuitBreaker();

async function readUser(userId: string) {
  return replicaBreaker.execute(
    () => executeRead('SELECT * FROM users WHERE id = $1', [userId]),
    () => executeWrite('SELECT * FROM users WHERE id = $1', [userId])  // fallback to primary
  );
}

When NOT to Use Read Replicas

Understand the limitations.

DON'T use read replicas for:
- Strong consistency requirements (financial transactions, inventory)
- Read-after-write patterns without sticky sessions
- Hot-row workloads (frequent writes to the same rows, where reads need the latest value)
- Reads inside write transactions (they must run on the primary to see the transaction's own uncommitted changes)

DO use read replicas for:
- Analytical queries (read-heavy, stale data acceptable)
- Cache-fill reads (a cache layer such as Redis already tolerates staleness)
- Reporting (typically hours stale, fine)
- Public APIs (reads >> writes)

Replication lag growth factors:
- High write volume (primary writes faster than replica applies)
- Complex queries on replica (CPU-bound)
- Network latency to replica
- Under-provisioned replica (insufficient CPU/disk)

Typical breaking points:
- Lag > 1 second: Consider vertical scaling (bigger replica)
- Lag > 5 seconds: Add another replica or optimize the heavy queries running on it
- Lag > 30 seconds: Replication capacity is effectively exhausted; ensure HA failover (e.g. Patroni) is in place and consider sharding

Checklist

  • Primary and replica pools created with separate connection limits
  • Replication lag monitored continuously (< 100ms target)
  • Lag-aware read routing implemented (fallback to primary)
  • Sticky sessions configured for read-after-write consistency
  • Circuit breaker implemented for unhealthy replicas
  • Prisma/Drizzle configured with read replica URLs
  • Replica lag detection integrated with alerting
  • PgBouncer deployed for high-throughput scenarios
  • Connection pool stats monitored
  • Failover tested (replica down scenario)
  • Lag spike handling tested (temporary degradation)
  • Query patterns reviewed for replica suitability

Conclusion

Read/write splitting scales read-heavy workloads without sharding complexity. Route writes to primary and reads to replicas, monitor lag continuously, and use sticky sessions to handle read-after-write consistency. Implement circuit breakers to automatically degrade to primary when replicas fail, and choose appropriate timeouts based on your application's consistency requirements. For most web applications, read replicas with lag monitoring deliver a 5-10x read capacity increase with minimal architectural complexity.