Read Replica Lag — Why Your Users See Stale Data After Saving

Introduction

Adding a read replica is one of the first database scaling steps teams take — offload SELECTs from the primary, double your read throughput. But replication is asynchronous. The replica is always slightly behind the primary — usually milliseconds, sometimes seconds, occasionally minutes if the primary is under heavy write load. Your application needs to account for this lag, or users see confusing stale data.

The Problem
Fix 1: Route Writes and Immediate Reads to Primary
Fix 2: Sticky Sessions — Route User to Same Source After Write
Fix 3: Check Replication Lag Before Reading
Fix 4: Synchronous Replication for Critical Data
Monitoring Replication Lag
Read Replica Lag Checklist
Conclusion

The Problem

T=0.000  User saves profile: UPDATE users SET name = 'Sanjeev' WHERE id = '123'
         Write goes to PRIMARY ✅

T=0.001  User's browser reloads profile
         SELECT name FROM users WHERE id = '123'
         Read goes to REPLICA

T=0.001  Replica replication lag: 1.5 seconds
         Replica still has name = 'Old Name'

T=0.001  Response: { name: 'Old Name' }  ❌ User sees old data
         User thinks their save didn't work
         User saves again
         3x support ticket

Fix 1: Route Writes and Immediate Reads to Primary

The simplest fix: after a write, read from the primary for the current request:

class DatabaseRouter {
  constructor(
    private primary: Pool,
    private replica: Pool
  ) {}

  // Reads that need fresh data — use primary
  async readAfterWrite<T>(query: string, params?: any[]): Promise<T> {
    return this.primary.query(query, params)
  }

  // Reads that can tolerate stale data — use replica
  async read<T>(query: string, params?: any[]): Promise<T> {
    return this.replica.query(query, params)
  }

  // All writes go to primary
  async write<T>(query: string, params?: any[]): Promise<T> {
    return this.primary.query(query, params)
  }
}

// In your service
async function updateProfile(userId: string, data: UpdateProfileDto) {
  // Write to primary
  await db.write('UPDATE users SET name = $1 WHERE id = $2', [data.name, userId])

  // Read back from primary (read-after-write consistency)
  const updated = await db.readAfterWrite(
    'SELECT * FROM users WHERE id = $1',
    [userId]
  )

  return updated
}

// For listing endpoints where stale data is acceptable:
async function getRecentPosts() {
  return db.read('SELECT * FROM posts ORDER BY created_at DESC LIMIT 20')
}

Fix 2: Sticky Sessions — Route User to Same Source After Write

Track "this user just wrote something" and route them to the primary for a short window:

import { Request, Response, NextFunction } from 'express'

const WRITE_STICKY_DURATION_MS = 5000  // 5 seconds

// Middleware: after a write, mark the user as "recently wrote"
function markRecentWrite(userId: string) {
  // Store in Redis with 5-second TTL
  return redis.set(`recent_write:${userId}`, '1', 'PX', WRITE_STICKY_DURATION_MS)
}

async function shouldUsePrimary(userId: string): Promise<boolean> {
  const recentWrite = await redis.get(`recent_write:${userId}`)
  return recentWrite !== null
}

// In your profile update handler
app.put('/users/profile', async (req, res) => {
  const userId = req.user.id

  await db.primary.query('UPDATE users SET name = $1 WHERE id = $2',
    [req.body.name, userId])

  await markRecentWrite(userId)  // Mark: this user needs primary reads for 5 seconds

  res.json({ success: true })
})

// In your profile read handler
app.get('/users/profile', async (req, res) => {
  const userId = req.user.id

  const pool = (await shouldUsePrimary(userId)) ? db.primary : db.replica

  const user = await pool.query('SELECT * FROM users WHERE id = $1', [userId])
  res.json(user.rows[0])
})

Fix 3: Check Replication Lag Before Reading

async function getReplicaLagSeconds(): Promise<number> {
  // Query the replica itself for its lag
  const result = await replica.query(`
    SELECT EXTRACT(EPOCH FROM (NOW() - pg_last_xact_replay_timestamp())) AS lag_seconds
  `)
  return parseFloat(result.rows[0]?.lag_seconds ?? '0')
}

async function queryWithFallback<T>(
  query: string,
  params: any[],
  maxLagSeconds = 2
): Promise<T> {
  const lag = await getReplicaLagSeconds()

  if (lag > maxLagSeconds) {
    console.warn(`Replica lag ${lag}s > ${maxLagSeconds}s — routing to primary`)
    return primary.query(query, params)
  }

  return replica.query(query, params)
}

Fix 4: Synchronous Replication for Critical Data

For data where even milliseconds of lag are unacceptable, use synchronous replication:

-- PostgreSQL: require replication acknowledgment before commit returns
-- postgresql.conf on primary:
synchronous_standby_names = 'replica-1'
synchronous_commit = on

-- This makes EVERY write wait for the replica to acknowledge
-- Slower (adds replica round-trip to every write)
-- But: replica is always up to date when the write succeeds

Or, more targeted — only synchronize critical transactions:

async function criticalWrite(query: string, params: any[]) {
  const client = await primary.connect()
  try {
    // Override session to require sync replication for this transaction
    await client.query("SET synchronous_commit = 'remote_apply'")
    const result = await client.query(query, params)
    return result
  } finally {
    client.release()
  }
}

Monitoring Replication Lag

-- On the primary — check all replica lag
SELECT
  application_name,
  state,
  sync_state,
  write_lag,
  flush_lag,
  replay_lag
FROM pg_stat_replication;

-- On the replica — check its own lag
SELECT
  NOW() - pg_last_xact_replay_timestamp() AS replication_lag,
  pg_is_in_recovery() AS is_replica

// Alert if replica lag exceeds threshold
cron.schedule('*/1 * * * *', async () => {
  const lag = await getReplicaLagSeconds()

  if (lag > 10) {
    await alerting.critical(`Replica lag: ${lag}s — reads are significantly stale`)
  } else if (lag > 2) {
    await alerting.warn(`Replica lag: ${lag}s`)
  }
})

Read Replica Lag Checklist

✅ Always read-after-write from primary for data just written
✅ Use sticky session (Redis TTL) to route recent writers to primary
✅ Monitor pg_last_xact_replay_timestamp() on replicas
✅ Alert if lag exceeds 2 seconds
✅ Use synchronous replication for financial transactions
✅ Never route writes to replicas (they're read-only)
✅ Design UIs to handle eventual consistency (optimistic updates)

Conclusion

Read replica lag is a fundamental property of async replication — you can manage it but not eliminate it. The most practical fix is routing reads back to the primary for a short window after any write (sticky sessions with a 5-second Redis TTL). For dashboards, listing pages, and search where slight staleness is acceptable, replicas are great. For read-after-write scenarios — profile pages, balance displays, anything the user just modified — use the primary. Monitor lag constantly and have an automatic fallback to primary when lag exceeds your threshold.