Published on

Read Replica Lag — Why Your Users See Stale Data After Saving

Authors

Introduction

Adding a read replica is one of the first database scaling steps teams take — offload SELECTs from the primary, double your read throughput. But replication is asynchronous. The replica is always slightly behind the primary — usually milliseconds, sometimes seconds, occasionally minutes if the primary is under heavy write load. Your application needs to account for this lag, or users see confusing stale data.

The Problem

T=0.000  User saves profile: UPDATE users SET name = 'Sanjeev' WHERE id = '123'
         Write goes to PRIMARY
T=0.001  User's browser reloads profile
         SELECT name FROM users WHERE id = '123'
         Read goes to REPLICA

T=0.001  Replica replication lag: 1.5 seconds
         Replica still has name = 'Old Name'

T=0.001  Response: { name: 'Old Name' }User sees old data
         User thinks their save didn't work
         User saves again
         3x support ticket

Fix 1: Route Writes and Immediate Reads to Primary

The simplest fix: after a write, read from the primary for the current request:

class DatabaseRouter {
  constructor(
    private primary: Pool,
    private replica: Pool
  ) {}

  // Reads that need fresh data — use primary
  async readAfterWrite<T>(query: string, params?: any[]): Promise<T> {
    return this.primary.query(query, params)
  }

  // Reads that can tolerate stale data — use replica
  async read<T>(query: string, params?: any[]): Promise<T> {
    return this.replica.query(query, params)
  }

  // All writes go to primary
  async write<T>(query: string, params?: any[]): Promise<T> {
    return this.primary.query(query, params)
  }
}

// In your service
async function updateProfile(userId: string, data: UpdateProfileDto) {
  // Write to primary
  await db.write('UPDATE users SET name = $1 WHERE id = $2', [data.name, userId])

  // Read back from primary (read-after-write consistency)
  const updated = await db.readAfterWrite(
    'SELECT * FROM users WHERE id = $1',
    [userId]
  )

  return updated
}

// For listing endpoints where stale data is acceptable:
async function getRecentPosts() {
  return db.read('SELECT * FROM posts ORDER BY created_at DESC LIMIT 20')
}

Fix 2: Sticky Sessions — Route User to Same Source After Write

Track "this user just wrote something" and route them to the primary for a short window:

import { Request, Response, NextFunction } from 'express'

const WRITE_STICKY_DURATION_MS = 5000  // 5 seconds

// Middleware: after a write, mark the user as "recently wrote"
function markRecentWrite(userId: string) {
  // Store in Redis with 5-second TTL
  return redis.set(`recent_write:${userId}`, '1', 'PX', WRITE_STICKY_DURATION_MS)
}

async function shouldUsePrimary(userId: string): Promise<boolean> {
  const recentWrite = await redis.get(`recent_write:${userId}`)
  return recentWrite !== null
}

// In your profile update handler
app.put('/users/profile', async (req, res) => {
  const userId = req.user.id

  await db.primary.query('UPDATE users SET name = $1 WHERE id = $2',
    [req.body.name, userId])

  await markRecentWrite(userId)  // Mark: this user needs primary reads for 5 seconds

  res.json({ success: true })
})

// In your profile read handler
app.get('/users/profile', async (req, res) => {
  const userId = req.user.id

  const pool = (await shouldUsePrimary(userId)) ? db.primary : db.replica

  const user = await pool.query('SELECT * FROM users WHERE id = $1', [userId])
  res.json(user.rows[0])
})

Fix 3: Check Replication Lag Before Reading

async function getReplicaLagSeconds(): Promise<number> {
  // Query the replica itself for its lag
  const result = await replica.query(`
    SELECT EXTRACT(EPOCH FROM (NOW() - pg_last_xact_replay_timestamp())) AS lag_seconds
  `)
  return parseFloat(result.rows[0]?.lag_seconds ?? '0')
}

async function queryWithFallback<T>(
  query: string,
  params: any[],
  maxLagSeconds = 2
): Promise<T> {
  const lag = await getReplicaLagSeconds()

  if (lag > maxLagSeconds) {
    console.warn(`Replica lag ${lag}s > ${maxLagSeconds}s — routing to primary`)
    return primary.query(query, params)
  }

  return replica.query(query, params)
}

Fix 4: Synchronous Replication for Critical Data

For data where even milliseconds of lag are unacceptable, use synchronous replication:

-- PostgreSQL: require replication acknowledgment before commit returns
-- postgresql.conf on primary:
synchronous_standby_names = 'replica-1'
synchronous_commit = on

-- This makes EVERY write wait for the replica to acknowledge
-- Slower (adds replica round-trip to every write)
-- But: replica is always up to date when the write succeeds

Or, more targeted — only synchronize critical transactions:

async function criticalWrite(query: string, params: any[]) {
  const client = await primary.connect()
  try {
    // Override session to require sync replication for this transaction
    await client.query("SET synchronous_commit = 'remote_apply'")
    const result = await client.query(query, params)
    return result
  } finally {
    client.release()
  }
}

Monitoring Replication Lag

-- On the primary — check all replica lag
SELECT
  application_name,
  state,
  sync_state,
  write_lag,
  flush_lag,
  replay_lag
FROM pg_stat_replication;

-- On the replica — check its own lag
SELECT
  NOW() - pg_last_xact_replay_timestamp() AS replication_lag,
  pg_is_in_recovery() AS is_replica
// Alert if replica lag exceeds threshold
cron.schedule('*/1 * * * *', async () => {
  const lag = await getReplicaLagSeconds()

  if (lag > 10) {
    await alerting.critical(`Replica lag: ${lag}s — reads are significantly stale`)
  } else if (lag > 2) {
    await alerting.warn(`Replica lag: ${lag}s`)
  }
})

Read Replica Lag Checklist

  • ✅ Always read-after-write from primary for data just written
  • ✅ Use sticky session (Redis TTL) to route recent writers to primary
  • ✅ Monitor pg_last_xact_replay_timestamp() on replicas
  • ✅ Alert if lag exceeds 2 seconds
  • ✅ Use synchronous replication for financial transactions
  • ✅ Never route writes to replicas (they're read-only)
  • ✅ Design UIs to handle eventual consistency (optimistic updates)

Conclusion

Read replica lag is a fundamental property of async replication — you can manage it but not eliminate it. The most practical fix is routing reads back to the primary for a short window after any write (sticky sessions with a 5-second Redis TTL). For dashboards, listing pages, and search where slight staleness is acceptable, replicas are great. For read-after-write scenarios — profile pages, balance displays, anything the user just modified — use the primary. Monitor lag constantly and have an automatic fallback to primary when lag exceeds your threshold.