Database Backup and Disaster Recovery 2026: Never Lose Data Again

Sanjeev Sharma
5 min read


Database Disaster Recovery 2026: Plan for the Worst

The question is not if your database will have a problem, but when. The difference between a minor incident and a business-ending disaster is preparation.

RTO and RPO: Define Your Targets

RTO (Recovery Time Objective): How long can you be down?
RPO (Recovery Point Objective): How much data can you lose?

Tier 1 (e-commerce, banking):  RTO < 1h,  RPO < 5min
Tier 2 (SaaS apps):           RTO < 4h,  RPO < 1h
Tier 3 (blogs, internal tools): RTO < 24h, RPO < 24h
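
These targets only matter if you measure against them. As a minimal sketch (the function name and message format are illustrative, not from any tool), a shell helper can check the age of the last successful backup against your RPO budget:

```shell
#!/bin/bash
# rpo_ok: report whether the last backup is recent enough to meet the RPO.
# Arguments: <last_backup_epoch> <rpo_seconds> <now_epoch>
# ("now" is passed in explicitly so the check is easy to test)
rpo_ok() {
  local last=$1 rpo=$2 now=$3
  local age=$(( now - last ))
  if [ "$age" -le "$rpo" ]; then
    echo "OK: last backup ${age}s old (RPO ${rpo}s)"
  else
    echo "VIOLATION: last backup ${age}s old, RPO is ${rpo}s"
    return 1
  fi
}

# Example: backup taken at epoch 1000, RPO of 300s, current time 1200
rpo_ok 1000 300 1200  # prints "OK: last backup 200s old (RPO 300s)"
```

In practice the last-backup timestamp would come from your backup metadata (for example, the newest object in the S3 prefix), and a violation would page someone.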

Automated PostgreSQL Backups

#!/bin/bash
# backup-postgres.sh

set -euo pipefail

DB_URL="${DATABASE_URL}"
S3_BUCKET="webcoderspeed-backups"
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
BACKUP_FILE="postgres_${TIMESTAMP}.sql.gz"

echo "Starting PostgreSQL backup..."

# Create compressed backup
pg_dump "$DB_URL" | gzip > "/tmp/$BACKUP_FILE"

echo "Backup created: /tmp/$BACKUP_FILE ($(du -sh "/tmp/$BACKUP_FILE" | cut -f1))"

# Upload to S3
aws s3 cp "/tmp/$BACKUP_FILE" "s3://$S3_BUCKET/postgres/$BACKUP_FILE" \
  --storage-class STANDARD_IA  # Cheaper for backups

# Verify upload
aws s3 ls "s3://$S3_BUCKET/postgres/$BACKUP_FILE"

# Clean up local file
rm "/tmp/$BACKUP_FILE"

# Delete backups older than 30 days (GNU date syntax; prefer an S3
# lifecycle rule for this in production)
cutoff=$(date -d "30 days ago" +%s)
aws s3 ls "s3://$S3_BUCKET/postgres/" | while read -r createDate _ _ fileName; do
  if [[ -n "$fileName" && $(date -d "$createDate" +%s) -lt $cutoff ]]; then
    aws s3 rm "s3://$S3_BUCKET/postgres/$fileName"
  fi
done

echo "Backup completed!"

# GitHub Actions: automated daily backup (save under .github/workflows/)
name: Database Backup

on:
  schedule:
    - cron: '0 0 * * *'  # Daily at midnight UTC
  workflow_dispatch:       # Manual trigger

jobs:
  backup:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Run backup
        env:
          DATABASE_URL: ${{ secrets.DATABASE_URL }}
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        run: bash scripts/backup-postgres.sh

      - name: Notify on failure
        if: failure()
        uses: slackapi/slack-github-action@v1
        with:
          channel-id: '#alerts'
          slack-message: '🚨 Database backup FAILED! Check GitHub Actions.'
        env:
          SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }}

Point-in-Time Recovery (PITR)

# Enable continuous archiving (WAL archiving)
# postgresql.conf
archive_mode = on
archive_command = 'aws s3 cp %p s3://backups/wal/%f'
archive_timeout = 60  # Force archive every 60 seconds

# Recover to specific point in time
# 1. Restore base backup
aws s3 sync s3://backups/base/latest/ /var/lib/postgresql/data/

# 2. Create recovery config (PostgreSQL 12+ reads these settings from
#    postgresql.auto.conf and needs an empty recovery.signal file)
cat > /var/lib/postgresql/data/postgresql.auto.conf <<EOF
restore_command = 'aws s3 cp s3://backups/wal/%f %p'
recovery_target_time = '2026-03-26 14:30:00'
recovery_target_action = 'promote'
EOF
touch /var/lib/postgresql/data/recovery.signal

# 3. Start PostgreSQL; it will replay WAL up to the target time
pg_ctl start
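
Before replaying WAL you also need the right base backup: the newest one taken at or before the recovery target. A small sketch of that selection logic (the function name and the "epoch name" input format are hypothetical, standing in for whatever metadata your backup listing provides):

```shell
#!/bin/bash
# pick_base_backup: from stdin lines of "<epoch> <backup-name>", print the
# newest base backup taken at or before the target epoch (argument 1).
pick_base_backup() {
  local target=$1 best_epoch=-1 best_name=""
  while read -r epoch name; do
    if [ "$epoch" -le "$target" ] && [ "$epoch" -gt "$best_epoch" ]; then
      best_epoch=$epoch
      best_name=$name
    fi
  done
  [ -n "$best_name" ] && echo "$best_name"
}

# Example: three base backups, target time between the second and third
printf '%s\n' "100 base_a" "200 base_b" "300 base_c" | pick_base_backup 250
# prints "base_b"
```

Restoring a base backup taken after the target time cannot work: WAL replay only moves forward.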

Read Replicas and Failover

// Use the primary for writes and a replica for reads
import { Pool, PoolClient } from 'pg'

const primaryPool = new Pool({
  host: process.env.DB_PRIMARY_HOST,  // Read-write
})

const replicaPool = new Pool({
  host: process.env.DB_REPLICA_HOST,  // Read-only
})

class DatabaseService {
  async query(sql: string, params: unknown[]) {
    // Default: replica for reads
    return replicaPool.query(sql, params)
  }

  async write(sql: string, params: unknown[]) {
    // Writes always go to primary
    return primaryPool.query(sql, params)
  }

  async transaction<T>(fn: (client: PoolClient) => Promise<T>): Promise<T> {
    const client = await primaryPool.connect()
    try {
      await client.query('BEGIN')
      const result = await fn(client)
      await client.query('COMMIT')
      return result
    } catch (error) {
      await client.query('ROLLBACK')
      throw error
    } finally {
      client.release()
    }
  }
}

Backup Verification

#!/bin/bash
# verify-backup.sh: test a restore every week

set -euo pipefail

BACKUP_URL="s3://webcoderspeed-backups/postgres/latest.sql.gz"
TEST_DB="backup_test_$(date +%Y%m%d)"

echo "Testing backup restore..."

# Download and restore to temp DB
aws s3 cp "$BACKUP_URL" /tmp/backup.sql.gz
createdb "$TEST_DB"
gunzip -c /tmp/backup.sql.gz | psql "$TEST_DB"

# Verify key tables exist
TABLES=$(psql "$TEST_DB" -t -c "SELECT count(*) FROM information_schema.tables WHERE table_schema='public'")
echo "Tables in backup: $TABLES"

# Verify row counts
USERS=$(psql "$TEST_DB" -t -c "SELECT count(*) FROM users")
POSTS=$(psql "$TEST_DB" -t -c "SELECT count(*) FROM posts")
echo "Users: $USERS, Posts: $POSTS"

# Cleanup
dropdb "$TEST_DB"
rm /tmp/backup.sql.gz

echo "Backup verification complete!"
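
Raw counts are only useful with a threshold attached. One way to harden the drill (the function name and the minimums are illustrative, not from any tool) is to fail it whenever a restored table comes back smaller than expected:

```shell
#!/bin/bash
# assert_min_rows: fail the verification when a restored table has fewer
# rows than expected. Arguments: <table> <actual_count> <minimum>
assert_min_rows() {
  local table=$1 actual=$2 minimum=$3
  if [ "$actual" -lt "$minimum" ]; then
    echo "FAIL: $table has $actual rows, expected at least $minimum" >&2
    return 1
  fi
  echo "PASS: $table ($actual rows)"
}

# Example: the counts would come from the psql queries above
assert_min_rows users 1523 1000
# prints "PASS: users (1523 rows)"
```

A backup that restores cleanly but contains an empty users table should fail the drill, not pass it.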

Disaster Recovery Runbook

## Database Disaster Recovery Runbook

### Incident: Database server unreachable

**Step 1: Assess (5 minutes)**
- [ ] Check database server status
- [ ] Check connection pool logs
- [ ] Determine if replica is available

**Step 2: Failover to replica (10 minutes)**
- [ ] Promote replica to primary:
      `aws rds promote-read-replica --db-instance-identifier mydb-replica`
- [ ] Update DNS CNAME: db.webcoderspeed.com → replica endpoint
- [ ] Verify application connects successfully
- [ ] Notify team via Slack

**Step 3: Restore from backup (if no replica)**
- [ ] Identify latest backup in S3
- [ ] Launch new RDS instance
- [ ] Restore backup
- [ ] Update connection strings
- [ ] Verify data integrity

**Step 4: Post-incident**
- [ ] Document timeline and root cause
- [ ] Create replica for new primary
- [ ] Review monitoring alerts
- [ ] Update runbook if needed
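
Step 2's "verify application connects" is worth scripting so the on-call engineer is not eyeballing it under pressure. A hedged sketch: poll any health-check command until it passes or the time budget (roughly your RTO) runs out. The function name is illustrative; the health check itself is whatever command fits your stack.

```shell
#!/bin/bash
# wait_healthy: run a health-check command once per second until it
# succeeds or <budget_seconds> elapse. Returns 1 on timeout.
wait_healthy() {
  local budget=$1; shift
  local waited=0
  until "$@"; do
    waited=$(( waited + 1 ))
    if [ "$waited" -ge "$budget" ]; then
      return 1
    fi
    sleep 1
  done
}

# In a real failover this might be:
#   wait_healthy 600 psql "$NEW_PRIMARY_URL" -c 'SELECT 1'
wait_healthy 5 true && echo "database reachable"
```

Wiring this into the runbook turns "verify" from a judgment call into a pass/fail step with a deadline.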

S3 Backup Lifecycle Policy

{
  "Rules": [{
    "ID": "postgres-backup-retention",
    "Filter": { "Prefix": "postgres/" },
    "Status": "Enabled",
    "Transitions": [
      { "Days": 30, "StorageClass": "STANDARD_IA" },
      { "Days": 90, "StorageClass": "GLACIER" },
      { "Days": 365, "StorageClass": "DEEP_ARCHIVE" }
    ],
    "Expiration": { "Days": 2555 }
  }]
}
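
To sanity-check a policy like this, it helps to see how long an object spends in each class before expiring (assuming it starts in STANDARD). A quick sketch, pure arithmetic mirroring the rule above:

```shell
#!/bin/bash
# lifecycle_spans: given the three transition days and the expiration day,
# print how long an object spends in each storage class.
lifecycle_spans() {
  local ia=$1 glacier=$2 deep=$3 expire=$4
  echo "STANDARD: ${ia} days"
  echo "STANDARD_IA: $(( glacier - ia )) days"
  echo "GLACIER: $(( deep - glacier )) days"
  echo "DEEP_ARCHIVE: $(( expire - deep )) days"
}

# The policy above: transitions at 30 / 90 / 365 days, expiry at 2555 (~7 years)
lifecycle_spans 30 90 365 2555
```

Most of the 7-year retention (2190 of 2555 days) sits in DEEP_ARCHIVE, which is why the policy is cheap despite keeping years of backups.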

A backup that has never been tested is not a backup; it's hope. Schedule quarterly disaster recovery drills. The drill reveals what the theory misses.


Written by Sanjeev Sharma
Full Stack Engineer · E-mopro