Overprovisioned Infrastructure Bleeding Money — How to Right-Size Without Causing Downtime

Author: Sanjeev Sharma (@webcoderspeed1)
Introduction
Overprovisioning is the engineering equivalent of buying the largest house on the market because you might have guests. It comes from two places: early decisions made "just in case" that were never revisited, and incidents that led to scaling up but never scaling back down. The result is infrastructure that costs 3–5x what the workload requires. Right-sizing isn't about cutting corners — it's about understanding your actual resource consumption and paying for what you use, not what you fear you might need.
- Finding What's Overprovisioned
- Fix 1: Measure Before Cutting — CloudWatch Metrics Baseline
- Fix 2: Right-Sizing RDS With Zero Downtime
- Fix 3: Kubernetes Resource Requests Based on Actual Usage
- Fix 4: Scheduled Scaling for Predictable Traffic
- Right-Sizing Checklist
- Conclusion
Finding What's Overprovisioned
Right-sizing signals:
RDS (database):
- CPU consistently < 20%: likely overprovisioned instance type
- RAM: FreeableMemory > 50% consistently → smaller instance
- IOPS: < 20% of provisioned → switch to gp3 or reduce provisioned IOPS
- Storage: < 40% used → reduce storage (must snapshot + restore)
ECS / EC2 / Kubernetes:
- CPU request/limit vs actual: running at 10% of requested → reduce requests
- Memory: actual usage vs limit consistently low → reduce limits
- Desired task count vs concurrent request load: far more tasks running than traffic requires → reduce desired count
ElastiCache (Redis):
- Memory used vs available: freeable memory > 60% → smaller node type
- CPU: < 10% consistently → smaller node type
Lambda:
- Duration < 1/4 of timeout → reduce timeout (caps the cost of hung invocations and retries)
- Memory used < 50% of configured → reduce memory allocation
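These thresholds are mechanical enough to script. A minimal sketch in JavaScript — the shape of the metrics summary object is an assumption, so adapt it to however you export your monitoring data:

```javascript
// Apply the right-sizing signals above to a 30-day metrics summary.
// The input object shape is hypothetical — map your own monitoring export into it.
function rightSizingSignals(m) {
  const signals = []
  if (m.rds) {
    if (m.rds.avgCpuPct < 20) signals.push('RDS: CPU < 20% — consider a smaller instance class')
    if (m.rds.freeableMemPct > 50) signals.push('RDS: > 50% memory free — consider a smaller instance class')
    if (m.rds.iopsUsedPct < 20) signals.push('RDS: < 20% of provisioned IOPS — switch to gp3 or reduce IOPS')
  }
  if (m.lambda) {
    if (m.lambda.avgDurationMs < m.lambda.timeoutMs / 4) signals.push('Lambda: duration < 1/4 of timeout — reduce timeout')
    if (m.lambda.maxMemUsedMb < m.lambda.memoryMb * 0.5) signals.push('Lambda: < 50% of memory used — reduce allocation')
  }
  return signals
}
```

Running this against each service's baseline turns the checklist into a reviewable report rather than an ad-hoc dashboard exercise.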
Typical savings from right-sizing:
- RDS: move from r6g.4xlarge to r6g.xlarge → $1,500/month saved
- ECS: reduce from 20 to 8 tasks → 60% ECS cost reduction
- ElastiCache: r6g.large to r6g.medium → $400/month saved
Fix 1: Measure Before Cutting — CloudWatch Metrics Baseline
#!/bin/bash
# rightsizing-report.sh — generate baseline for rightsizing decisions
# Run over 30 days of data for a reliable picture
# RDS CPU utilization
# (date -d is GNU date; on macOS use: date -u -v-30d +%Y-%m-%dT%H:%M:%SZ)
aws cloudwatch get-metric-statistics \
  --namespace AWS/RDS \
  --metric-name CPUUtilization \
  --dimensions Name=DBInstanceIdentifier,Value=myapp-prod \
  --start-time "$(date -d '30 days ago' -u +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 3600 \
  --statistics Maximum Average \
  --query 'sort_by(Datapoints, &Timestamp)[*].{Time:Timestamp,Avg:Average,Max:Maximum}' \
  --output table

# ECS service CPU reservation vs utilization
aws cloudwatch get-metric-statistics \
  --namespace AWS/ECS \
  --metric-name CPUReservation \
  --dimensions Name=ClusterName,Value=production \
  --start-time "$(date -d '30 days ago' -u +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 3600 \
  --statistics Average \
  --output table
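The table output is easy to eyeball for one instance, but across a fleet it helps to reduce each series to a verdict. A small sketch — the thresholds are the ones from the signals list above, and the datapoint shape matches what `get-metric-statistics` returns with `--statistics Maximum Average`:

```javascript
// Reduce CloudWatch CPU datapoints ({ Average, Maximum, ... }) to a right-sizing verdict.
function summarizeCpu(datapoints) {
  const avg = datapoints.reduce((sum, d) => sum + d.Average, 0) / datapoints.length
  const peak = Math.max(...datapoints.map((d) => d.Maximum))
  return {
    avgPct: Number(avg.toFixed(1)),
    peakPct: Number(peak.toFixed(1)),
    // Overprovisioned only if the average is under 20% AND even the peak stays under 50%
    overprovisioned: avg < 20 && peak < 50,
  }
}
```

Gating on the peak as well as the average avoids cutting an instance that idles all month but spikes hard during a nightly batch job.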
Fix 2: Right-Sizing RDS With Zero Downtime
# RDS right-sizing using blue/green deployment (zero downtime)

# Step 1: Create a read replica on the new (smaller) instance class
aws rds create-db-instance-read-replica \
  --db-instance-identifier myapp-prod-rightsized \
  --source-db-instance-identifier myapp-prod \
  --db-instance-class db.r6g.xlarge   # down from db.r6g.4xlarge

# Step 2: Wait for the replica to be in sync
aws rds wait db-instance-available \
  --db-instance-identifier myapp-prod-rightsized

# Step 3: During a low-traffic window, promote the replica
aws rds promote-read-replica \
  --db-instance-identifier myapp-prod-rightsized

# Step 4: Update the connection string to point to the new instance
# (Or use a Route53 CNAME alias for zero-change application config)

# Step 5: Monitor for 48 hours, then decommission the old instance
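Step 4's Route53 swap avoids touching application config at all. A sketch that builds the UPSERT change batch — the record name and endpoint are placeholders; the resulting object is what you pass to the Route53 `change-resource-record-sets` API (via CLI `--change-batch` or the SDK):

```javascript
// Build a Route53 UPSERT change batch so the app's DB alias follows the new instance.
// aliasName and newEndpoint below are placeholders for your own records.
function dbCnameChangeBatch(aliasName, newEndpoint, ttl = 60) {
  return {
    Comment: 'Point DB alias at right-sized instance',
    Changes: [
      {
        Action: 'UPSERT',
        ResourceRecordSet: {
          Name: aliasName,
          Type: 'CNAME',
          TTL: ttl, // keep the TTL low so clients pick up the swap quickly
          ResourceRecords: [{ Value: newEndpoint }],
        },
      },
    ],
  }
}
```

Usage: `dbCnameChangeBatch('db.myapp.internal', 'myapp-prod-rightsized.<region>.rds.amazonaws.com')` — because the app only ever resolves the alias, the promotion becomes a DNS flip rather than a deploy.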
// Monitor key metrics after right-sizing.
// getRDSMetrics, alerting, and sleep are app-level helpers assumed to exist.
async function postRightsizingMonitor(hours = 48) {
  const checkInterval = 15 * 60 * 1000 // every 15 minutes
  const startTime = Date.now()
  while (Date.now() - startTime < hours * 3600 * 1000) {
    const metrics = await getRDSMetrics('myapp-prod-rightsized', '15m')
    if (metrics.cpu > 80) {
      await alerting.critical(`RDS CPU at ${metrics.cpu}% after right-sizing — may need to scale back up`)
    }
    if (metrics.freeableMemory < metrics.totalMemory * 0.1) {
      await alerting.warn(`RDS memory pressure after right-sizing: only ${(metrics.freeableMemory / 1e9).toFixed(1)}GB free`)
    }
    await sleep(checkInterval)
  }
  console.log(`✅ ${hours}-hour post-rightsizing monitoring complete — no issues`)
}
Fix 3: Kubernetes Resource Requests Based on Actual Usage
# Before right-sizing (guessed values):
resources:
  requests:
    cpu: "2000m"    # 2 CPU cores requested
    memory: "4Gi"
  limits:
    cpu: "4000m"
    memory: "8Gi"

# Actual observed usage: CPU 150m, memory 512Mi
# Cluster schedules based on requests → paying for 2 cores, using 0.15

# After right-sizing (measured values + safety margin):
resources:
  requests:
    cpu: "300m"     # 2x actual usage (150m) as headroom
    memory: "1Gi"   # 2x actual usage (512Mi) as headroom
  limits:
    cpu: "1000m"
    memory: "2Gi"

# Result: same safety margin, 85% less CPU requested per pod → far fewer cluster nodes needed
# Use kubectl top to see actual usage across all pods
kubectl top pods -n production --sort-by=cpu
# VPA (Vertical Pod Autoscaler) in recommendation mode:
# analyzes historical usage and recommends right-sized resource settings
kubectl apply -f - <<EOF
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: myapp-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  updatePolicy:
    updateMode: "Off"   # recommendation only, don't auto-apply
EOF

# Read recommendations:
kubectl describe vpa myapp-vpa
# Shows, e.g.: recommended CPU request 180m, memory request 600Mi
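The "2x observed usage" rule from the manifest above can be applied mechanically to VPA or `kubectl top` readings. A sketch — the rounding steps (50m CPU, 128Mi memory) are an assumed convention, not a Kubernetes requirement:

```javascript
// Turn an observed usage reading into request/limit values using the
// "2x observed usage as headroom" rule; limits get another 2x on top.
function rightSizedResources(observed) {
  const roundCpu = (m) => `${Math.ceil((m * 2) / 50) * 50}m`       // round up to 50m steps
  const roundMem = (mi) => `${Math.ceil((mi * 2) / 128) * 128}Mi`  // round up to 128Mi steps
  return {
    requests: { cpu: roundCpu(observed.cpuMillicores), memory: roundMem(observed.memoryMi) },
    limits: { cpu: roundCpu(observed.cpuMillicores * 2), memory: roundMem(observed.memoryMi * 2) },
  }
}
```

Feeding in the observed 150m / 512Mi from above reproduces the 300m / 1Gi requests shown in the manifest, so the same rule can be scripted across every deployment instead of hand-edited per YAML file.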
Fix 4: Scheduled Scaling for Predictable Traffic
// If traffic is predictable (business hours vs nights), scale on a schedule.
// Don't pay for weekend capacity on a B2B SaaS that's only used weekdays.

// Application Auto Scaling with scheduled actions (AWS SDK v3)
import { ApplicationAutoScaling } from '@aws-sdk/client-application-auto-scaling'

const appAutoScaling = new ApplicationAutoScaling({ region: 'us-east-1' })

// Register the ECS service as a scalable target
await appAutoScaling.registerScalableTarget({
  ServiceNamespace: 'ecs',
  ResourceId: 'service/production/myapp-api',
  ScalableDimension: 'ecs:service:DesiredCount',
  MinCapacity: 2,
  MaxCapacity: 40,
})

// Scale down at night (B2B app — no users midnight–6 AM)
await appAutoScaling.putScheduledAction({
  ServiceNamespace: 'ecs',
  ResourceId: 'service/production/myapp-api',
  ScalableDimension: 'ecs:service:DesiredCount',
  ScheduledActionName: 'scale-down-night',
  Schedule: 'cron(0 5 * * ? *)', // midnight EST (05:00 UTC)
  ScalableTargetAction: { MinCapacity: 2, MaxCapacity: 4 },
})

// Scale up for business hours
await appAutoScaling.putScheduledAction({
  ServiceNamespace: 'ecs',
  ResourceId: 'service/production/myapp-api',
  ScalableDimension: 'ecs:service:DesiredCount',
  ScheduledActionName: 'scale-up-morning',
  Schedule: 'cron(0 13 * * ? *)', // 8 AM EST (13:00 UTC)
  ScalableTargetAction: { MinCapacity: 8, MaxCapacity: 40 },
})

// Estimated savings: ~30% by not running full capacity nights and weekends
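The savings figure is easy to sanity-check against your own numbers before committing to the schedule. A back-of-envelope sketch — the task-hour price is a placeholder, not a quoted AWS rate:

```javascript
// Estimate monthly savings from a night-time scale-down.
// costPerTaskHour is a placeholder — substitute your actual Fargate/EC2 rate.
function monthlySavings({ dayTasks, nightTasks, nightHoursPerDay, costPerTaskHour }) {
  const tasksShedAtNight = dayTasks - nightTasks
  return tasksShedAtNight * nightHoursPerDay * 30 * costPerTaskHour
}
```

For the schedule above (8 daytime tasks dropping to 2 for the eight-hour midnight-to-8 AM window), at an assumed $0.10 per task-hour this works out to roughly $144/month for one service, before counting the weekend window.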
Right-Sizing Checklist
- ✅ 30-day baseline of actual CPU, memory, IOPS for every component
- ✅ RDS right-sized using replica promotion (zero downtime)
- ✅ Kubernetes resource requests match actual usage × 2 safety margin
- ✅ Scheduled scaling for predictable workloads (nights, weekends)
- ✅ Post-right-sizing monitoring for 48 hours minimum
- ✅ Regular right-sizing review every quarter — workloads change
- ✅ Savings reinvested into reliability (redundancy, replicas) not wasted
Conclusion
Overprovisioned infrastructure isn't just a cost problem — it masks performance issues, hides inefficient code, and builds false confidence about headroom. Right-sizing starts with measurement: 30 days of actual usage data, not guesses. The mechanics are safe with modern tooling — RDS can be right-sized via replica promotion with zero downtime, Kubernetes VPA provides evidence-based recommendations, and scheduled scaling recovers real money from predictable off-peak windows. The goal is paying for what you use, with enough headroom for reasonable growth — not for the traffic spike you're afraid of.