PostgreSQL Performance Tuning — From Slow Queries to Sub-Millisecond Responses
Author: Sanjeev Sharma (@webcoderspeed1)
Introduction
PostgreSQL performance tuning separates operators who understand their infrastructure from those running blind. This guide walks through real-world tuning strategies: reading execution plans, configuring memory parameters, optimizing autovacuum for write-heavy tables, and pooling connections to eliminate connection overhead. Every optimization is backed by measurable metrics.
- Reading EXPLAIN ANALYZE: Seq Scan vs Index Scan vs Hash Join
- work_mem: Sorting and Aggregation Performance
- Checkpoint and WAL Tuning
- Autovacuum Configuration for High-Write Tables
- pg_stat_statements: Finding Slow Queries
- Connection Overhead vs PgBouncer Pooling
- shared_buffers and effective_cache_size Tuning
- Checklist
- Conclusion
Reading EXPLAIN ANALYZE: Seq Scan vs Index Scan vs Hash Join
EXPLAIN ANALYZE is your primary diagnostic tool. Understanding what the planner decides—and why—is foundational.
-- Enable timing and verbose output
EXPLAIN (ANALYZE, BUFFERS, TIMING, VERBOSE)
SELECT o.id, o.total, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.created_at > NOW() - INTERVAL '30 days'
AND o.status = 'shipped'
LIMIT 100;
-- Output interpretation:
-- Seq Scan: reads the entire table; reasonable for small tables or when most rows match
-- Index Scan: uses an index; wins when the query returns a small fraction of rows (roughly < 5-10%)
-- Index Only Scan: every needed column is in the index, so the heap is never touched (fastest)
-- Hash Join: builds a hash table from the smaller input; good for large, unsorted joins
-- Nested Loop: probes the inner side once per outer row; fast for small outer sets, slow for large result sets
-- Bad plan example: seq scan on 10M row table (fix: add index)
Seq Scan on orders  (cost=0.00..450000.00 rows=500000) (actual rows=450000 loops=1)
  Filter: ((created_at > ...) AND (status = 'shipped'))
  Buffers: shared hit=150000 read=50000
-- Good plan example: index scan with limit
Index Scan using idx_orders_created_at on orders  (cost=0.28..1200.00 rows=100) (actual rows=100 loops=1)
  Index Cond: (created_at > ...)
  Filter: (status = 'shipped')
  Buffers: shared hit=95 read=5
-- Compare estimated vs actual rows: large gaps mean the planner is working from bad statistics
-- If actual >> estimated: run ANALYZE (or raise the column statistics target)
-- If actual << estimated: the predicate is more selective than the planner thinks; ANALYZE, or consider a partial index
ANALYZE orders;
ANALYZE customers;
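The estimate-vs-actual comparison can be sketched as a small check (a hypothetical helper; the 10x cutoff is an arbitrary choice, not a PostgreSQL threshold):

```typescript
// Flag a plan node whose row estimate is badly off (hypothetical 10x cutoff).
function estimateOff(estimated: number, actual: number, factor = 10): boolean {
  const hi = Math.max(estimated, actual);
  const lo = Math.max(1, Math.min(estimated, actual)); // avoid divide-by-zero
  return hi / lo >= factor;
}

console.log(estimateOff(500_000, 450_000)); // false — the seq scan example above was well estimated
console.log(estimateOff(100, 45_000));      // true — time to run ANALYZE
```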
work_mem: Sorting and Aggregation Performance
work_mem caps the memory each sort or hash operation can use before spilling to disk. Larger values avoid spills, but the limit applies per operation, so peak usage scales with concurrent connections and with the number of sort/hash nodes in each query.
-- Current work_mem setting
SHOW work_mem;
-- Default: 4MB (way too small for modern hardware)
-- Calculation: (total_available_memory - shared_buffers) / max_connections / 2
-- Example: 32GB RAM, shared_buffers=8GB, max_connections=200
-- work_mem = (32 - 8) * 1024 / 200 / 2 = 61MB
-- Set for session (testing)
SET work_mem = '256MB';
-- Set for user (persistent across connections)
ALTER USER myapp SET work_mem = '256MB';
-- Set globally (postgresql.conf)
-- work_mem = 256MB
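The sizing rule above works out like this (a sketch; the divide-by-2 assumes roughly two concurrent sort/hash operations per connection):

```typescript
// Per-operation work_mem budget in MB:
// (total RAM - shared_buffers) / max_connections / 2
function workMemMB(totalRamGB: number, sharedBuffersGB: number, maxConnections: number): number {
  return Math.floor(((totalRamGB - sharedBuffersGB) * 1024) / maxConnections / 2);
}

console.log(workMemMB(32, 8, 200)); // 61 — matches the 61MB example above
```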
-- Query using sort (will spill to disk if work_mem too small)
EXPLAIN ANALYZE
SELECT customer_id, COUNT(*) as order_count, SUM(total) as lifetime_value
FROM orders
GROUP BY customer_id
ORDER BY lifetime_value DESC;
-- Check if sort used temp space
EXPLAIN ANALYZE
SELECT *
FROM (
SELECT id, name, RANDOM() as rand_sort
FROM users
ORDER BY rand_sort
) t
LIMIT 10;
-- Monitor actual memory usage
SELECT
query,
mean_exec_time,
calls,
max_exec_time
FROM pg_stat_statements
WHERE query ILIKE '%ORDER BY%' OR query ILIKE '%GROUP BY%'
ORDER BY mean_exec_time DESC
LIMIT 10;
Checkpoint and WAL Tuning
Checkpoints synchronize committed data to disk. Tuning checkpoint frequency affects both write latency and crash recovery time.
-- Current checkpoint settings
SHOW checkpoint_timeout; -- Default: 5min
SHOW checkpoint_completion_target; -- Default: 0.9 (90%)
SHOW max_wal_size; -- Default: 1GB
SHOW min_wal_size; -- Default: 80MB
-- Frequent checkpoints (shorter crash recovery, more total I/O from full-page writes)
-- postgresql.conf settings:
checkpoint_timeout = '5min'
max_wal_size = '2GB'
checkpoint_completion_target = 0.9
-- Infrequent checkpoints (higher throughput, longer crash recovery)
checkpoint_timeout = '15min'
max_wal_size = '10GB'
checkpoint_completion_target = 0.9  -- keep writes spread across the longer interval
-- Monitor checkpoint performance (pg_stat_bgwriter; on PostgreSQL 17+ these counters moved to pg_stat_checkpointer)
SELECT
checkpoints_timed,
checkpoints_req,
checkpoint_write_time,
checkpoint_sync_time,
checkpoint_write_time + checkpoint_sync_time AS total_checkpoint_time,
CASE WHEN checkpoints_timed + checkpoints_req = 0 THEN 0
ELSE ROUND(100.0 * checkpoints_timed / (checkpoints_timed + checkpoints_req), 2)
END AS timed_percent
FROM pg_stat_bgwriter;
-- A high timed_percent (mostly time-based checkpoints) is the healthy state
-- If checkpoints_req dominates (WAL fills before the timeout fires): increase max_wal_size
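That interpretation can be condensed into a tiny decision helper (a sketch; the 90% cutoff is an assumption, not a PostgreSQL default):

```typescript
// Suggest a checkpoint tuning action from pg_stat_bgwriter counters (sketch).
function checkpointAdvice(timed: number, requested: number): string {
  const total = timed + requested;
  if (total === 0) return "no checkpoints yet";
  const timedPercent = (100 * timed) / total;
  // Frequent requested checkpoints mean WAL fills before checkpoint_timeout fires.
  return timedPercent >= 90 ? "healthy" : "increase max_wal_size";
}

console.log(checkpointAdvice(95, 5));  // "healthy"
console.log(checkpointAdvice(20, 80)); // "increase max_wal_size"
```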
-- WAL archive settings (for PITR)
archive_mode = on
archive_command = 'test ! -f /archive/%f && cp %p /archive/%f'
archive_timeout = '300'
Autovacuum Configuration for High-Write Tables
Autovacuum removes dead rows and updates statistics. Misconfigured autovacuum causes bloat and seq scans.
-- Current autovacuum settings
SHOW autovacuum;
SHOW autovacuum_naptime; -- Default: 1min (check interval)
SHOW autovacuum_vacuum_threshold; -- Default: 50 dead rows
SHOW autovacuum_vacuum_scale_factor; -- Default: 0.2 (20%)
-- Formula: trigger when dead_rows > threshold + (table_rows * scale_factor)
-- Table with 1M rows: trigger at 50 + (1000000 * 0.2) = 200,050 dead rows
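The trigger formula is easy to verify numerically (defaults shown; rounded because dead-row counts are integers):

```typescript
// Dead-row count at which autovacuum kicks in:
// threshold + (live rows * scale factor)
function autovacuumTrigger(liveRows: number, threshold = 50, scaleFactor = 0.2): number {
  return Math.round(threshold + liveRows * scaleFactor);
}

console.log(autovacuumTrigger(1_000_000));           // 200050 with the defaults
console.log(autovacuumTrigger(1_000_000, 50, 0.05)); // 50050 with scale_factor = 0.05
```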
-- Find tables needing aggressive autovacuum
SELECT
schemaname,
relname,
n_dead_tup,
n_live_tup,
ROUND(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 2) AS dead_percent,
last_vacuum,
last_autovacuum
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC;
-- Aggressive autovacuum for high-write table (events log)
-- Add to postgresql.conf or use ALTER TABLE
ALTER TABLE events SET (
autovacuum_vacuum_scale_factor = 0.05, -- trigger at 5% dead rows instead of 20%
autovacuum_vacuum_cost_delay = 2, -- 2ms delay: vacuum works faster, at the cost of more I/O
autovacuum_vacuum_cost_limit = 1000
);
-- Lazy autovacuum for read-heavy archive tables
ALTER TABLE archived_orders SET (
autovacuum_vacuum_scale_factor = 0.5, -- 50%, less frequent
autovacuum_enabled = false -- manual vacuum only; anti-wraparound autovacuum still runs
);
-- Vacuum a table immediately (a manual VACUUM, not autovacuum)
VACUUM ANALYZE events;
-- Monitor autovacuum activity
SELECT
schemaname,
relname,
last_vacuum,
last_autovacuum,
vacuum_count,
autovacuum_count
FROM pg_stat_user_tables
ORDER BY autovacuum_count DESC
LIMIT 10;
pg_stat_statements: Finding Slow Queries
pg_stat_statements aggregates query statistics. Use it to identify offenders.
-- pg_stat_statements must be in shared_preload_libraries (postgresql.conf, restart required); then enable it:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- Reset statistics (start fresh)
SELECT pg_stat_statements_reset();
-- Top 10 slowest queries by total time
SELECT
query,
calls,
total_exec_time,
mean_exec_time,
stddev_exec_time,
rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
-- Top 10 slowest queries by mean (most consistently slow)
SELECT
LEFT(query, 80),
calls,
ROUND(mean_exec_time::numeric, 2) as avg_ms,
ROUND(stddev_exec_time::numeric, 2) as stddev_ms
FROM pg_stat_statements
WHERE calls > 100 -- filter out rare queries
ORDER BY mean_exec_time DESC
LIMIT 10;
-- Queries with worst I/O patterns (blk_*_time requires track_io_timing = on)
SELECT
LEFT(query, 80),
calls,
ROUND(mean_exec_time::numeric, 2) as avg_ms,
blk_read_time,
blk_write_time
FROM pg_stat_statements
WHERE blk_read_time + blk_write_time > 0
ORDER BY blk_read_time + blk_write_time DESC
LIMIT 10;
-- Query causing most memory pressure
SELECT
LEFT(query, 80),
calls,
ROUND(mean_exec_time::numeric, 2) as avg_ms
FROM pg_stat_statements
WHERE query ILIKE '%GROUP BY%' OR query ILIKE '%ORDER BY%'
ORDER BY mean_exec_time DESC
LIMIT 10;
-- Per-database statement totals (join to pg_database to resolve dbid to a name)
SELECT
userid::regrole,
datname,
query,
calls,
total_exec_time
FROM pg_stat_statements
JOIN pg_database ON pg_stat_statements.dbid = pg_database.oid
WHERE datname = 'myapp'
ORDER BY total_exec_time DESC;
Connection Overhead vs PgBouncer Pooling
Each database connection consumes 5-10MB RAM. Applications opening 200+ connections cause memory bloat and slow new connection acquisition.
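A quick back-of-the-envelope makes the problem concrete (assuming ~7.5MB per backend, the midpoint of the 5-10MB range above):

```typescript
// Estimated RAM consumed by idle PostgreSQL backends (sketch).
function connectionMemoryMB(connections: number, perConnMB = 7.5): number {
  return connections * perConnMB;
}

console.log(connectionMemoryMB(200)); // 1500 — roughly 1.5GB before a single query runs
```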
# pgbouncer.ini configuration for production
[databases]
myapp = host=prod-db port=5432 dbname=myapp user=pgbouncer
# PgBouncer does not expand env vars like $PG_PASSWORD; keep credentials in auth_file
[pgbouncer]
listen_port = 6432
listen_addr = 127.0.0.1
auth_type = md5
pool_mode = transaction # transaction pooling for web apps (most common)
max_client_conn = 1000
default_pool_size = 25 # connections per database
min_pool_size = 10
reserve_pool_size = 5
reserve_pool_timeout = 3
max_db_connections = 200 # hard limit on database
max_user_connections = 100
# Performance tuning
server_idle_timeout = 600
server_lifetime = 3600
server_connect_timeout = 15
idle_transaction_timeout = 300 # note: idle_in_transaction_session_timeout is the PostgreSQL-side parameter
# Logging
log_connections = 1
log_disconnections = 1
log_pooler_errors = 1
// Node.js client using PgBouncer instead of direct connection
import pg from 'pg';
const pool = new pg.Pool({
host: 'localhost',
port: 6432, // PgBouncer port, not direct database
database: 'myapp',
user: 'app_user',
password: process.env.DB_PASSWORD,
max: 20, // connection pool size (smaller, PgBouncer handles multiplexing)
idleTimeoutMillis: 30000,
connectionTimeoutMillis: 5000,
});
// Reuse connections efficiently
async function fetchUser(userId: string) {
const client = await pool.connect();
try {
const result = await client.query('SELECT * FROM users WHERE id = $1', [userId]);
return result.rows[0];
} finally {
client.release(); // return to pool
}
}
// Monitor pool status
pool.on('error', (err) => {
console.error('Unexpected pool error:', err);
});
pool.on('connect', () => {
console.log('New connection acquired from PgBouncer');
});
// Health check
async function healthCheck() {
const client = await pool.connect();
try {
await client.query('SELECT 1');
return { healthy: true };
} catch (err) {
return { healthy: false, error: (err as Error).message };
} finally {
client.release();
}
}
shared_buffers and effective_cache_size Tuning
shared_buffers is PostgreSQL's cache. effective_cache_size hints to the planner about OS page cache.
-- Current settings
SHOW shared_buffers; -- Default: 128MB (too small)
SHOW effective_cache_size; -- Default: 4GB (usually too low)
-- Tuning guidelines:
-- shared_buffers = 25% of total RAM (max 40GB on modern hardware)
-- effective_cache_size = 75% of total RAM
-- Example for 32GB RAM:
-- shared_buffers = 8GB
-- effective_cache_size = 24GB
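The 25%/75% guideline can be sketched as a small helper (the 40GB cap mirrors the guideline above; both numbers are rules of thumb, not hard limits):

```typescript
// shared_buffers / effective_cache_size sizing per the guidelines above (sketch).
function memorySettings(totalRamGB: number) {
  return {
    sharedBuffersGB: Math.min(Math.round(totalRamGB * 0.25), 40), // 25% of RAM, capped at 40GB
    effectiveCacheSizeGB: Math.round(totalRamGB * 0.75),          // 75% of RAM
  };
}

console.log(memorySettings(32)); // { sharedBuffersGB: 8, effectiveCacheSizeGB: 24 }
```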
-- PostgreSQL config (restart required)
# postgresql.conf
shared_buffers = 8GB
effective_cache_size = 24GB
-- Verify buffer hit ratio (should be > 99%)
SELECT
sum(heap_blks_read) as heap_read,
sum(heap_blks_hit) as heap_hit,
sum(heap_blks_hit) / (sum(heap_blks_hit) + sum(heap_blks_read)) as ratio
FROM pg_statio_user_tables;
-- If ratio < 99%: increase shared_buffers or add indexes
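The same arithmetic the query performs, as a standalone check:

```typescript
// Buffer cache hit ratio: hits / (hits + reads).
function hitRatio(blksHit: number, blksRead: number): number {
  return blksHit / (blksHit + blksRead);
}

console.log(hitRatio(990_000, 10_000)); // 0.99 — exactly at the recommended floor
```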
-- Index buffer hit ratio
SELECT
sum(idx_blks_read) as idx_read,
sum(idx_blks_hit) as idx_hit,
sum(idx_blks_hit) / (sum(idx_blks_hit) + sum(idx_blks_read)) as ratio
FROM pg_statio_user_indexes;
-- Monitor actual buffer pool usage (requires the pg_buffercache extension)
SELECT
d.datname,
count(*) AS buffer_blocks,
pg_size_pretty(count(*) * 8192) AS cache_size -- assumes the default 8KB block size
FROM pg_buffercache b
JOIN pg_database d ON b.reldatabase = d.oid
WHERE b.usagecount > 0
GROUP BY d.datname
ORDER BY count(*) DESC;
Checklist
- EXPLAIN ANALYZE run on all slow queries
- Index scans preferred over seq scans where appropriate
- Hash joins detected and optimized
- work_mem tuned based on server capacity and workload
- Checkpoint settings optimized for write patterns
- Autovacuum configuration aggressive for high-write tables
- pg_stat_statements enabled and monitored weekly
- PgBouncer deployed for connection pooling
- shared_buffers set to 25% of RAM
- effective_cache_size set to 75% of RAM
- Buffer hit ratio > 99%
- Slow query threshold configured in logging
Conclusion
PostgreSQL performance tuning is empirical and iterative. Start with pg_stat_statements to find the real bottlenecks, use EXPLAIN ANALYZE to understand why queries are slow, and configure memory and connection pooling appropriately for your hardware. Most improvements come from adding indexes and tuning autovacuum, not from parameter tweaking. Monitor your metrics continuously and adjust as your data and access patterns evolve.