Published on

PostgreSQL Indexing Strategies — B-Tree, GIN, BRIN, and Partial Indexes in Production

Authors

Introduction

PostgreSQL indexing is the most critical lever for query performance at scale. Many teams struggle with slow queries because they either lack indexes entirely or maintain bloated, unused indexes that slow down writes without speeding reads. This post covers production patterns for index selection, monitoring, and maintenance.

B-Tree Indexes for Equality and Range Queries

B-Tree indexes are PostgreSQL's default and handle 90% of use cases: equality lookups (WHERE id = ?) and range queries (WHERE created_at > ?).

-- Basic B-Tree index for a single column
CREATE INDEX CONCURRENTLY idx_users_email ON users(email);

-- Multi-column index for filtering and sorting
CREATE INDEX CONCURRENTLY idx_orders_user_date
  ON orders(user_id, created_at DESC);

-- Check index effectiveness
EXPLAIN ANALYZE
SELECT * FROM users WHERE email = 'alice@example.com';

The multi-column index above helps two types of queries:

  1. Filtering by user_id then created_at
  2. Pure user_id filter (PostgreSQL uses index prefix)

For composite indexes, column order matters: filter columns first, then sort columns.

GIN Indexes for JSONB and Arrays

GIN (Generalized Inverted Index) accelerates searches within JSONB documents and array columns.

-- JSONB containment queries
CREATE INDEX CONCURRENTLY idx_events_data_gin
  ON events USING gin(data);

-- Query: find all events where data contains key 'user_id'
EXPLAIN ANALYZE
SELECT * FROM events
WHERE data @> '{"user_id": 123}';

-- Array overlap queries
CREATE INDEX CONCURRENTLY idx_tags_gin
  ON articles USING gin(tags);

-- Query: articles with ANY of these tags
SELECT * FROM articles
WHERE tags && ARRAY['postgres', 'performance'];

GIN trades write performance for read speed on complex data types. Monitor write latency when adding GIN indexes to hot tables.

BRIN Indexes for Time-Series Data

BRIN (Block Range Index) provides exceptional compression for naturally ordered data like timestamps and sensor readings. BRIN indexes are 100x smaller than B-Tree on time-series.

-- BRIN index on time-ordered metric data
CREATE INDEX CONCURRENTLY idx_metrics_time_brin
  ON metrics USING brin(recorded_at);

-- Query: last 7 days of metrics
EXPLAIN ANALYZE
SELECT * FROM metrics
WHERE recorded_at > NOW() - INTERVAL '7 days'
ORDER BY recorded_at DESC;

-- BRIN index on naturally ordered ID columns
CREATE INDEX CONCURRENTLY idx_events_id_brin
  ON events USING brin(id) WITH (pages_per_range = 256);

BRIN works best when data is physically ordered by the indexed column. Insert in time order for optimal performance. The pages_per_range parameter (default 128) controls memory usage vs. query speed.

Partial Indexes for Filtered Queries

Partial indexes index only rows matching a WHERE condition, reducing size and write overhead.

-- Index only active users for login queries
CREATE INDEX CONCURRENTLY idx_users_email_active
  ON users(email)
  WHERE deleted_at IS NULL AND active = true;

-- Query must match the WHERE condition to use the index
SELECT * FROM users
WHERE email = 'alice@example.com' AND active = true;

-- Partial index on expensive status
CREATE INDEX CONCURRENTLY idx_orders_pending
  ON orders(user_id, created_at)
  WHERE status = 'pending';

-- Useful for event tables with many archived rows
CREATE INDEX CONCURRENTLY idx_events_recent
  ON events(user_id, created_at DESC)
  WHERE created_at > NOW() - INTERVAL '90 days';

Partial indexes shine on tables where a small percentage of rows are "hot" (frequently queried).

Covering Indexes and Index-Only Scans

Including non-key columns in indexes enables index-only scans where PostgreSQL never touches the heap table.

-- Covering index includes columns for the SELECT list
CREATE INDEX CONCURRENTLY idx_users_covering
  ON users(email)
  INCLUDE (id, name, role);

-- Index-only scan: planner reads everything from index
EXPLAIN ANALYZE
SELECT id, name, role FROM users WHERE email = 'alice@example.com';

-- Check visibility map hit ratio
SELECT
  schemaname, tablename,
  idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY idx_scan DESC;

Covering indexes increase index size but reduce heap I/O. Use on frequently selected column sets where reads outnumber writes.

Identifying and Removing Unused Indexes

Bloated indexes slow down writes. Use pg_stat_user_indexes to find unused indexes:

-- Find indexes never used since server startup
SELECT
  schemaname, tablename, indexname, idx_scan, idx_tup_read,
  pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND indexname NOT LIKE '%_pkey'
ORDER BY pg_relation_size(indexrelid) DESC;

-- Indexes with few scans relative to size
SELECT
  schemaname, indexname,
  idx_scan, idx_tup_read,
  pg_size_pretty(pg_relation_size(indexrelid)) AS size,
  ROUND(100.0 * idx_tup_read / NULLIF(idx_scan, 0)) AS tup_read_ratio
FROM pg_stat_user_indexes
WHERE idx_scan > 0
ORDER BY pg_relation_size(indexrelid) DESC
LIMIT 20;

-- Safe removal (requires exclusive lock)
DROP INDEX CONCURRENTLY idx_old_unused_index;

Run these queries on production read replicas to avoid locking the primary.

Index Bloat and REINDEX CONCURRENTLY

B-Tree indexes grow with update/delete activity. Reindex to reclaim space without downtime:

#!/bin/bash
# Monitor index bloat estimate
psql -d mydb -c "
SELECT
  schemaname, indexname,
  ROUND(100.0 * (pg_relation_size(indexrelid) -
    pg_relation_size(indexrelid, 'main'))
    / pg_relation_size(indexrelid)) AS bloat_ratio,
  pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE pg_relation_size(indexrelid) > 10 * 1024 * 1024
ORDER BY pg_relation_size(indexrelid) DESC;
"
-- Reindex without blocking reads/writes (PostgreSQL 12+)
REINDEX INDEX CONCURRENTLY idx_users_email;

-- Reindex all indexes on a table concurrently
REINDEX TABLE CONCURRENTLY orders;

-- Monitor reindex progress (connection must stay open)
SELECT
  pid, query, query_start,
  EXTRACT(EPOCH FROM (NOW() - query_start)) AS duration_sec
FROM pg_stat_activity
WHERE query ILIKE '%reindex%';

Schedule reindex maintenance during predictable traffic patterns, typically after batch delete operations.

EXPLAIN ANALYZE for Index Verification

Read EXPLAIN ANALYZE output to verify indexes are used correctly:

EXPLAIN ANALYZE
SELECT * FROM orders
WHERE user_id = 42 AND created_at > NOW() - INTERVAL '30 days';

-- Output interpretation:
-- Seq Scan: table scan with predicate filter (index NOT used)
-- Index Scan: index used, heap accessed for all columns
-- Index Only Scan: index used, no heap access (most efficient)
-- Rows vs Actual: high deviation means missing ANALYZE

PostgreSQL Indexing Checklist

  • Index primary/foreign key columns (usually automatic with constraints)
  • Index columns used in WHERE, JOIN ON, and ORDER BY clauses
  • Use multi-column indexes for common query patterns
  • Apply partial indexes to reduce index size on filtered queries
  • Monitor pg_stat_user_indexes monthly for unused indexes
  • Check index bloat after high-churn operations and REINDEX CONCURRENTLY
  • Run ANALYZE after significant data changes (bulk load/delete)
  • Use covering indexes for hot read-only columns
  • Prefer GIN for JSONB, BRIN for time-series data
  • Measure impact with EXPLAIN ANALYZE before/after indexing

Conclusion

PostgreSQL indexing requires thoughtful strategy, not just adding indexes everywhere. Start with query analysis using EXPLAIN ANALYZE, choose the right index type (B-Tree, GIN, BRIN), and monitor with pg_stat_user_indexes to eliminate waste. The best index is one that accelerates critical queries without slowing writes. By combining B-Tree for general queries, GIN for complex data types, BRIN for time-series, and partial indexes for filtered workloads, you'll build a lean, performant index portfolio that scales with your data.