TimescaleDB for Backend Engineers — Time-Series Data Without Leaving Postgres

Introduction

TimescaleDB extends PostgreSQL with time-series optimizations: automatic partitioning (hypertables), continuous aggregates for real-time rollups, compression for old data, and time-aware query functions. Unlike standalone time-series databases, TimescaleDB keeps analytics alongside your relational data. This guide covers production deployment, from hypertable design to retention policies.

Hypertable Creation and Chunk Interval Selection

Hypertables are the foundation of TimescaleDB. They automatically partition time-series data into chunks — similar to native PostgreSQL partitions, but created and managed for you as data arrives.

-- Install TimescaleDB extension
CREATE EXTENSION IF NOT EXISTS timescaledb;

-- Create regular table (will become hypertable)
CREATE TABLE metrics (
  time TIMESTAMPTZ NOT NULL,
  metric_name TEXT NOT NULL,
  host_id BIGINT NOT NULL,
  value FLOAT8 NOT NULL,
  tags JSONB
);

-- Convert to hypertable (the time column is typically TIMESTAMPTZ;
-- TIMESTAMP, DATE, and integer types also work)
SELECT create_hypertable('metrics', 'time');

-- Or with custom chunk interval
SELECT create_hypertable(
  'metrics',
  'time',
  if_not_exists => TRUE,
  chunk_time_interval => INTERVAL '1 day'  -- critical tuning parameter
);

-- Chunk interval guidance:
-- - < 100K rows/chunk: 1 hour
-- - 100K-1M rows/chunk: 1 day (typical)
-- - 1M-10M rows/chunk: 7 days
-- - > 10M rows/chunk: 30 days
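
-- Note: if the initial interval turns out to be wrong, it can be adjusted;
-- the new interval applies only to future chunks (existing chunks keep theirs)
SELECT set_chunk_time_interval('metrics', INTERVAL '7 days');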

-- Add constraints and indexes (auto-inherited by chunks)
CREATE INDEX idx_metrics_host_time ON metrics (host_id, time DESC);
CREATE INDEX idx_metrics_name_time ON metrics (metric_name, time DESC);

-- View hypertable info
SELECT hypertable_name, owner FROM timescaledb_information.hypertables;
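
-- Total hypertable size across all chunks and indexes
SELECT * FROM hypertable_detailed_size('metrics');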

-- View chunks created
SELECT show_chunks('metrics');
-- Output: _timescaledb_internal._hyper_1_1_chunk, _timescaledb_internal._hyper_1_2_chunk, ...

-- Chunk stats (time ranges and compression state)
SELECT
  chunk_name,
  range_start,
  range_end,
  is_compressed
FROM timescaledb_information.chunks
WHERE hypertable_name = 'metrics';

-- Per-chunk sizes
SELECT chunk_name, pg_size_pretty(total_bytes) AS total_size
FROM chunks_detailed_size('metrics');


Continuous Aggregates: Real-Time Rollups

Continuous aggregates pre-compute aggregations as data arrives, enabling instant queries.

-- Pre-aggregated hourly metrics
CREATE MATERIALIZED VIEW metrics_hourly
WITH (timescaledb.continuous) AS
SELECT
  time_bucket(INTERVAL '1 hour', time) as bucket,
  metric_name,
  host_id,
  count(*) as event_count,
  avg(value) as avg_value,
  max(value) as max_value,
  min(value) as min_value,
  stddev(value) as stddev_value
FROM metrics
GROUP BY bucket, metric_name, host_id;

-- Create index on continuous aggregate
CREATE INDEX ON metrics_hourly (metric_name, bucket DESC);

-- Refresh policy: auto-refresh materialized view
SELECT add_continuous_aggregate_policy('metrics_hourly',
  start_offset => INTERVAL '7 days',
  end_offset => INTERVAL '1 hour',
  schedule_interval => INTERVAL '1 hour'
);

-- Query pre-aggregated data (instant, no full scan)
SELECT
  bucket,
  metric_name,
  host_id,
  avg_value,
  max_value
FROM metrics_hourly
WHERE bucket > NOW() - INTERVAL '30 days'
  AND metric_name = 'cpu_usage'
ORDER BY bucket DESC
LIMIT 100;

-- Multi-level aggregation (hourly → daily; hierarchical continuous
-- aggregates require TimescaleDB 2.9+)
CREATE MATERIALIZED VIEW metrics_daily
WITH (timescaledb.continuous) AS
SELECT
  time_bucket(INTERVAL '1 day', bucket) as day,
  metric_name,
  host_id,
  sum(event_count) as event_count,
  avg(avg_value) as avg_value,
  max(max_value) as max_value,
  min(min_value) as min_value
FROM metrics_hourly
GROUP BY day, metric_name, host_id;
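
-- Each continuous aggregate needs its own refresh policy
SELECT add_continuous_aggregate_policy('metrics_daily',
  start_offset => INTERVAL '30 days',
  end_offset => INTERVAL '1 day',
  schedule_interval => INTERVAL '1 day'
);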

-- View the invalidation threshold (internal catalog table; layout may change
-- between versions)
SELECT * FROM _timescaledb_catalog.continuous_aggs_invalidation_threshold;

-- Manual refresh over a window (arguments are timestamps; a NULL end means
-- "up to the most recent data")
CALL refresh_continuous_aggregate('metrics_hourly', now() - INTERVAL '7 days', NULL);
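
-- Queries against a real-time continuous aggregate transparently union
-- materialized rows with newer, not-yet-materialized raw data. Whether
-- real-time mode is on by default depends on your TimescaleDB version;
-- toggle it explicitly with:
ALTER MATERIALIZED VIEW metrics_hourly SET (timescaledb.materialized_only = true);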

Compression Policies: Segment By and Order By

Compression often reduces storage by 90%+ for older data. TimescaleDB converts compressed chunks to a columnar encoding.

-- Enable compression on hypertable
ALTER TABLE metrics SET (
  timescaledb.compress,
  timescaledb.compress_segmentby = 'host_id,metric_name',
  timescaledb.compress_orderby = 'time DESC'
);

-- COMPRESS_SEGMENTBY: columns that define segments
-- Rationale: rows sharing segmentby values are compressed together
-- - Don't include high-cardinality columns (tiny segments compress poorly)
-- - Do include columns frequently filtered in queries (enables segment pruning)

-- COMPRESS_ORDERBY: sort order within each compressed batch
-- Rationale: ordering by time puts similar values next to each other, which
-- compresses better and matches the typical time-descending access pattern

-- Add compression policy: compress chunks older than 7 days
SELECT add_compression_policy('metrics',
  compress_after => INTERVAL '7 days'
);

-- View compression status and savings
SELECT
  chunk_name,
  compression_status,
  pg_size_pretty(before_compression_total_bytes) as before_size,
  pg_size_pretty(after_compression_total_bytes) as after_size,
  ROUND(
    100.0 * (before_compression_total_bytes - after_compression_total_bytes) /
    before_compression_total_bytes, 2
  ) as space_saved_pct
FROM chunk_compression_stats('metrics')
ORDER BY after_compression_total_bytes DESC;

-- Example output:
-- chunk_name       | compression_status | before_size | after_size | space_saved_pct
-- _hyper_1_1_chunk | Compressed         | 100 MB      | 5 MB       | 95.00
-- _hyper_1_2_chunk | Uncompressed       |             |            |

-- Manual compression (for testing; get chunk names from show_chunks())
SELECT compress_chunk('_timescaledb_internal._hyper_1_1_chunk');

-- Decompression (needed before modifying rows in older TimescaleDB versions)
SELECT decompress_chunk('_timescaledb_internal._hyper_1_1_chunk');

-- Best practices:
-- - Compress chunks older than ~7 days: recent data is still updated
--   frequently, and writes to compressed chunks are slower or restricted
-- - Avoid compressing hot chunks (recent data)
-- - Monitor compression job runtimes and ratios
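
-- Check that background jobs (compression, refresh, retention) are succeeding
SELECT job_id, hypertable_name, last_run_status, next_start, total_failures
FROM timescaledb_information.job_stats;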

Retention Policies: Automatic Data Expiry

TimescaleDB automatically deletes old data using retention policies.

-- Add retention policy: keep 90 days, delete older
SELECT add_retention_policy('metrics',
  drop_after => INTERVAL '90 days'
);

-- Modify retention: keep 1 year
SELECT remove_retention_policy('metrics', if_exists => TRUE);
SELECT add_retention_policy('metrics',
  drop_after => INTERVAL '1 year',
  schedule_interval => INTERVAL '1 day'
);

-- View retention policies (drop_after lives in the job's config JSON)
SELECT job_id, hypertable_name, config->>'drop_after' AS drop_after, schedule_interval
FROM timescaledb_information.jobs
WHERE proc_name = 'policy_retention';

-- Disable retention (if needed)
SELECT remove_retention_policy('metrics', if_exists => TRUE);

-- Verify what remains after retention runs
SELECT show_chunks('metrics');
-- Dropped chunks disappear entirely: drop_chunks removes whole chunk tables,
-- which is far cheaper than row-by-row DELETE

-- Manual deletion (for testing)
SELECT drop_chunks('metrics', older_than => INTERVAL '60 days');

-- Query retention settings across all hypertables
SELECT
  job_id,
  hypertable_name,
  config->>'drop_after' AS drop_after,
  schedule_interval
FROM timescaledb_information.jobs
WHERE proc_name = 'policy_retention'
ORDER BY hypertable_name;

time_bucket() for Granular Aggregation

time_bucket() is TimescaleDB's secret weapon for time-series aggregation.

-- Group by custom time bucket (1 minute, 5 minutes, etc.)
SELECT
  time_bucket(INTERVAL '5 minutes', time) as bucket,
  host_id,
  avg(value) as avg_value,
  max(value) as max_value,
  min(value) as min_value
FROM metrics
WHERE time > NOW() - INTERVAL '1 day'
  AND host_id = 123
GROUP BY bucket, host_id
ORDER BY bucket DESC;

-- Non-uniform bucketing (e.g., business hours vs off-hours)
SELECT
  CASE
    WHEN EXTRACT(HOUR FROM time) BETWEEN 9 AND 17
    THEN time_bucket(INTERVAL '15 minutes', time)
    ELSE time_bucket(INTERVAL '1 hour', time)
  END as bucket,
  avg(value) as avg_value
FROM metrics
WHERE time > NOW() - INTERVAL '7 days'
GROUP BY bucket
ORDER BY bucket DESC;

-- Named time buckets (for readability)
WITH hourly_data AS (
  SELECT
    time_bucket(INTERVAL '1 hour', time) as hour,
    metric_name,
    avg(value) as avg_value,
    max(value) as peak_value
  FROM metrics
  WHERE time > NOW() - INTERVAL '30 days'
  GROUP BY hour, metric_name
)
SELECT
  to_char(hour, 'YYYY-MM-DD HH24:00') as hour_str,
  metric_name,
  ROUND(avg_value, 2) as avg_value,
  peak_value
FROM hourly_data
ORDER BY hour DESC
LIMIT 100;

-- time_bucket_gapfill(): fill missing time buckets
-- (requires a bounded time range in the WHERE clause)
SELECT
  time_bucket_gapfill(INTERVAL '1 hour', time) as bucket,
  host_id,
  avg(value) as avg_value
FROM metrics
WHERE time > NOW() - INTERVAL '7 days'
  AND time < NOW()
  AND host_id = 123
GROUP BY bucket, host_id
ORDER BY bucket DESC;
-- Output: emits a row for every bucket, with NULL where no data exists
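
-- Instead of NULL, locf() carries the last observed value forward and
-- interpolate() fills gaps linearly between neighboring points
SELECT
  time_bucket_gapfill(INTERVAL '1 hour', time) as bucket,
  locf(avg(value)) as avg_value_locf,
  interpolate(avg(value)) as avg_value_interp
FROM metrics
WHERE time > NOW() - INTERVAL '7 days'
  AND time < NOW()
  AND host_id = 123
GROUP BY bucket
ORDER BY bucket;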

Comparing TimescaleDB vs InfluxDB vs Raw Postgres Partitioning

Understanding tradeoffs helps choose the right tool.

-- Raw Postgres partitioning (manual work)
CREATE TABLE metrics_raw (
  time TIMESTAMPTZ NOT NULL,
  value FLOAT8 NOT NULL,
  host_id BIGINT NOT NULL
) PARTITION BY RANGE (time);

CREATE TABLE metrics_raw_2025_01 PARTITION OF metrics_raw
  FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');

CREATE TABLE metrics_raw_2025_02 PARTITION OF metrics_raw
  FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');

-- Pros: native Postgres, no extension
-- Cons: manual partition creation, no compression, no continuous aggregates

-- TimescaleDB (automatic partitioning)
CREATE TABLE metrics_ts (
  time TIMESTAMPTZ NOT NULL,
  value FLOAT8 NOT NULL,
  host_id BIGINT NOT NULL
);
SELECT create_hypertable('metrics_ts', 'time');

-- Pros: automatic chunks, compression, continuous aggregates, JSONB support
-- Cons: requires extension, slightly slower single-row inserts

-- InfluxDB (dedicated time-series DB, line-protocol writes)
-- measurement,tag=value field=100 timestamp
-- SELECT * FROM measurement WHERE time > '2025-01-01'

-- Pros: purpose-built ingest path, InfluxQL/Flux query languages
-- Cons: separate database to operate, no ACID transactions, no joins with
--       your relational data

Node.js TimescaleDB Integration

The examples below use node-postgres (pg). Always pass values as query parameters; interpolating them into SQL strings invites injection and defeats plan caching.

import pg from 'pg';
import QueryStream from 'pg-query-stream';

const pool = new pg.Pool({ connectionString: process.env.DATABASE_URL });

// Insert metrics (batched: unnest turns four parallel arrays into rows,
// so one round trip inserts the whole batch)
async function insertMetrics(
  metrics: Array<{ time: Date; name: string; hostId: number; value: number }>
) {
  await pool.query(
    `INSERT INTO metrics (time, metric_name, host_id, value)
     SELECT * FROM unnest($1::timestamptz[], $2::text[], $3::bigint[], $4::float8[])`,
    [
      metrics.map(m => m.time),
      metrics.map(m => m.name),
      metrics.map(m => m.hostId),
      metrics.map(m => m.value),
    ]
  );
}

// Query with time_bucket; make_interval() keeps the window parameterized
async function getHourlyMetrics(hostId: number, metricName: string, days = 7) {
  const result = await pool.query(
    `SELECT
       time_bucket(INTERVAL '1 hour', time) as bucket,
       avg(value) as avg_value,
       max(value) as max_value,
       min(value) as min_value,
       count(*) as count
     FROM metrics
     WHERE host_id = $1
       AND metric_name = $2
       AND time > now() - make_interval(days => $3::int)
     GROUP BY bucket
     ORDER BY bucket DESC`,
    [hostId, metricName, days]
  );
  return result.rows;
}

// Query continuous aggregate (no scan of the raw hypertable)
async function getPreAggregated(metricName: string, days = 30) {
  const result = await pool.query(
    `SELECT bucket, host_id, avg_value, max_value, min_value
     FROM metrics_hourly
     WHERE metric_name = $1
       AND bucket > now() - make_interval(days => $2::int)
     ORDER BY bucket DESC
     LIMIT 100`,
    [metricName, days]
  );
  return result.rows;
}

// Stream large result sets with pg-query-stream (cursor-based, constant memory)
async function streamMetrics(hostId: number, callback: (row: any) => Promise<void>) {
  const client = await pool.connect();
  try {
    const stream = client.query(
      new QueryStream('SELECT * FROM metrics WHERE host_id = $1 ORDER BY time DESC', [hostId])
    );
    for await (const row of stream) {
      await callback(row);
    }
  } finally {
    client.release();
  }
}

Checklist

  • TimescaleDB extension installed and verified
  • Hypertable created with appropriate chunk interval
  • Indexes added on commonly filtered columns
  • Continuous aggregates created for frequent queries
  • Compression policy enabled (compress_after >= 7 days)
  • Retention policy configured (data deletion schedule)
  • time_bucket() queries validated for performance
  • Multi-level aggregation (hourly → daily → monthly) designed
  • Compression ratio monitored (should be > 80%)
  • Retention jobs scheduled and monitored
  • time_bucket_gapfill() used for gap handling
  • Node.js client configured with connection pooling
  • Batch inserts implemented for throughput
  • Continuous aggregate refresh policy tuned

Conclusion

TimescaleDB transforms PostgreSQL into a powerful time-series database while preserving relational capabilities. Use automatic hypertables for transparent partitioning, continuous aggregates for instant query results, and compression policies to manage storage. With time_bucket() for flexible aggregation and retention policies for automatic cleanup, you can handle billions of time-series events within PostgreSQL without additional infrastructure. The result is a simpler architecture, fewer databases to maintain, and access to rich time-series analytics alongside your operational data.