TimescaleDB for Backend Engineers — Time-Series Data Without Leaving Postgres

- Author: Sanjeev Sharma (@webcoderspeed1)
Introduction
TimescaleDB extends PostgreSQL with time-series optimizations: automatic partitioning (hypertables), continuous aggregates for real-time rollups, compression for old data, and time-aware query functions. Unlike standalone time-series databases, TimescaleDB keeps analytics alongside your relational data. This guide covers production deployment, from hypertable design to retention policies.
- Hypertable Creation and Chunk Interval Selection
- Continuous Aggregates: Real-Time Rollups
- Compression Policies: Segment By and Order By
- Retention Policies: Automatic Data Expiry
- time_bucket() for Granular Aggregation
- Comparing TimescaleDB vs InfluxDB vs Raw Postgres Partitioning
- Node.js TimescaleDB Integration
- Checklist
- Conclusion
Hypertable Creation and Chunk Interval Selection
Hypertables are the foundation of TimescaleDB. They automatically partition time-series data into chunks (like PostgreSQL partitions but optimized).
-- Install TimescaleDB extension
CREATE EXTENSION IF NOT EXISTS timescaledb;
-- Create regular table (will become hypertable)
CREATE TABLE metrics (
time TIMESTAMPTZ NOT NULL,
metric_name TEXT NOT NULL,
host_id BIGINT NOT NULL,
value FLOAT8 NOT NULL,
tags JSONB
);
-- Convert to hypertable (partitioning column must be a time type, e.g. TIMESTAMPTZ)
SELECT create_hypertable('metrics', 'time');
-- Or with a custom chunk interval
SELECT create_hypertable(
'metrics',
'time',
if_not_exists => TRUE,
chunk_time_interval => INTERVAL '1 day' -- critical tuning parameter
);
-- Chunk interval rule of thumb (by ingest rate; aim for chunks whose
-- data + indexes fit comfortably in memory):
-- - > 10M rows/day: 1 hour
-- - 1M-10M rows/day: 1 day (typical)
-- - 100K-1M rows/day: 7 days
-- - < 100K rows/day: 30 days
-- Add constraints and indexes (auto-inherited by chunks)
CREATE INDEX idx_metrics_host_time ON metrics (host_id, time DESC);
CREATE INDEX idx_metrics_name_time ON metrics (metric_name, time DESC);
-- View hypertable info
SELECT hypertable_name, owner FROM timescaledb_information.hypertables;
-- View chunks created
SELECT show_chunks('metrics');
-- Output: _timescaledb_internal._hyper_1_1_chunk, _hyper_1_2_chunk, ...
-- Chunk sizes
SELECT chunk_name, table_bytes, index_bytes, total_bytes
FROM chunks_detailed_size('metrics')
ORDER BY total_bytes DESC;
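One way to apply the chunk-interval rule of thumb in capacity planning is a small helper that maps an estimated ingest rate to an interval. This is a sketch with illustrative thresholds, not an official formula; the function name is hypothetical, and the real goal is always chunks whose data and indexes fit comfortably in memory.

```typescript
// Hypothetical helper: pick a chunk interval from estimated daily row volume.
// Higher ingest rates get shorter intervals so individual chunks stay bounded.
function chunkIntervalFor(rowsPerDay: number): string {
  if (rowsPerDay > 10_000_000) return '1 hour';
  if (rowsPerDay > 1_000_000) return '1 day'; // typical
  if (rowsPerDay > 100_000) return '7 days';
  return '30 days';
}

console.log(chunkIntervalFor(5_000_000)); // "1 day"
```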
Continuous Aggregates: Real-Time Rollups
Continuous aggregates pre-compute aggregations as data arrives, enabling instant queries.
-- Pre-aggregated hourly metrics
CREATE MATERIALIZED VIEW metrics_hourly
WITH (timescaledb.continuous) AS
SELECT
time_bucket(INTERVAL '1 hour', time) as bucket,
metric_name,
host_id,
count(*) as event_count,
avg(value) as avg_value,
max(value) as max_value,
min(value) as min_value,
stddev(value) as stddev_value
FROM metrics
GROUP BY bucket, metric_name, host_id;
-- Create index on continuous aggregate
CREATE INDEX ON metrics_hourly (metric_name, bucket DESC);
-- Refresh policy: auto-refresh materialized view
SELECT add_continuous_aggregate_policy('metrics_hourly',
start_offset => INTERVAL '7 days',
end_offset => INTERVAL '1 hour',
schedule_interval => INTERVAL '1 hour'
);
-- Query pre-aggregated data (instant, no full scan)
SELECT
bucket,
metric_name,
host_id,
avg_value,
max_value
FROM metrics_hourly
WHERE bucket > NOW() - INTERVAL '30 days'
AND metric_name = 'cpu_usage'
ORDER BY bucket DESC
LIMIT 100;
-- Multi-level aggregation (hourly → daily)
CREATE MATERIALIZED VIEW metrics_daily
WITH (timescaledb.continuous) AS
SELECT
time_bucket(INTERVAL '1 day', bucket) as day,
metric_name,
host_id,
sum(event_count) as event_count,
-- weight hourly averages by their sample counts; a plain avg of avgs skews the result
sum(avg_value * event_count) / sum(event_count) as avg_value,
max(max_value) as max_value,
min(min_value) as min_value
FROM metrics_hourly
GROUP BY day, metric_name, host_id;
-- Inspect continuous aggregates and their materialization tables
SELECT view_name, materialization_hypertable_name, finalized
FROM timescaledb_information.continuous_aggregates;
-- Manual refresh (window is a pair of timestamps; NULL end means "up to now")
CALL refresh_continuous_aggregate('metrics_hourly', now() - INTERVAL '7 days', NULL);
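When rolling hourly aggregates up into daily ones, averaging the hourly averages is only correct if every bucket has the same sample count. A quick sketch of the difference, using hypothetical data shaped like the `metrics_hourly` rows above:

```typescript
// Two hourly buckets: one sample of 100, then 99 samples of 0.
const hourly = [
  { avgValue: 100, eventCount: 1 },
  { avgValue: 0, eventCount: 99 },
];

// Naive rollup: average of the hourly averages
const naive = hourly.reduce((s, b) => s + b.avgValue, 0) / hourly.length;

// Weighted rollup: weight each hourly average by its sample count
const totalCount = hourly.reduce((s, b) => s + b.eventCount, 0);
const weighted =
  hourly.reduce((s, b) => s + b.avgValue * b.eventCount, 0) / totalCount;

console.log(naive, weighted); // 50 1
```

The true mean of the 100 underlying samples is 1, which only the weighted rollup recovers.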
Compression Policies: Segment By and Order By
Compression can cut storage by 90% or more for older data. TimescaleDB rewrites compressed chunks into a columnar layout.
-- Enable compression on hypertable
ALTER TABLE metrics SET (
timescaledb.compress,
timescaledb.compress_segmentby = 'host_id,metric_name',
timescaledb.compress_orderby = 'time DESC'
);
-- COMPRESS_SEGMENTBY: columns that define segments
-- Rationale: group similar data for better compression
-- - Don't include high-cardinality columns (prevents compression)
-- - Do include columns frequently filtered in queries
-- COMPRESS_ORDERBY: column order for compression
-- Rationale: time DESC ensures recent data accessed first
-- Add compression policy: compress chunks older than 7 days
SELECT add_compression_policy('metrics',
compress_after => INTERVAL '7 days'
);
-- View compression status and sizes (via the chunk_compression_stats() function)
SELECT
chunk_name,
compression_status,
before_compression_total_bytes,
after_compression_total_bytes,
ROUND(
100.0 * (before_compression_total_bytes - after_compression_total_bytes) /
before_compression_total_bytes, 2
) as space_saved_pct
FROM chunk_compression_stats('metrics')
ORDER BY after_compression_total_bytes DESC;
-- Example output:
-- chunk_name | compression_status | before_bytes | after_bytes | space_saved_pct
-- _hyper_1_1_chunk | Compressed | 104857600 | 5242880 | 95.00
-- _hyper_1_2_chunk | Uncompressed | | |
-- Manual compression (chunk names come from show_chunks())
SELECT compress_chunk('_timescaledb_internal._hyper_1_1_chunk');
-- Decompression (e.g. before bulk updates to old data)
SELECT decompress_chunk('_timescaledb_internal._hyper_1_1_chunk');
-- Best practices:
-- - Compress chunks older than ~7 days; recent data still receives inserts and updates
-- - Avoid compressing hot chunks (modifying compressed data is expensive)
-- - Monitor compression ratios and query latency on compressed chunks
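The space-saved percentage above is plain arithmetic, which you may want to replicate in a monitoring dashboard. A minimal sketch (the function name is hypothetical):

```typescript
// Hypothetical helper mirroring the SQL: percent of space saved by compression.
function spaceSavedPct(beforeBytes: number, afterBytes: number): number {
  if (beforeBytes === 0) return 0; // guard against empty chunks
  return Math.round(((100 * (beforeBytes - afterBytes)) / beforeBytes) * 100) / 100;
}

console.log(spaceSavedPct(100 * 1024 ** 2, 5 * 1024 ** 2)); // 95
```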
Retention Policies: Automatic Data Expiry
TimescaleDB automatically deletes old data using retention policies.
-- Add retention policy: keep 90 days, delete older
SELECT add_retention_policy('metrics',
drop_after => INTERVAL '90 days'
);
-- Modify retention: keep 1 year
SELECT remove_retention_policy('metrics', if_exists => TRUE);
SELECT add_retention_policy('metrics',
drop_after => INTERVAL '1 year',
schedule_interval => INTERVAL '1 day'
);
-- View retention policies (drop_after is stored in the config JSONB)
SELECT job_id, hypertable_name, config ->> 'drop_after' AS drop_after, schedule_interval
FROM timescaledb_information.jobs
WHERE proc_name = 'policy_retention';
-- Disable retention (if needed)
SELECT remove_retention_policy('metrics', if_exists => TRUE);
-- Preview which chunks a retention cutoff would remove
SELECT show_chunks('metrics', older_than => INTERVAL '90 days');
-- Manual deletion (for testing)
SELECT drop_chunks('metrics', older_than => INTERVAL '60 days');
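Conceptually, the retention job's decision is a cutoff comparison against each chunk's time range: a chunk is droppable once its entire range is older than `now - drop_after`. A hedged model of that logic (names and shapes are hypothetical, not TimescaleDB internals):

```typescript
// Hypothetical model of retention: a chunk is droppable once the end of its
// time range falls at or before the cutoff (now - drop_after).
interface Chunk {
  name: string;
  rangeEndMs: number; // exclusive end of the chunk's time range
}

function droppableChunks(chunks: Chunk[], nowMs: number, dropAfterMs: number): string[] {
  const cutoff = nowMs - dropAfterMs;
  return chunks.filter((c) => c.rangeEndMs <= cutoff).map((c) => c.name);
}

const day = 86_400_000;
const now = 200 * day;
const result = droppableChunks(
  [
    { name: '_hyper_1_1_chunk', rangeEndMs: 100 * day }, // 100 days old -> dropped
    { name: '_hyper_1_2_chunk', rangeEndMs: 150 * day }, //  50 days old -> kept
  ],
  now,
  90 * day
);
console.log(result); // ["_hyper_1_1_chunk"]
```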
time_bucket() for Granular Aggregation
time_bucket() is TimescaleDB's secret weapon for time-series aggregation.
-- Group by custom time bucket (1 minute, 5 minutes, etc.)
SELECT
time_bucket(INTERVAL '5 minutes', time) as bucket,
host_id,
avg(value) as avg_value,
max(value) as max_value,
min(value) as min_value
FROM metrics
WHERE time > NOW() - INTERVAL '1 day'
AND host_id = 123
GROUP BY bucket, host_id
ORDER BY bucket DESC;
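At its core, `time_bucket()` is a floor operation: it snaps each timestamp down to the start of its bucket. A sketch of that arithmetic on millisecond epochs; note this only matches TimescaleDB for sub-week intervals, since week-sized buckets use a Monday-aligned origin rather than the Unix epoch:

```typescript
// Floor a millisecond timestamp to the start of its bucket.
// Approximates time_bucket for epoch-aligned intervals (minutes, hours, days).
function timeBucketMs(widthMs: number, tsMs: number): number {
  return tsMs - (tsMs % widthMs);
}

const fiveMin = 5 * 60 * 1000;
console.log(timeBucketMs(fiveMin, 1_000_000_123_456)); // 999999900000
```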
-- Non-uniform bucketing (e.g., business hours vs off-hours)
SELECT
CASE
WHEN EXTRACT(HOUR FROM time) BETWEEN 9 AND 17
THEN time_bucket(INTERVAL '15 minutes', time)
ELSE time_bucket(INTERVAL '1 hour', time)
END as bucket,
avg(value) as avg_value
FROM metrics
WHERE time > NOW() - INTERVAL '7 days'
GROUP BY bucket
ORDER BY bucket DESC;
-- Named time buckets (for readability)
WITH hourly_data AS (
SELECT
time_bucket(INTERVAL '1 hour', time) as hour,
metric_name,
avg(value) as avg_value,
max(value) as peak_value
FROM metrics
WHERE time > NOW() - INTERVAL '30 days'
GROUP BY hour, metric_name
)
SELECT
to_char(hour, 'YYYY-MM-DD HH24:00') as hour_str,
metric_name,
ROUND(avg_value, 2) as avg_value,
peak_value
FROM hourly_data
ORDER BY hour DESC
LIMIT 100;
-- time_bucket_gapfill(): fill missing time buckets
-- (requires a bounded range: supply both a lower and an upper time limit)
SELECT
time_bucket_gapfill(INTERVAL '1 hour', time) as bucket,
host_id,
avg(value) as avg_value
FROM metrics
WHERE time > NOW() - INTERVAL '7 days'
AND time < NOW()
AND host_id = 123
GROUP BY bucket, host_id
ORDER BY bucket DESC;
-- Output: one row per bucket, with NULL avg_value where no data exists
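The gap-filling behavior can be modeled in a few lines: walk the bucket grid across the bounded range and emit null wherever no aggregated row exists. This is a sketch of the semantics, not the actual implementation:

```typescript
// Emit one entry per bucket in [startMs, endMs), null where data is missing.
function gapfill(
  data: Map<number, number>, // bucket start -> aggregated value
  startMs: number,
  endMs: number,
  stepMs: number
): Array<{ bucket: number; value: number | null }> {
  const out: Array<{ bucket: number; value: number | null }> = [];
  for (let t = startMs; t < endMs; t += stepMs) {
    out.push({ bucket: t, value: data.get(t) ?? null });
  }
  return out;
}

const hour = 3_600_000;
const filled = gapfill(new Map([[0, 1.5], [2 * hour, 2.5]]), 0, 4 * hour, hour);
console.log(filled.map((r) => r.value)); // [1.5, null, 2.5, null]
```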
Comparing TimescaleDB vs InfluxDB vs Raw Postgres Partitioning
Understanding tradeoffs helps choose the right tool.
-- Raw Postgres partitioning (manual work)
CREATE TABLE metrics_raw (
time TIMESTAMPTZ NOT NULL,
value FLOAT8 NOT NULL,
host_id BIGINT NOT NULL
) PARTITION BY RANGE (time);
CREATE TABLE metrics_raw_2025_01 PARTITION OF metrics_raw
FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
CREATE TABLE metrics_raw_2025_02 PARTITION OF metrics_raw
FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');
-- Pros: native Postgres, no extension
-- Cons: manual partition creation, no compression, no continuous aggregates
-- TimescaleDB (automatic partitioning)
CREATE TABLE metrics_ts (
time TIMESTAMPTZ NOT NULL,
value FLOAT8 NOT NULL,
host_id BIGINT NOT NULL
);
SELECT create_hypertable('metrics_ts', 'time');
-- Pros: automatic chunks, compression, continuous aggregates, JSONB support
-- Cons: requires extension, slightly slower single-row inserts
-- InfluxDB (dedicated time-series DB)
-- Write (line protocol): measurement,tag=value field=100 1700000000000000000
-- Query (InfluxQL): SELECT * FROM measurement WHERE time > '2025-01-01'
-- Pros: purpose-built ingest path, Flux/InfluxQL query languages
-- Cons: separate system to operate, no full ACID transactions, no joins with relational data
Node.js TimescaleDB Integration
The examples below use node-postgres (pg) with pg-query-stream for cursor-based streaming; swap in your preferred client or ORM.
import { Pool } from 'pg';
import QueryStream from 'pg-query-stream';

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

// Batch insert with parameterized values (never string-interpolate user data)
async function insertMetrics(
  metrics: Array<{ time: Date; name: string; hostId: number; value: number }>
) {
  if (metrics.length === 0) return;
  const params: unknown[] = [];
  const placeholders = metrics
    .map((m, i) => {
      params.push(m.time.toISOString(), m.name, m.hostId, m.value);
      const o = i * 4;
      return `($${o + 1}, $${o + 2}, $${o + 3}, $${o + 4})`;
    })
    .join(', ');
  await pool.query(
    `INSERT INTO metrics (time, metric_name, host_id, value) VALUES ${placeholders}`,
    params
  );
}

// Query with time_bucket (interval built server-side via make_interval)
async function getHourlyMetrics(hostId: number, metricName: string, days = 7) {
  const result = await pool.query(
    `SELECT
       time_bucket(INTERVAL '1 hour', time) AS bucket,
       avg(value) AS avg_value,
       max(value) AS max_value,
       min(value) AS min_value,
       count(*) AS sample_count
     FROM metrics
     WHERE host_id = $1
       AND metric_name = $2
       AND time > now() - make_interval(days => $3)
     GROUP BY bucket
     ORDER BY bucket DESC`,
    [hostId, metricName, days]
  );
  return result.rows;
}

// Query the continuous aggregate (no raw-table scan)
async function getPreAggregated(metricName: string, days = 30) {
  const result = await pool.query(
    `SELECT bucket, host_id, avg_value, max_value, min_value
     FROM metrics_hourly
     WHERE metric_name = $1
       AND bucket > now() - make_interval(days => $2)
     ORDER BY bucket DESC
     LIMIT 100`,
    [metricName, days]
  );
  return result.rows;
}

// Stream large result sets row by row (constant memory via a server-side cursor)
async function streamMetrics(
  hostId: number,
  callback: (row: unknown) => Promise<void>
) {
  const client = await pool.connect();
  try {
    const stream = client.query(
      new QueryStream(
        'SELECT * FROM metrics WHERE host_id = $1 ORDER BY time DESC',
        [hostId]
      )
    );
    for await (const row of stream) {
      await callback(row);
    }
  } finally {
    client.release();
  }
}
Checklist
- TimescaleDB extension installed and verified
- Hypertable created with appropriate chunk interval
- Indexes added on commonly filtered columns
- Continuous aggregates created for frequent queries
- Compression policy enabled (compress_after >= 7 days)
- Retention policy configured (data deletion schedule)
- time_bucket() queries validated for performance
- Multi-level aggregation (hourly → daily → monthly) designed
- Compression ratio monitored (should be > 80%)
- Retention jobs scheduled and monitored
- time_bucket_gapfill() used for gap handling
- Node.js client configured with connection pooling
- Batch inserts implemented for throughput
- Continuous aggregate refresh policy tuned
Conclusion
TimescaleDB transforms PostgreSQL into a powerful time-series database while preserving relational capabilities. Use automatic hypertables for transparent partitioning, continuous aggregates for instant query results, and compression policies to manage storage. With time_bucket() for flexible aggregation and retention policies for automatic cleanup, you can handle billions of time-series events within PostgreSQL without additional infrastructure. The result is a simpler architecture, fewer databases to maintain, and access to rich time-series analytics alongside your operational data.