ClickHouse for Backend Engineers — Real-Time Analytics Without the Data Warehouse Complexity

Introduction

ClickHouse is a columnar OLAP database that handles billions of rows with sub-second query latency. Unlike Postgres, which is optimized for OLTP row access, ClickHouse is built for analytics: aggregations, time series, and complex range queries. This guide covers production ClickHouse architectures, from MergeTree engines to dual-write patterns that keep your OLTP and analytics layers in sync.

When ClickHouse Beats PostgreSQL for Analytics

On analytical workloads, ClickHouse can outperform Postgres by 10-100x, but the switch only pays off for certain access patterns. Understand when it makes sense.

-- Query: Aggregations over billions of rows (ClickHouse's sweet spot)
-- Postgres: ~60 seconds for 1 billion rows
-- ClickHouse: ~100ms for 1 billion rows

-- Example: Sum events by hour (ClickHouse)
SELECT
  toStartOfHour(event_time) as hour,
  count() as event_count,
  sum(value) as total_value,
  avg(value) as avg_value,
  quantile(0.95)(latency_ms) as p95_latency
FROM events
WHERE event_time >= NOW() - INTERVAL 30 DAY
GROUP BY hour
ORDER BY hour DESC;

-- Column-oriented compression: often 10-100x smaller than row storage
-- Before compression (Postgres row-store): 100GB
-- After ClickHouse (column-oriented + LZ4): 1GB

-- Decision matrix:
-- Use PostgreSQL if: OLTP, complex joins, ACID transactions, <1M rows/sec insert
-- Use ClickHouse if: Analytics, time-series, immutable events, >1M rows/sec insert

-- Real-world schema sizes (1 billion event rows):
-- Postgres: 150GB uncompressed
-- ClickHouse: 10GB compressed (15x savings)

MergeTree Engine: The Foundation of ClickHouse Performance

MergeTree is ClickHouse's core table engine; nearly all production tables use a MergeTree variant.

-- Basic MergeTree: immutable data, no deletions
CREATE TABLE events (
  event_id UUID,
  user_id UInt64,
  event_type String,
  event_time DateTime,
  properties JSON,
  value Float32
)
ENGINE = MergeTree()
ORDER BY (event_type, user_id, event_time)
PRIMARY KEY (event_type)
PARTITION BY toYYYYMM(event_time)
SETTINGS index_granularity = 8192;

-- ORDER BY determines the on-disk sort order (drives compression and index use)
-- PRIMARY KEY must be a prefix of ORDER BY (optional; defaults to the full ORDER BY)
-- PARTITION BY splits data into separate parts (date-based is common)
-- index_granularity = rows per index mark (8192 = default, good for most cases)

-- Best practice for multi-tenant data: ORDER BY (tenant_id, timestamp)
-- Rationale: the sparse index skips other tenants' granules, and
-- time-ordered data within a tenant compresses well
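
To build intuition for index_granularity: the sparse index stores one mark per granule, so its size is easy to estimate. A minimal TypeScript sketch (hypothetical helper, not part of ClickHouse):

```typescript
// Estimate the number of sparse-index marks for a MergeTree table:
// one mark per granule of index_granularity rows.
function indexMarks(totalRows: number, granularity: number = 8192): number {
  return Math.ceil(totalRows / granularity);
}

// 1 billion rows at the default granularity -> ~122k marks,
// small enough that the whole primary index stays in memory
const marks = indexMarks(1_000_000_000);
```

This is why the default granularity is rarely worth changing: even at a billion rows the index is tiny, while a smaller granularity inflates it for little gain.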

-- Monthly partitions (balance between part count and query speed)
CREATE TABLE metrics (
  metric_name String,
  timestamp DateTime,
  value Float64,
  tags Map(String, String)
)
ENGINE = MergeTree()
ORDER BY (metric_name, timestamp)
PARTITION BY toYYYYMM(timestamp)
TTL timestamp + INTERVAL 1 YEAR;

-- TTL: Time-To-Live (auto-delete old data)
-- TTL timestamp + INTERVAL 90 DAY: keep 90 days of data

-- ReplacingMergeTree: handles updates/deletes (expensive, use sparingly)
CREATE TABLE user_snapshots (
  user_id UInt64,
  version UInt64,
  snapshot_time DateTime,
  profile_data String
)
ENGINE = ReplacingMergeTree(version)
ORDER BY (user_id, snapshot_time)
PARTITION BY toYYYYMM(snapshot_time);

-- SummingMergeTree: sums numeric columns during background merges
-- (re-aggregate at read time, since merges are eventual; good for pre-aggregates)
CREATE TABLE hourly_stats (
  metric_name String,
  hour_time DateTime,
  event_count UInt64,
  total_value Float64
)
ENGINE = SummingMergeTree((event_count, total_value))
ORDER BY (metric_name, hour_time);
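
The merge behavior above is simple to state: rows sharing the ORDER BY key collapse into one row whose numeric columns are summed. A TypeScript sketch of the semantics (illustrative only, not ClickHouse's actual merge code):

```typescript
// Mimics SummingMergeTree merge semantics: rows with the same
// (metric_name, hour_time) key collapse into one summed row.
interface StatRow {
  metricName: string;
  hourTime: string;
  eventCount: number;
  totalValue: number;
}

function mergeParts(rows: StatRow[]): StatRow[] {
  const merged = new Map<string, StatRow>();
  for (const row of rows) {
    const key = `${row.metricName}|${row.hourTime}`;
    const acc = merged.get(key);
    if (acc) {
      // Same key: sum the numeric columns into the accumulator
      acc.eventCount += row.eventCount;
      acc.totalValue += row.totalValue;
    } else {
      merged.set(key, { ...row });
    }
  }
  return [...merged.values()];
}
```

Because merges happen in the background, queries should apply this same collapse themselves (GROUP BY with sum) rather than assume one row per key.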

Materialized Views: Pre-Aggregation at Insert Time

Materialized views automatically aggregate data as it arrives, enabling instant queries.

-- Raw events table (high volume)
CREATE TABLE events_raw (
  event_id UUID,
  user_id UInt64,
  event_type String,
  event_time DateTime,
  value Float32
)
ENGINE = MergeTree()
ORDER BY (event_type, event_time)
PARTITION BY toYYYYMM(event_time);

-- Aggregated table (target for materialized view)
-- Note: averages are not summable, so store the count and the sum and
-- derive the average at query time (total_value / event_count)
CREATE TABLE events_hourly (
  event_type String,
  event_hour DateTime,
  event_count UInt64,
  total_value Float64
)
ENGINE = SummingMergeTree((event_count, total_value))
ORDER BY (event_type, event_hour)
PARTITION BY toYYYYMM(event_hour);

-- Materialized view: aggregates each inserted block automatically
CREATE MATERIALIZED VIEW events_hourly_mv
TO events_hourly
AS SELECT
  event_type,
  toStartOfHour(event_time) as event_hour,
  count() as event_count,
  sum(value) as total_value
FROM events_raw
GROUP BY event_type, event_hour;

-- Insert raw events (the materialized view aggregates them automatically)
INSERT INTO events_raw VALUES
(generateUUIDv4(), 123, 'click', NOW(), 1.0),
(generateUUIDv4(), 124, 'view', NOW(), 2.0);

-- Query pre-aggregated data (GROUP BY again: background merges are
-- eventual, so one key may still span several partial rows)
SELECT event_type, event_hour, sum(event_count), sum(total_value)
FROM events_hourly
GROUP BY event_type, event_hour
ORDER BY event_hour DESC LIMIT 100;

-- Multi-level aggregation (hourly → daily); the events_daily target table
-- must be created first (same shape as events_hourly, at daily grain)
CREATE MATERIALIZED VIEW events_daily_mv
TO events_daily
AS SELECT
  event_type,
  toStartOfDay(event_hour) as event_day,
  sum(event_count) as event_count,
  sum(total_value) as total_value
FROM events_hourly
GROUP BY event_type, event_day;
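
One subtlety worth a sketch: averages are not summable (the sum of averages is not the average of the sums), so a rollup should carry the count and the sum and derive the average at read time. A minimal TypeScript helper (hypothetical name):

```typescript
// Derive an average from rollup columns at read time.
// Storing avg directly in a SummingMergeTree would be wrong:
// summing per-hour averages does not give the overall average.
function avgFromRollup(totalValue: number, eventCount: number): number {
  return eventCount === 0 ? 0 : totalValue / eventCount;
}
```

The same rule applies to percentiles: they cannot be summed at all, so keep them on the raw table or use ClickHouse's aggregate-function states.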

ReplacingMergeTree for Deduplication

Real-world data often has duplicates from retries or replication. ReplacingMergeTree handles this.

-- Events with potential duplicates (idempotent key)
CREATE TABLE events_dedup (
  event_id UUID,
  user_id UInt64,
  event_type String,
  event_time DateTime,
  version UInt64,  -- version number for deduplication
  value Float32
)
ENGINE = ReplacingMergeTree(version)
-- The ORDER BY key identifies duplicates, so version must NOT be part of it:
-- rows with the same key collapse to the highest version during merges
ORDER BY (event_id)
PARTITION BY toYYYYMM(event_time);

-- Insert duplicate event (different version)
INSERT INTO events_dedup VALUES
(toUUID('12345678-1234-5678-1234-567812345678'), 100, 'purchase', NOW(), 1, 99.99),
(toUUID('12345678-1234-5678-1234-567812345678'), 100, 'purchase', NOW(), 2, 99.99);

-- FINAL modifier: deduplicates on read (correct, but it forces merge
-- work at query time; use sparingly on hot paths)
SELECT *
FROM events_dedup
FINAL
WHERE user_id = 100;

-- Without FINAL, not-yet-merged duplicates may still appear (useful for debugging)
SELECT * FROM events_dedup WHERE user_id = 100;

-- Best practice: have the application assign version = insert timestamp,
-- so the most recent write always wins during merges
INSERT INTO events_dedup VALUES
(toUUID('87654321-4321-8765-4321-876543218765'), 101, 'login', NOW(), toUnixTimestamp(now()), 0.0);
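
The rule ReplacingMergeTree applies on merge is easy to mirror in application code when you need the same answer before a merge has run: per key, keep the row with the highest version. A TypeScript sketch (illustrative only):

```typescript
// Mimics ReplacingMergeTree merge semantics: for each key (event_id),
// keep only the row with the highest version.
interface VersionedRow {
  eventId: string;
  version: number;
  value: number;
}

function deduplicate(rows: VersionedRow[]): VersionedRow[] {
  const latest = new Map<string, VersionedRow>();
  for (const row of rows) {
    const current = latest.get(row.eventId);
    if (!current || row.version > current.version) {
      latest.set(row.eventId, row);
    }
  }
  return [...latest.values()];
}
```

This is exactly what FINAL computes server-side; doing it client-side on a small result set avoids the FINAL merge cost on the hot path.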

Partitioning Strategy: Date-Based Partitioning

Partitions are key for managing retention and query performance.

-- Daily partitions (granular, good for retention policies)
CREATE TABLE logs_daily (
  timestamp DateTime,
  level String,
  message String,
  service String
)
ENGINE = MergeTree()
ORDER BY (timestamp, service)
PARTITION BY toDate(timestamp)  -- new partition each day
TTL timestamp + INTERVAL 30 DAY;  -- auto-delete old partitions

-- Monthly partitions (fewer partitions, less overhead)
CREATE TABLE billing_events (
  billing_period String,
  customer_id UInt64,
  amount Float64,
  timestamp DateTime
)
ENGINE = MergeTree()
ORDER BY (customer_id, timestamp)
PARTITION BY toYYYYMM(timestamp);

-- Query partition metadata (system.parts lists every part per partition)
SELECT
  table,
  partition,
  name,
  rows
FROM system.parts
WHERE table = 'events_raw' AND active
ORDER BY partition DESC;

-- Drop old partition (instant, no scan)
ALTER TABLE events_raw DROP PARTITION '202512';

-- Move partition to another disk
ALTER TABLE events_raw MOVE PARTITION '202601' TO DISK 'archive';

-- Part count monitoring (too many parts = slow queries)
SELECT
  table,
  count() as part_count,
  sum(rows) as total_rows
FROM system.parts
WHERE active = 1
GROUP BY table
ORDER BY part_count DESC;
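
A retention job that issues DROP PARTITION statements needs the same YYYYMM partition id that toYYYYMM(timestamp) produces. A small TypeScript equivalent (hypothetical helpers, UTC-based):

```typescript
// Compute the toYYYYMM-style partition id for a date (UTC),
// e.g. for building ALTER TABLE ... DROP PARTITION statements.
function toYYYYMM(date: Date): number {
  return date.getUTCFullYear() * 100 + (date.getUTCMonth() + 1);
}

// Partition id that falls `retentionMonths` before `now`
// (Date.UTC normalizes negative month indexes into the prior year).
function expiredPartition(now: Date, retentionMonths: number): number {
  const d = new Date(Date.UTC(
    now.getUTCFullYear(), now.getUTCMonth() - retentionMonths, 1));
  return toYYYYMM(d);
}
```

Dropping a whole partition this way is a metadata operation, which is why date-based partition keys make retention essentially free.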

Dual-Write Pattern: Postgres + ClickHouse

Keep Postgres and ClickHouse in sync without a dedicated CDC tool.

// Node.js: dual-write to Postgres and ClickHouse
// (uses the official @clickhouse/client package)
import pg from 'pg';
import { createClient } from '@clickhouse/client';

const pgPool = new pg.Pool({
  host: 'postgres-host',
  database: 'myapp',
});

const clickhouse = createClient({
  url: 'http://clickhouse-host:8123',
  database: 'analytics',
  username: 'default',
  password: 'password',
});

interface Event {
  eventId: string;
  userId: number;
  eventType: string;
  value: number;
  timestamp: Date;
}

async function logEvent(event: Event) {
  // Phase 1: write to Postgres (primary); a failure here is fatal
  // and propagates to the caller
  await pgPool.query(
    `INSERT INTO events (id, user_id, type, value, created_at)
     VALUES ($1, $2, $3, $4, $5)`,
    [event.eventId, event.userId, event.eventType, event.value, event.timestamp]
  );

  // Phase 2: write to ClickHouse (best-effort); on failure, queue the
  // event in Postgres for the async retry processor
  try {
    await clickhouse.insert({
      table: 'events_raw',
      values: [event],
      format: 'JSONEachRow',
    });
  } catch (chErr) {
    console.error('ClickHouse write failed, queued for retry:', chErr);
    await pgPool.query(
      `INSERT INTO clickhouse_retry_queue (event_data, retry_count)
       VALUES ($1, 0)`,
      [JSON.stringify(event)]
    );
  }

  return { success: true, eventId: event.eventId };
}

// Async retry processor (background job)
async function processCHRetries() {
  const retries = await pgPool.query(
    `SELECT * FROM clickhouse_retry_queue
     WHERE retry_count < 5 AND created_at > NOW() - INTERVAL '24 hours'
     LIMIT 100`
  );

  for (const row of retries.rows) {
    try {
      const event = JSON.parse(row.event_data);
      await clickhouse.insert({
        table: 'events_raw',
        values: [event],
        format: 'JSONEachRow',
      });

      // Success: remove the entry from the queue
      await pgPool.query(
        `DELETE FROM clickhouse_retry_queue WHERE id = $1`,
        [row.id]
      );
    } catch (err) {
      // Failure: increment the retry count and try again next run
      await pgPool.query(
        `UPDATE clickhouse_retry_queue SET retry_count = retry_count + 1
         WHERE id = $1`,
        [row.id]
      );
    }
  }
}
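
The retry processor retries on a fixed schedule; capped exponential backoff is a common refinement so a struggling ClickHouse node is not hammered by the queue. A sketch (assumed base and cap values):

```typescript
// Capped exponential backoff for the ClickHouse retry queue:
// 1s, 2s, 4s, 8s, ... up to a 60s ceiling.
function backoffMs(retryCount: number, baseMs = 1000, capMs = 60_000): number {
  return Math.min(baseMs * 2 ** retryCount, capMs);
}
```

The processor would store a next_attempt_at column computed from this and filter on it in the SELECT, instead of retrying every queued row each run.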

Query Optimization: PREWHERE vs WHERE

ClickHouse's PREWHERE clause filters before reading other columns, saving I/O.

-- WHERE: may read extra columns before filtering (though ClickHouse
-- usually moves cheap conditions to PREWHERE on its own)
SELECT *
FROM events
WHERE event_type = 'purchase'
LIMIT 100;

-- PREWHERE: read the filter column first, then fetch the remaining
-- columns only for rows that match
SELECT *
FROM events
PREWHERE event_type = 'purchase'
LIMIT 100;

-- The optimize_move_to_prewhere setting handles simple conditions
-- automatically, but explicit PREWHERE is useful for complex queries

-- Benchmark: event_type is in ORDER BY, so filtering is fast
SELECT
  event_type,
  count() as cnt,
  avg(value) as avg_val
FROM events
PREWHERE event_type IN ('purchase', 'refund')  -- filter early
WHERE event_time > NOW() - INTERVAL 7 DAY
GROUP BY event_type;

-- Aggregation query optimization
SELECT
  user_id,
  count() as purchases,
  sum(value) as total_spend,
  quantile(0.95)(value) as p95_order_value
FROM events
PREWHERE event_type = 'purchase' AND event_time > NOW() - INTERVAL 90 DAY
GROUP BY user_id
HAVING purchases > 10
ORDER BY total_spend DESC
LIMIT 1000;

-- Array functions (common in log analytics; assumes a tags Array(String) column)
SELECT
  level,
  count() as log_count,
  countIf(arrayExists(x -> x = 'error', tags)) as error_count
FROM logs
PREWHERE level IN ('ERROR', 'CRITICAL')
GROUP BY level;
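
The I/O saving behind PREWHERE is easy to model on columnar data: read only the filter column, find matching row indexes, then fetch the other (possibly wide) columns just for those rows. A TypeScript sketch of the idea (illustrative, not ClickHouse internals):

```typescript
// Models PREWHERE on columnar storage: evaluate the predicate against
// the filter column alone, then materialize the remaining columns
// only for the matching row positions.
function prewhereSelect(
  filterCol: string[],
  otherCols: Record<string, number[]>,
  predicate: (v: string) => boolean,
): Record<string, number[]> {
  const keep: number[] = [];
  filterCol.forEach((v, i) => {
    if (predicate(v)) keep.push(i);
  });

  const result: Record<string, number[]> = {};
  for (const [name, col] of Object.entries(otherCols)) {
    result[name] = keep.map(i => col[i]); // only matching rows are read
  }
  return result;
}
```

The saving grows with the number and width of the non-filter columns, which is why PREWHERE on a selective, cheap column pays off most in SELECT * style queries.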

Node.js ClickHouse Client Integration

Practical patterns for application integration.

// Uses the official @clickhouse/client package
import { createClient } from '@clickhouse/client';

const ch = createClient({
  url: 'http://clickhouse.example.com:8123',
  database: 'analytics',
  username: 'user',
  password: 'pass',
});

// Batch insert for throughput
async function batchInsertEvents(events: Event[]) {
  if (events.length === 0) return;

  await ch.insert({
    table: 'events_raw',
    values: events,
    format: 'JSONEachRow',
  });
}

// Query with aggregation ({name:Type} placeholders bind server-side,
// avoiding SQL injection)
async function getUserMetrics(userId: number, days: number = 7) {
  const rs = await ch.query({
    query: `SELECT
        user_id,
        count() as event_count,
        sum(value) as total_value,
        avg(value) as avg_value,
        quantile(0.95)(value) as p95_value,
        uniqExact(event_type) as event_types
      FROM events
      WHERE user_id = {user_id:UInt64}
        AND event_time > now() - interval {days:UInt32} day
      GROUP BY user_id`,
    query_params: { user_id: userId, days },
    format: 'JSONEachRow',
  });

  const rows = await rs.json();
  return rows[0];
}

// Stream a large result set instead of buffering it all in memory
async function streamEventsByType(type: string) {
  const rs = await ch.query({
    query: `SELECT * FROM events WHERE event_type = {type:String}`,
    query_params: { type },
    format: 'JSONEachRow',
  });

  const stream = rs.stream();
  return new Promise((resolve, reject) => {
    stream
      .on('data', (rows) => {
        // Each chunk is an array of rows; row.json() parses one row
        processEvents(rows.map((row) => row.json()));
      })
      .on('end', () => resolve(null))
      .on('error', reject);
  });
}

// Parameterized query returning all rows
async function queryWithParams(userId: number, days: number) {
  const rs = await ch.query({
    query: `
      SELECT event_id, value FROM events
      WHERE user_id = {user_id:UInt64}
        AND event_time > now() - interval {days:UInt32} day`,
    query_params: { user_id: userId, days },
    format: 'JSONEachRow',
  });

  return rs.json();
}

// Connection pooling: the official client keeps an internal HTTP
// connection pool, so size it with max_open_connections instead of
// juggling multiple client instances by hand
const pooledClient = createClient({
  url: 'http://clickhouse.example.com:8123',
  database: 'analytics',
  max_open_connections: 10,
});

Checklist

  • MergeTree engine chosen and ORDER BY optimized for query patterns
  • Partitioning strategy implemented (date-based partitions)
  • TTL configured for automatic old data deletion
  • Materialized views created for common aggregations
  • PREWHERE clauses used in analytical queries
  • Dual-write pattern implemented (Postgres primary, CH async)
  • ClickHouse retry queue for failed writes
  • Connection pooling configured for Node.js client
  • Query performance validated with sample 1M+ row datasets
  • Part count monitoring and merge settings configured
  • Backup strategy documented (partition export)
  • Replica count and replication lag monitored

Conclusion

ClickHouse transforms analytics from a separate slow system into a fast, queryable data store integrated with your primary database. Use MergeTree with proper ordering and partitioning, implement materialized views for pre-aggregation, and maintain dual-write consistency through a retry queue. With proper configuration, you'll handle billions of rows with sub-second query latency while keeping analytics in sync with your operational data.