ClickHouse for Real-Time Analytics — Insert Millions of Rows Per Second

Author: Sanjeev Sharma (@webcoderspeed1)
Introduction
PostgreSQL is the workhorse of application databases. It handles OLTP (transactional) workloads well. But analytics is a different beast: scanning and aggregating billions of rows stresses a row store in ways OLTP never does.
ClickHouse is a columnar database built for analytics. It ingests millions of rows per second, compresses data roughly 10x better than PostgreSQL, and queries terabytes with sub-second latency. For real-time dashboards, event processing, and analytics, it's transformative.
This post covers when to add ClickHouse to your stack, how to use it, and benchmarks.
- ClickHouse vs PostgreSQL for Analytics
- MergeTree Table Engine
- Materialized Views for Pre-Aggregation
- AggregatingMergeTree for Incremental Aggregations
- ReplacingMergeTree for Deduplication
- Columnar Storage Benefits
- Inserting from Node.js with @clickhouse/client
- Async Inserts for High Throughput
- TTL for Data Expiry
- ClickHouse Cloud vs Self-Hosted
- When to Add ClickHouse to Your Stack
- Typical Architecture
- Checklist
- Conclusion
ClickHouse vs PostgreSQL for Analytics
PostgreSQL uses row-oriented storage: every column for a row is stored together. This is efficient for OLTP (updating one user's profile) but terrible for analytics (summing revenue across 100 million rows).
ClickHouse uses column-oriented storage. All values for a column are stored together, compressed heavily. Querying revenue is fast: ClickHouse reads only the revenue column, not all columns.
Benchmark: aggregate 1 billion rows, 20 columns, sum 5 columns.
| Database | Query Time | Memory Used |
|---|---|---|
| PostgreSQL | 45 seconds | 2GB |
| ClickHouse | 150ms | 80MB |
ClickHouse is 300x faster here. For real-time dashboards, it's the difference between feasible and impossible.
PostgreSQL strengths: ACID, arbitrary queries, joins, updates, single-row operations.
ClickHouse strengths: Column analytics, bulk inserts, compression, parallelism.
Use both. OLTP transactions go to PostgreSQL. Analytics events go to ClickHouse.
MergeTree Table Engine
ClickHouse tables are organized by "engines." MergeTree is the workhorse.
CREATE TABLE events (
timestamp DateTime,
user_id UInt64,
event_type String,
event_properties JSON,
session_id String
) ENGINE = MergeTree()
ORDER BY (timestamp, user_id)
PARTITION BY toYYYYMM(timestamp);
Key concepts:
- ORDER BY: ClickHouse sorts data and builds indexes on these columns. Query filters on these columns are fast.
- PARTITION BY: Data is split by month, year, or custom granularity. Partitions can be dropped instantly. No delete/update slowness.
When you insert 100,000 rows, ClickHouse writes them as a new "part." As parts accumulate, they merge in the background. This merging is asynchronous and doesn't block writes.
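Because the table is partitioned, old data can be dropped as a metadata operation rather than a row-by-row delete. A sketch against the events table above (the partition ID follows the toYYYYMM format):

```sql
-- Drop an entire month instantly
ALTER TABLE events DROP PARTITION '202601';

-- Inspect how data is organized into parts
SELECT partition, name, rows
FROM system.parts
WHERE table = 'events' AND active
ORDER BY partition;
```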
Materialized Views for Pre-Aggregation
Materialized views in ClickHouse are pipelines. When data lands in one table, a materialized view automatically aggregates and stores the result in another table.
-- Raw events table
CREATE TABLE raw_events (
timestamp DateTime,
user_id UInt64,
revenue Decimal128(2)
) ENGINE = MergeTree()
ORDER BY timestamp;
-- Aggregated by hour
CREATE TABLE hourly_revenue (
hour DateTime,
total_revenue AggregateFunction(sum, Decimal128(2)),
user_count AggregateFunction(uniq, UInt64)
) ENGINE = AggregatingMergeTree()
ORDER BY hour;
-- Materialized view: pipe raw_events into hourly aggregation
CREATE MATERIALIZED VIEW hourly_revenue_mv
TO hourly_revenue
AS SELECT
toStartOfHour(timestamp) as hour,
sumState(revenue) as total_revenue,
uniqState(user_id) as user_count
FROM raw_events
GROUP BY hour;
Now, when events are inserted into raw_events, the view automatically updates hourly_revenue. Querying hourly_revenue is nearly instant, because the heavy aggregation already happened at insert time.
This pattern scales. Pre-aggregate at insertion time, query instantly.
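One detail: hourly_revenue stores aggregation states rather than plain numbers, so reads use the -Merge counterparts of the -State functions. A sketch of the dashboard query, assuming the tables above:

```sql
SELECT
    hour,
    sumMerge(total_revenue) AS revenue,
    uniqMerge(user_count) AS users
FROM hourly_revenue
GROUP BY hour
ORDER BY hour DESC
LIMIT 24;
```

The GROUP BY is still needed to combine states from different parts, but it touches one row per hour instead of every raw event.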
AggregatingMergeTree for Incremental Aggregations
Standard aggregations require processing all data. Incremental aggregations process partial results and merge them.
AggregatingMergeTree stores state, not final values:
CREATE TABLE metrics (
timestamp DateTime,
metric_name String,
value AggregateFunction(sum, Float64)
) ENGINE = AggregatingMergeTree()
ORDER BY (timestamp, metric_name);
-- Insert partial sums from different servers. Aggregate states must be
-- produced by a SELECT; sumState can't appear in a VALUES list.
INSERT INTO metrics
SELECT toDateTime('2026-03-18 10:00:00'), 'requests', sumState(toFloat64(1500));
INSERT INTO metrics
SELECT toDateTime('2026-03-18 10:00:00'), 'requests', sumState(toFloat64(2300));
-- Query merges partial sums
SELECT
timestamp,
metric_name,
sumMerge(value) as total
FROM metrics
GROUP BY timestamp, metric_name;
Each server sends partial aggregates, and ClickHouse merges them cheaply at query time. This is how you handle high-cardinality metrics at scale.
ReplacingMergeTree for Deduplication
Distributed systems have duplicates. The same event might be written twice. ReplacingMergeTree handles this.
CREATE TABLE user_state (
user_id UInt64,
state String,
updated_at DateTime,
version UInt64
) ENGINE = ReplacingMergeTree(version)
ORDER BY user_id;
-- Insert or update
INSERT INTO user_state VALUES
(123, 'active', now(), 1),
(123, 'inactive', now(), 2); -- Supersedes version 1 after a merge (or with FINAL)
-- Query latest version
SELECT * FROM user_state
FINAL -- Only latest version per user_id
WHERE user_id = 123;
The FINAL keyword returns the latest version of each row. Useful for dimension tables that change (user names, settings).
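Note that FINAL forces deduplication at query time and can be slow on large scans. A common alternative, sketched here using only the columns from the table above, is argMax to pick the value from the row with the highest version:

```sql
SELECT
    user_id,
    argMax(state, version) AS state,
    max(version) AS version
FROM user_state
WHERE user_id = 123
GROUP BY user_id;
```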
Columnar Storage Benefits
Columnar storage enables aggressive compression. ClickHouse achieves 10–50x compression on typical event data.
Example: a table with 1 billion rows, 30 columns:
- PostgreSQL: ~200GB raw size
- ClickHouse: ~4GB compressed size
Benefits:
- Cost: Less disk, less replication traffic.
- Speed: Smaller data = faster network transfer = faster queries.
- Cache: More data fits in CPU L3 cache, improving query speed.
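Compression ratios are easy to verify on your own tables from the system.parts metadata. A sketch, assuming a table named events:

```sql
SELECT
    table,
    formatReadableSize(sum(data_compressed_bytes)) AS compressed,
    formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed,
    round(sum(data_uncompressed_bytes) / sum(data_compressed_bytes), 1) AS ratio
FROM system.parts
WHERE active AND table = 'events'
GROUP BY table;
```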
Inserting from Node.js with @clickhouse/client
The official Node.js client is lightweight and fast.
import { createClient } from '@clickhouse/client';
const client = createClient({
host: 'https://clickhouse.example.com',
username: 'default',
password: 'password',
database: 'default',
});
// Insert multiple rows
await client.insert({
table: 'events',
values: [
{ timestamp: new Date(), user_id: 1, event_type: 'click', revenue: 10.5 },
{ timestamp: new Date(), user_id: 2, event_type: 'purchase', revenue: 50.0 },
],
format: 'JSONEachRow',
});
// Query
const resultSet = await client.query({
query: 'SELECT * FROM events LIMIT 10',
format: 'JSONEachRow',
});
const rows = await resultSet.json();
for (const row of rows) {
console.log(row);
}
The client handles connection pooling and keep-alive automatically.
Async Inserts for High Throughput
By default, inserts are synchronous—ClickHouse acknowledges each insert. For high volume, use async inserts:
await client.insert({
table: 'events',
values: events,
format: 'JSONEachRow',
settings: {
async_insert: 1,
wait_for_async_insert: 0, // Don't wait for the server-side flush
},
});
The server acknowledges immediately, then inserts asynchronously. You can fire-and-forget millions of events per second.
Tradeoff: if the server crashes before buffered async inserts are flushed, you lose that data. For analytics data that isn't business-critical, this is usually acceptable.
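The same settings also work outside the Node.js client, e.g. from clickhouse-client or the HTTP interface, via an INSERT-level SETTINGS clause. A sketch against the events table from earlier (the column values are placeholders):

```sql
INSERT INTO events
SETTINGS async_insert = 1, wait_for_async_insert = 0
VALUES (now(), 1, 'click', '{}', 'session-1');
```

Setting wait_for_async_insert = 1 instead keeps server-side batching but waits for the flush, trading some throughput for durability.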
TTL for Data Expiry
Analytics data accumulates. TTL (time-to-live) automatically deletes old data.
CREATE TABLE events (
timestamp DateTime,
event_type String
) ENGINE = MergeTree()
ORDER BY timestamp
TTL timestamp + INTERVAL 90 DAY; -- Delete after 90 days
Deletion happens during background merges. No blocking, no delete queries.
This saves storage costs. Old data is automatically pruned.
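TTL can also tier data instead of deleting it outright. Assuming a storage policy with a volume named 'cold' has been configured (an assumption, not shown here), a tiered rule might look like:

```sql
ALTER TABLE events MODIFY TTL
    timestamp + INTERVAL 30 DAY TO VOLUME 'cold',
    timestamp + INTERVAL 90 DAY DELETE;
```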
ClickHouse Cloud vs Self-Hosted
ClickHouse Cloud: Managed service with automatic backups, scaling, and updates. Storage costs roughly $0.50/GB/month (pricing varies by tier and usage).
Self-hosted: You manage replication, backups, and hardware yourself. Cheaper at scale (>10TB), but an operational burden.
For most startups, ClickHouse Cloud is worth it; you avoid the operational work.
When to Add ClickHouse to Your Stack
Add ClickHouse if:
- You log >1M events/hour
- You need real-time analytics (not batch)
- Queries on PostgreSQL take >10 seconds
- You have time-series data (metrics, events)
- You need <1 second query latency
Don''t add ClickHouse if:
- You process <100k events/hour
- Analytics is batch (runs daily)
- PostgreSQL queries are fast enough
- You want to minimize infrastructure
ClickHouse adds operational complexity. Justify it with volume and requirements.
Typical Architecture
Events from app
↓
Kafka (or HTTP)
↓
ClickHouse (events table)
↓
Materialized views
↓
Aggregated tables (hourly, daily)
↓
Dashboard queries (instant response)
Events flow in real-time. Materialized views pre-aggregate. Dashboards query aggregates, not raw events. This scales to billions of events.
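The Kafka leg of this pipeline can live inside ClickHouse itself, via the Kafka table engine plus a materialized view. A sketch with assumed broker, topic, and column names (the target table must have matching columns):

```sql
CREATE TABLE events_queue (
    timestamp DateTime,
    user_id UInt64,
    event_type String
) ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka:9092',
         kafka_topic_list = 'events',
         kafka_group_name = 'clickhouse-consumer',
         kafka_format = 'JSONEachRow';

-- Continuously drain the queue into a MergeTree table
CREATE MATERIALIZED VIEW events_consumer TO events AS
SELECT timestamp, user_id, event_type
FROM events_queue;
```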
Checklist
- Create a ClickHouse Cloud account or self-host
- Design schema with ORDER BY and PARTITION BY
- Create materialized views for aggregations
- Install @clickhouse/client in your Node.js app
- Send sample events and verify inserts
- Benchmark query latency vs PostgreSQL
- Set up TTL for data expiry
- Configure backups and replication
- Monitor disk usage and compression ratio
- Plan migration from PostgreSQL analytics to ClickHouse
Conclusion
PostgreSQL is universal but not optimal for everything. For analytics, ClickHouse is hard to beat: millions of rows per second ingested, sub-second queries, aggressive compression.
If you're building a data-heavy product (SaaS dashboards, event analytics, real-time monitoring), ClickHouse belongs in your stack. It's the difference between waiting 30 seconds for a dashboard to load and sub-second response times.
Start with Postgres. When analytics becomes a bottleneck, add ClickHouse. You'll be amazed at the performance difference.