ClickHouse for Real-Time Analytics — Insert Millions of Rows Per Second

Author: Sanjeev Sharma (@webcoderspeed1)
Introduction
PostgreSQL is the workhorse of application databases. It handles OLTP (transactional) workloads well. But analytics is a different beast: scanning and aggregating billions of rows stresses a row store in ways OLTP never does.
ClickHouse is a columnar database built for analytics. It ingests millions of rows per second, compresses data roughly 10x better than PostgreSQL, and queries terabytes with sub-second latency. For real-time dashboards, event processing, and analytics, it's transformative.
This post covers when to add ClickHouse to your stack, how to use it, and benchmarks.
- ClickHouse vs PostgreSQL for Analytics
- MergeTree Table Engine
- Materialized Views for Pre-Aggregation
- AggregatingMergeTree for Incremental Aggregations
- ReplacingMergeTree for Deduplication
- Columnar Storage Benefits
- Inserting from Node.js with @clickhouse/client
- Async Inserts for High Throughput
- TTL for Data Expiry
- ClickHouse Cloud vs Self-Hosted
- When to Add ClickHouse to Your Stack
- Typical Architecture
- Checklist
- Conclusion
ClickHouse vs PostgreSQL for Analytics
PostgreSQL uses row-oriented storage: every column for a row is stored together. This is efficient for OLTP (updating one user's profile) but terrible for analytics (summing revenue across 100 million rows).
ClickHouse uses column-oriented storage. All values for a column are stored together, compressed heavily. Querying revenue is fast: ClickHouse reads only the revenue column, not all columns.
Benchmark: aggregate 1 billion rows, 20 columns, sum 5 columns.
| Database | Query Time | Memory Used |
|---|---|---|
| PostgreSQL | 45 seconds | 2GB |
| ClickHouse | 150ms | 80MB |
ClickHouse is 300x faster here. For real-time dashboards, it's the difference between feasible and impossible.
PostgreSQL strengths: ACID, arbitrary queries, joins, updates, single-row operations.
ClickHouse strengths: Column analytics, bulk inserts, compression, parallelism.
Use both. OLTP transactions go to PostgreSQL. Analytics events go to ClickHouse.
MergeTree Table Engine
ClickHouse tables are organized by "engines." MergeTree is the workhorse.
CREATE TABLE events (
timestamp DateTime,
user_id UInt64,
event_type String,
event_properties JSON,
session_id String
) ENGINE = MergeTree()
ORDER BY (timestamp, user_id)
PARTITION BY toYYYYMM(timestamp);
Key concepts:
- ORDER BY: ClickHouse sorts data and builds indexes on these columns. Query filters on these columns are fast.
- PARTITION BY: Data is split by month, year, or custom granularity. Partitions can be dropped instantly. No delete/update slowness.
When you insert 100,000 rows, ClickHouse writes them as a new "part." As parts accumulate, they merge in the background. This merging is asynchronous and doesn't block writes.
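Because the table is partitioned, old data can be dropped as a metadata operation rather than a row-by-row delete. A sketch against the events table above (the partition ID follows the toYYYYMM format):

```sql
-- Drop an entire month instantly
ALTER TABLE events DROP PARTITION '202601';

-- Inspect how data is organized into parts
SELECT partition, name, rows
FROM system.parts
WHERE table = 'events' AND active
ORDER BY partition;
```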
Materialized Views for Pre-Aggregation
Materialized views in ClickHouse are pipelines. When data lands in one table, a materialized view automatically aggregates and stores the result in another table.
-- Raw events table
CREATE TABLE raw_events (
timestamp DateTime,
user_id UInt64,
revenue Decimal128(2)
) ENGINE = MergeTree()
ORDER BY timestamp;
-- Aggregated by hour
CREATE TABLE hourly_revenue (
hour DateTime,
total_revenue AggregateFunction(sum, Decimal128(2)),
user_count AggregateFunction(uniq, UInt64)
) ENGINE = AggregatingMergeTree()
ORDER BY hour;
-- Materialized view: pipe raw_events into hourly aggregation
CREATE MATERIALIZED VIEW hourly_revenue_mv
TO hourly_revenue
AS SELECT
toStartOfHour(timestamp) as hour,
sumState(revenue) as total_revenue,
uniqState(user_id) as user_count
FROM raw_events
GROUP BY hour;
Now, when events are inserted into raw_events, the view automatically updates hourly_revenue. Querying hourly_revenue is nearly instant, because the heavy aggregation already happened at insert time.
This pattern scales. Pre-aggregate at insertion time, query instantly.
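One detail: hourly_revenue stores aggregation states rather than plain numbers, so reads use the -Merge counterparts of the -State functions. A sketch of the dashboard query, assuming the tables above:

```sql
SELECT
    hour,
    sumMerge(total_revenue) AS revenue,
    uniqMerge(user_count) AS users
FROM hourly_revenue
GROUP BY hour
ORDER BY hour DESC
LIMIT 24;
```

The GROUP BY is still needed to combine states from different parts, but it touches one row per hour instead of every raw event.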
AggregatingMergeTree for Incremental Aggregations
Standard aggregations require processing all data. Incremental aggregations process partial results and merge them.
AggregatingMergeTree stores state, not final values:
CREATE TABLE metrics (
timestamp DateTime,
metric_name String,
value AggregateFunction(sum, Float64)
) ENGINE = AggregatingMergeTree()
ORDER BY (timestamp, metric_name);
-- Insert partial sums from different servers. Aggregate states must be
-- produced by a SELECT; sumState can't appear in a VALUES list.
INSERT INTO metrics
SELECT toDateTime('2026-03-18 10:00:00'), 'requests', sumState(toFloat64(1500));
INSERT INTO metrics
SELECT toDateTime('2026-03-18 10:00:00'), 'requests', sumState(toFloat64(2300));
-- Query merges partial sums
SELECT
timestamp,
metric_name,
sumMerge(value) as total
FROM metrics
GROUP BY timestamp, metric_name;
Each server sends partial aggregates, and ClickHouse merges them cheaply at query time. This is how you handle high-cardinality metrics at scale.
ReplacingMergeTree for Deduplication
Distributed systems have duplicates. The same event might be written twice. ReplacingMergeTree handles this.
CREATE TABLE user_state (
user_id UInt64,
state String,
updated_at DateTime,
version UInt64
) ENGINE = ReplacingMergeTree(version)
ORDER BY user_id;
-- Insert or update
INSERT INTO user_state VALUES
(123, 'active', now(), 1),
(123, 'inactive', now(), 2); -- Supersedes version 1 after a merge (or with FINAL)
-- Query latest version
SELECT * FROM user_state
FINAL -- Only latest version per user_id
WHERE user_id = 123;
The FINAL keyword returns the latest version of each row. Useful for dimension tables that change (user names, settings).
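Note that FINAL forces deduplication at query time and can be slow on large scans. A common alternative, sketched here using only the columns from the table above, is argMax to pick the value from the row with the highest version:

```sql
SELECT
    user_id,
    argMax(state, version) AS state,
    max(version) AS version
FROM user_state
WHERE user_id = 123
GROUP BY user_id;
```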
Columnar Storage Benefits
Columnar storage enables aggressive compression. ClickHouse achieves 10–50x compression on typical event data.
Example: a table with 1 billion rows, 30 columns:
- PostgreSQL: ~200GB raw size
- ClickHouse: ~4GB compressed size
Benefits:
- Cost: Less disk, less replication traffic.
- Speed: Smaller data = faster network transfer = faster queries.
- Cache: More data fits in CPU L3 cache, improving query speed.
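Compression ratios are easy to verify on your own tables from the system.parts metadata. A sketch, assuming a table named events:

```sql
SELECT
    table,
    formatReadableSize(sum(data_compressed_bytes)) AS compressed,
    formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed,
    round(sum(data_uncompressed_bytes) / sum(data_compressed_bytes), 1) AS ratio
FROM system.parts
WHERE active AND table = 'events'
GROUP BY table;
```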
Inserting from Node.js with @clickhouse/client
The official Node.js client is lightweight and fast.
import { createClient } from '@clickhouse/client';
const client = createClient({
host: 'https://clickhouse.example.com',
username: 'default',
password: 'password',
database: 'default',
});
// Insert multiple rows
await client.insert({
table: 'events',
values: [
{ timestamp: new Date(), user_id: 1, event_type: 'click', revenue: 10.5 },
{ timestamp: new Date(), user_id: 2, event_type: 'purchase', revenue: 50.0 },
],
format: 'JSONEachRow',
});
// Query
const resultSet = await client.query({
query: 'SELECT * FROM events LIMIT 10',
format: 'JSONEachRow',
});
const rows = await resultSet.json();
for (const row of rows) {
console.log(row);
}
The client handles connection pooling and keep-alive automatically.
Async Inserts for High Throughput
By default, inserts are synchronous—ClickHouse acknowledges each insert. For high volume, use async inserts:
await client.insert({
table: 'events',
values: events,
format: 'JSONEachRow',
settings: {
async_insert: 1,
wait_for_async_insert: 0, // Don't wait for the server-side flush
},
});
The server acknowledges immediately, then inserts asynchronously. You can fire-and-forget millions of events per second.
Tradeoff: if the server crashes before buffered async inserts are flushed, you lose that data. For analytics data that isn't business-critical, this is usually acceptable.
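The same settings also work outside the Node.js client, e.g. from clickhouse-client or the HTTP interface, via an INSERT-level SETTINGS clause. A sketch against the events table from earlier (the column values are placeholders):

```sql
INSERT INTO events
SETTINGS async_insert = 1, wait_for_async_insert = 0
VALUES (now(), 1, 'click', '{}', 'session-1');
```

Setting wait_for_async_insert = 1 instead keeps server-side batching but waits for the flush, trading some throughput for durability.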
TTL for Data Expiry
Analytics data accumulates. TTL (time-to-live) automatically deletes old data.
CREATE TABLE events (
timestamp DateTime,
event_type String
) ENGINE = MergeTree()
ORDER BY timestamp
TTL timestamp + INTERVAL 90 DAY; -- Delete after 90 days
Deletion happens during background merges. No blocking, no delete queries.
This saves storage costs. Old data is automatically pruned.
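TTL can also tier data instead of deleting it outright. Assuming a storage policy with a volume named 'cold' has been configured (an assumption, not shown here), a tiered rule might look like:

```sql
ALTER TABLE events MODIFY TTL
    timestamp + INTERVAL 30 DAY TO VOLUME 'cold',
    timestamp + INTERVAL 90 DAY DELETE;
```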
ClickHouse Cloud vs Self-Hosted
ClickHouse Cloud: Managed service with automatic backups, scaling, and updates. Storage costs roughly $0.50/GB/month (pricing varies by tier and usage).
Self-hosted: You manage replication, backups, and hardware yourself. Cheaper at scale (>10TB), but an operational burden.
For most startups, ClickHouse Cloud is worth it; you avoid the operational work.
When to Add ClickHouse to Your Stack
Add ClickHouse if:
- You log >1M events/hour
- You need real-time analytics (not batch)
- Queries on PostgreSQL take >10 seconds
- You have time-series data (metrics, events)
- You need <1 second query latency
Don''t add ClickHouse if:
- You process <100k events/hour
- Analytics is batch (runs daily)
- PostgreSQL queries are fast enough
- You want to minimize infrastructure
ClickHouse adds operational complexity. Justify it with volume and requirements.
Typical Architecture
Events from app
↓
Kafka (or HTTP)
↓
ClickHouse (events table)
↓
Materialized views
↓
Aggregated tables (hourly, daily)
↓
Dashboard queries (instant response)
Events flow in real-time. Materialized views pre-aggregate. Dashboards query aggregates, not raw events. This scales to billions of events.
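The Kafka leg of this pipeline can live inside ClickHouse itself, via the Kafka table engine plus a materialized view. A sketch with assumed broker, topic, and column names (the target table must have matching columns):

```sql
CREATE TABLE events_queue (
    timestamp DateTime,
    user_id UInt64,
    event_type String
) ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka:9092',
         kafka_topic_list = 'events',
         kafka_group_name = 'clickhouse-consumer',
         kafka_format = 'JSONEachRow';

-- Continuously drain the queue into a MergeTree table
CREATE MATERIALIZED VIEW events_consumer TO events AS
SELECT timestamp, user_id, event_type
FROM events_queue;
```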
Checklist
- Create a ClickHouse Cloud account or self-host
- Design schema with ORDER BY and PARTITION BY
- Create materialized views for aggregations
- Install @clickhouse/client in your Node.js app
- Send sample events and verify inserts
- Benchmark query latency vs PostgreSQL
- Set up TTL for data expiry
- Configure backups and replication
- Monitor disk usage and compression ratio
- Plan migration from PostgreSQL analytics to ClickHouse
Conclusion
PostgreSQL is universal but not optimal for everything. For analytics, ClickHouse is hard to beat: millions of rows per second ingested, sub-second queries, aggressive compression.
If you're building a data-heavy product (SaaS dashboards, event analytics, real-time monitoring), ClickHouse belongs in your stack. It's the difference between waiting 30 seconds for a dashboard to load and sub-second response times.
Start with Postgres. When analytics becomes a bottleneck, add ClickHouse. You'll be amazed at the performance difference.