Change Data Capture With Debezium — Sync Your Database to Anywhere in Real Time
- Author: Sanjeev Sharma (@webcoderspeed1)
Introduction
Your source of truth is PostgreSQL. Your search index is Elasticsearch. Your vector DB is Pinecone. Keeping them synchronized is a nightmare: polling is slow and wasteful, application-side change tracking is error-prone.
Change Data Capture (CDC) solves this. Debezium reads your database's transaction log in real time, extracts changes, and streams them to Kafka. Subscribe to Kafka and sync anywhere. No polling, no triggers, no consistency headaches.
- What CDC is and Why It Beats Polling
- Debezium PostgreSQL Connector Setup
- WAL-Based Capture (No Polling, No Triggers)
- Filtering Events by Table and Operation
- Schema Evolution Handling
- Debezium + Kafka Connect
- Routing Events to Redis, Elasticsearch, Vector DB
- Outbox Pattern With Debezium
- Debezium Server for Lightweight CDC Without Kafka
- Monitoring Lag and Offset
- Checklist
- Conclusion
What CDC is and Why It Beats Polling
Polling: Every 5 minutes, query for rows with updated_at > last_sync. Slow, expensive, races with in-flight transactions, and blind to deletes: a removed row simply stops appearing.
CDC: Listen to the database's transaction log. When data changes, you know instantly. Stream the change to subscribers.
Debezium implements CDC. It reads PostgreSQL's Write-Ahead Log (WAL), MySQL's binlog, or MongoDB's oplog. These logs are already written for durability; Debezium just reads them.
No polling. No triggers. No application code. Just database logs flowing to Kafka.
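To make the polling blind spot concrete, here's a toy in-memory sketch — `pollChanges` and the row shape are invented for illustration, not from any library:

```javascript
// Toy model of updated_at polling. Each row carries an updatedAt timestamp.
function pollChanges(rows, lastSync) {
  // The poller only sees rows that still exist and were touched after lastSync.
  return rows.filter((row) => row.updatedAt > lastSync);
}

const lastSync = 100;
const rows = [
  { id: 1, updatedAt: 50 },  // unchanged since last poll -- correctly skipped
  { id: 2, updatedAt: 150 }, // updated -- the poller sees this
  // id 3 was DELETEd after lastSync: it is simply gone from the table,
  // so polling never learns about the delete. CDC sees it in the WAL.
];

const seen = pollChanges(rows, lastSync).map((r) => r.id);
console.log(seen); // only id 2 -- the delete is invisible
```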
Debezium PostgreSQL Connector Setup
Install Kafka Connect and the Debezium PostgreSQL connector. For local development, use Docker Compose:
version: '3'
services:
  postgres:
    image: postgres:15
    environment:
      POSTGRES_DB: mydb
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: secret
    command:
      - '-c'
      - 'wal_level=logical'
      - '-c'
      - 'max_wal_senders=4'
      - '-c'
      - 'max_replication_slots=4'
    ports:
      - '5432:5432'
  zookeeper:
    image: confluentinc/cp-zookeeper:7.5.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
  kafka:
    image: confluentinc/cp-kafka:7.5.0
    depends_on:
      - zookeeper
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:29092,PLAINTEXT_HOST://localhost:9092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
  kafka-connect:
    # The Debezium Connect image ships with the PostgreSQL connector pre-installed
    image: quay.io/debezium/connect:1.9
    depends_on:
      - kafka
    environment:
      BOOTSTRAP_SERVERS: kafka:29092
      GROUP_ID: compose-group
      CONFIG_STORAGE_TOPIC: docker-connect-configs
      OFFSET_STORAGE_TOPIC: docker-connect-offsets
      STATUS_STORAGE_TOPIC: docker-connect-status
    ports:
      - '8083:8083'
Create the PostgreSQL replication user:
CREATE USER replicator WITH REPLICATION ENCRYPTED PASSWORD 'secret';
GRANT CONNECT ON DATABASE mydb TO replicator;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO replicator;
Now configure the Debezium connector via HTTP:
curl -X POST http://localhost:8083/connectors \
  -H "Content-Type: application/json" \
  -d '{
    "name": "postgres-cdc",
    "config": {
      "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
      "database.hostname": "postgres",
      "database.port": 5432,
      "database.user": "replicator",
      "database.password": "secret",
      "database.dbname": "mydb",
      "database.server.name": "myserver",
      "plugin.name": "pgoutput",
      "table.include.list": "public.users,public.products"
    }
  }'
Debezium now streams changes from users and products tables to Kafka topics myserver.public.users and myserver.public.products.
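On the consumer side, each Kafka message is an envelope: an `op` code (`c` insert, `u` update, `d` delete, `r` snapshot read) plus `before` and `after` row images. A minimal sketch of unpacking one — the event literal is illustrative, not captured output:

```javascript
// Unpack a Debezium change envelope into something easier to route on.
// On deletes `after` is null, so fall back to `before` for the row image.
function unpack(event) {
  const { op, before, after } = event.payload;
  return { op, row: after ?? before };
}

// Illustrative update event (trimmed; real events also carry source metadata)
const updateEvent = {
  payload: {
    op: 'u',
    before: { id: 7, name: 'Old name' },
    after: { id: 7, name: 'New name' }
  }
};

const { op, row } = unpack(updateEvent);
console.log(op, row.name); // u New name
```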
WAL-Based Capture (No Polling, No Triggers)
PostgreSQL records every change in the Write-Ahead Log (WAL) before a transaction is acknowledged as committed. Debezium reads the WAL via logical replication.
No triggers: Debezium doesn't add triggers. Your application doesn't change.
No polling: Debezium streams changes as they happen, not on a schedule.
Low overhead: the WAL is written for durability anyway, and logical decoding adds little extra load. One caveat: the replication slot retains WAL until Debezium confirms it, so monitor disk usage if the connector falls behind.
Filtering Events by Table and Operation
You don't need all changes. Filter by table and operation type (insert, update, delete):
{
  "name": "postgres-cdc-filtered",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "postgres",
    "database.port": 5432,
    "database.user": "replicator",
    "database.password": "secret",
    "database.dbname": "mydb",
    "database.server.name": "myserver",
    "table.include.list": "public.products",
    "column.include.list": "public.products.id,public.products.name,public.products.price",
    "skipped.operations": "u,d"
  }
}
Now only inserts on the products table (plus the rows emitted during the initial snapshot, op r) flow to Kafka.
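skipped.operations drops events at the connector for everyone. When different consumers want different operations, you can instead filter client-side on the envelope's op code — a small sketch (`wantsEvent` is a made-up helper):

```javascript
// Debezium op codes: 'c' = insert, 'u' = update, 'd' = delete,
// 'r' = snapshot read emitted during the initial snapshot.
function wantsEvent(event, ops) {
  return ops.includes(event.payload.op);
}

const insertOnly = ['c', 'r']; // treat snapshot reads like inserts

console.log(wantsEvent({ payload: { op: 'c' } }, insertOnly)); // true
console.log(wantsEvent({ payload: { op: 'u' } }, insertOnly)); // false
```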
Schema Evolution Handling
Your schema changes over time. Debezium embeds schema information alongside each message — via Avro and a schema registry, or inline when using Kafka Connect's JSON converter, as below. Consumers can adapt:
{
  "schema": {
    "type": "struct",
    "fields": [
      { "field": "id", "type": "int32" },
      { "field": "name", "type": "string" },
      { "field": "email", "type": "string", "optional": true }
    ]
  },
  "payload": {
    "id": 1,
    "name": "Alice",
    "email": "alice@example.com"
  }
}
When you add a column to PostgreSQL, the schema updates in Kafka. Consumers that don't understand the new field ignore it; consumers that do, process it.
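A defensive consumer can make this concrete by projecting only the fields it knows about — a sketch (the `KNOWN_FIELDS` list and `project` helper are illustrative):

```javascript
// A consumer that copies only the fields it knows about survives
// upstream column additions without code changes.
const KNOWN_FIELDS = ['id', 'name', 'email'];

function project(row) {
  return Object.fromEntries(
    KNOWN_FIELDS.filter((f) => f in row).map((f) => [f, row[f]])
  );
}

// Upstream added a `phone` column; this consumer just ignores it.
const row = { id: 1, name: 'Alice', email: 'alice@example.com', phone: '555-0100' };
console.log(project(row)); // { id: 1, name: 'Alice', email: 'alice@example.com' }
```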
Debezium + Kafka Connect
Kafka Connect is a framework for connectors: sources (read from DBs) and sinks (write to systems).
Debezium provides the PostgreSQL source connector. Third-party sinks exist for Elasticsearch, S3, JDBC, and more. Chain them:
Database (Postgres) → Debezium Source → Kafka → Elasticsearch Sink → Elasticsearch
Setup:
curl -X POST http://localhost:8083/connectors \
  -H "Content-Type: application/json" \
  -d '{
    "name": "elasticsearch-sink",
    "config": {
      "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
      "topics": "myserver.public.products",
      "connection.url": "http://elasticsearch:9200",
      "key.ignore": "true",
      "key.converter": "org.apache.kafka.connect.json.JsonConverter",
      "value.converter": "org.apache.kafka.connect.json.JsonConverter"
    }
  }'
(The Elasticsearch sink connector doesn't ship with the base Connect images; install it into your Connect workers first.)
Now every change to your products table appears in Elasticsearch instantly.
Routing Events to Redis, Elasticsearch, Vector DB
Use Kafka's routing flexibility. Emit changes to a Kafka topic, then subscribe from any system:
Redis cache: Invalidate cache on product update:
import { Kafka } from 'kafkajs';
import { createClient } from 'redis';

const kafka = new Kafka({
  clientId: 'cache-invalidator',
  brokers: [process.env.KAFKA_BROKER]
});
const consumer = kafka.consumer({ groupId: 'cache-group' });
const redisClient = createClient();

await redisClient.connect();
await consumer.connect();
await consumer.subscribe({ topics: ['myserver.public.products'] });
await consumer.run({
  eachMessage: async ({ message }) => {
    const change = JSON.parse(message.value?.toString() || '{}');
    // On deletes `after` is null, so fall back to `before` for the key
    const productId = change.payload?.after?.id ?? change.payload?.before?.id;
    if (productId != null) {
      await redisClient.del(`product:${productId}`);
    }
  }
});
Elasticsearch: Debezium Elasticsearch sink connector handles this automatically.
Vector DB: A consumer process reads changes and computes embeddings:
const change = JSON.parse(message.value?.toString() || '{}');
const product = change.payload?.after;
if (product) {
  const embedding = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: `${product.name} ${product.description}`
  });
  // index = pinecone.index('products') with the current Pinecone SDK
  await index.upsert([
    {
      id: String(product.id),
      values: embedding.data[0].embedding,
      metadata: { name: product.name }
    }
  ]);
} else if (change.payload?.before) {
  // The row was deleted -- remove its vector too
  await index.deleteOne(String(change.payload.before.id));
}
One database change, multiple systems synchronized.
Outbox Pattern With Debezium
Traditional outbox pattern: the application writes the business row and an outbox row in the same transaction, and a separate poller reads the outbox table and publishes to Kafka. Reliable, but slow and one more moving part.
Debezium outbox: keep the same-transaction write to the outbox table, but drop the poller. Debezium captures outbox inserts from the WAL, and its outbox event router transform extracts each event and publishes it to the appropriate Kafka topic.
This replaces the polling outbox processor entirely. Changes flow to Kafka as soon as they commit to PostgreSQL.
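A sketch of what the connector config might look like with Debezium's outbox event router SMT. The outbox table name here is an assumption; by default the router expects columns like id, aggregatetype, aggregateid, and payload, and emits to topics named outbox.event.<aggregatetype>:

```json
{
  "name": "postgres-outbox",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "postgres",
    "database.user": "replicator",
    "database.password": "secret",
    "database.dbname": "mydb",
    "database.server.name": "myserver",
    "plugin.name": "pgoutput",
    "table.include.list": "public.outbox",
    "transforms": "outbox",
    "transforms.outbox.type": "io.debezium.transforms.outbox.EventRouter"
  }
}
```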
Debezium Server for Lightweight CDC Without Kafka
Not everyone runs Kafka. Debezium Server is a standalone service that reads CDC logs and pushes to HTTP, S3, Pulsar, or RabbitMQ without requiring Kafka:
Configuration lives in application.properties:
debezium.sink.type=http
debezium.sink.http.url=http://localhost:3000/webhooks/cdc
debezium.source.connector.class=io.debezium.connector.postgresql.PostgresConnector
debezium.source.database.hostname=localhost
debezium.source.database.user=replicator
debezium.source.database.password=secret
debezium.source.database.dbname=mydb
debezium.source.database.server.name=myserver
debezium.source.plugin.name=pgoutput
debezium.source.offset.storage.file.filename=data/offsets.dat
debezium.transforms=unwrap
debezium.transforms.unwrap.type=io.debezium.transforms.ExtractNewRecordState
Debezium Server POSTs each change to your webhook. Process it in your application. No Kafka, no extra infrastructure.
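Inside the webhook you get the same change data your Kafka consumers would see — with ExtractNewRecordState applied, the body is just the flattened new row state (null for deletes unless rewriting is configured). A framework-agnostic sketch of the routing decision; `handleCdcEvent` and its return shape are invented for illustration:

```javascript
// Decide what to do with one Debezium Server HTTP delivery.
// With ExtractNewRecordState applied upstream, the body is the flattened
// new row state, and null signals a delete (a tombstone).
function handleCdcEvent(body) {
  if (body === null) {
    return { action: 'delete' };          // remove derived data (cache, index)
  }
  return { action: 'upsert', row: body }; // insert or update: refresh targets
}

console.log(handleCdcEvent({ id: 1, name: 'Alice' })); // { action: 'upsert', ... }
console.log(handleCdcEvent(null));                     // { action: 'delete' }
```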
Monitoring Lag and Offset
Two kinds of lag matter. Downstream, check how far your Kafka consumers are behind the topics:
kafka-consumer-groups --bootstrap-server localhost:9092 \
  --group cache-group --describe
The output shows lag per topic partition. Upstream, check how far Debezium itself is behind PostgreSQL by querying its replication slot:
SELECT slot_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS lag_bytes
FROM pg_replication_slots;
High lag means a consumer or the connector can't keep up: scale out consumers, add partitions, or tune the connector.
Monitor in application:
const admin = kafka.admin();
await admin.connect();
const high = await admin.fetchTopicOffsets('myserver.public.products');
const [group] = await admin.fetchOffsets({ groupId: 'cache-group', topics: ['myserver.public.products'] });
const committed = (p) => Math.max(0, Number(group.partitions.find((g) => g.partition === p.partition)?.offset ?? 0));
const lag = high.reduce((sum, p) => sum + Number(p.high) - committed(p), 0);
console.log(`Lag: ${lag} messages`);
Set alerts when lag exceeds threshold (e.g., >10k messages).
Checklist
- Enable logical replication in PostgreSQL
- Create replication user with appropriate permissions
- Deploy Kafka and Kafka Connect
- Configure Debezium PostgreSQL connector
- Test that changes appear in Kafka topics
- Set up sink connectors for target systems
- Implement custom consumers for complex transformations
- Monitor CDC lag and set up alerting
- Document your CDC topology and schema mappings
- Test failover scenarios (connector crash, DB restart)
Conclusion
Debezium eliminates the polling and consistency headaches of syncing data across systems. Read from your database's transaction log, stream changes to Kafka, and subscribe from anywhere.
For PostgreSQL shops, Debezium is table stakes for real-time data architecture. Start small: sync one table to Redis or Elasticsearch. Expand to the full platform as confidence grows.
Your data will finally stay in sync.