Change Data Capture (CDC)

Turning a static database into a real-time stream of events.

The idea

How do you keep a search index (like Elasticsearch) in sync with your main database (like PostgreSQL)? You could update both from your application code (a "dual write"), but the two writes are not atomic. If the second one fails, or the process crashes between them, the stores go out of sync. Change Data Capture (CDC) solves this by reading the database's own transaction log, with no changes to application code. Every time a row is inserted, updated, or deleted, the CDC tool emits an event to a message queue, typically moments after the commit, for other systems to consume.
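The dual-write failure mode fits in a few lines. This is a toy model, not real client code. `database`, `search_index`, `index_user` and `save_user_dual_write` are hypothetical in-memory stand-ins. The database write succeeds, the search update fails, and the two stores disagree.

```python
# Toy model of the dual-write problem. `database` and `search_index` are
# hypothetical in-memory stand-ins for PostgreSQL and Elasticsearch.
database: dict[int, dict] = {}
search_index: dict[int, dict] = {}


class SearchUnavailable(Exception):
    """Simulated network failure when talking to the search cluster."""


def index_user(user: dict, fail: bool) -> None:
    if fail:
        raise SearchUnavailable("search cluster timed out")
    search_index[user["id"]] = dict(user)


def save_user_dual_write(user: dict, search_fails: bool = False) -> None:
    # Write 1: the database commit succeeds and is durable.
    database[user["id"]] = dict(user)
    # Write 2: a separate call to a separate system that can fail on its own.
    try:
        index_user(user, fail=search_fails)
    except SearchUnavailable:
        # The app could log or retry, but the database write is already
        # committed, so the two stores now disagree until someone repairs it.
        pass


save_user_dual_write({"id": 5, "status": "pending"})
save_user_dual_write({"id": 5, "status": "active"}, search_fails=True)
print(database[5]["status"], search_index[5]["status"])  # active pending
```

CDC removes the second write from the application. The app writes only to the database, and the index is derived from the database's own log.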


How it works (The WAL)

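Databases like Postgres use a Write-Ahead Log (WAL) to guarantee durability: every change is appended to the log before it is applied to the data files. Tools like Debezium connect to the database the way a replica would (in Postgres, through a logical replication slot). They read the stream of changes decoded from the WAL, translate each one into a structured event (often JSON), and publish it to Kafka.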
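The reader side of this can be modeled as a loop that tails the log, turns each entry into an event, and remembers the last position it shipped. This is a toy sketch, not Debezium's implementation. It makes three assumptions:

- The log entries are already decoded (real WAL is binary).
- `WalRecord` and `ToyCdcReader` are hypothetical names.
- The topic naming is illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WalRecord:
    """Hypothetical, already-decoded WAL entry (real WAL is binary)."""
    lsn: int
    table: str
    op: str                  # "c" insert, "u" update, "d" delete
    before: Optional[dict]
    after: Optional[dict]


class ToyCdcReader:
    """Tails a log, emits change events, and remembers the last LSN it shipped."""

    def __init__(self, topic_prefix: str):
        self.topic_prefix = topic_prefix
        self.confirmed_lsn = 0           # like a replication slot's confirmed position
        self.topics: dict[str, list] = {}

    def poll(self, wal: list) -> int:
        shipped = 0
        for rec in wal:
            if rec.lsn <= self.confirmed_lsn:
                continue                 # already published (e.g. before a restart)
            event = {"op": rec.op, "before": rec.before, "after": rec.after,
                     "source": {"table": rec.table, "lsn": rec.lsn}}
            topic = f"{self.topic_prefix}.{rec.table}"
            self.topics.setdefault(topic, []).append(event)
            self.confirmed_lsn = rec.lsn  # advance only after publishing
            shipped += 1
        return shipped


wal = [
    WalRecord(1000, "users", "c", None, {"id": 5, "status": "pending"}),
    WalRecord(1004, "users", "u", {"id": 5, "status": "pending"},
              {"id": 5, "status": "active"}),
]
reader = ToyCdcReader("db")
print(reader.poll(wal))  # 2
print(reader.poll(wal))  # 0 (nothing new since LSN 1004)
```

Tracking the last shipped position is what lets a restarted reader resume without skipping changes. In Postgres that position lives in the replication slot. Because the position advances only after publishing, a crash between the two steps can re-send an event, so consumers should tolerate duplicates.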

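```python
# The conceptual flow of CDC (e.g., Debezium -> Kafka).
# `db`, `kafka` and `search_index` are placeholders, not real client APIs.

# 1. Application executes SQL
db.execute("UPDATE users SET status = 'active' WHERE id = 5")

# 2. Database appends the change to its internal WAL (Write-Ahead Log), schematically:
# [LSN 1004] UPDATE table=users id=5 old_status=pending new_status=active
# (Postgres only logs an updated row's old values, and Debezium only fills
#  "before", if the table uses REPLICA IDENTITY FULL.)

# 3. CDC tool (Debezium) reads the change from the WAL and publishes an event to Kafka
event = {
    "op": "u",  # u = update (c = create, d = delete, r = snapshot read)
    "before": {"id": 5, "status": "pending"},
    "after": {"id": 5, "status": "active"},
}
kafka.publish("db.users.changes", event)

# 4. Elasticsearch consumer reads the event from Kafka and updates the index
search_index.update_document(event["after"]["id"], event["after"])
```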
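Step 4 can be made concrete with a consumer that applies Debezium-style envelopes to an index. This is a minimal sketch under stated assumptions:

- Events have already been unwrapped to the `op`/`before`/`after` shape used above.
- A `None` value stands for a tombstone message.
- `search_index` is a hypothetical in-memory dict, not an Elasticsearch client.

```python
from typing import Optional

# Hypothetical in-memory stand-in for the search index: document id -> document.
search_index: dict[int, dict] = {}


def apply_change(event: Optional[dict], key_field: str = "id") -> None:
    """Apply one Debezium-style change event to the search index (sketch)."""
    if event is None:
        # Debezium can follow a delete with a null-valued "tombstone" message
        # (used for Kafka log compaction); there is nothing to index.
        return
    op = event["op"]
    if op in ("c", "r", "u"):
        # create, snapshot read, update: upsert the full new row image
        row = event["after"]
        search_index[row[key_field]] = dict(row)
    elif op == "d":
        # delete: only the old row image (at least its key) is available
        search_index.pop(event["before"][key_field], None)
    else:
        raise ValueError(f"unsupported op: {op!r}")


events = [
    {"op": "c", "before": None, "after": {"id": 5, "status": "pending"}},
    {"op": "u", "before": {"id": 5, "status": "pending"},
     "after": {"id": 5, "status": "active"}},
    {"op": "c", "before": None, "after": {"id": 7, "status": "pending"}},
    {"op": "d", "before": {"id": 7, "status": "pending"}, "after": None},
    None,  # tombstone for id 7
]
for e in events:
    apply_change(e)
print(search_index)  # {5: {'id': 5, 'status': 'active'}}
```

The consumer upserts the whole `after` image rather than patching a single field. This means it never needs `before` for updates, and applying the same event twice in a row leaves the index unchanged.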

Cost

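CDC decouples systems nicely, but it adds infrastructure: you now have to run Kafka and Debezium and monitor replication lag. In Postgres, the replication slot stops the server from recycling WAL that the CDC tool has not yet consumed. If Kafka or Debezium goes down, WAL keeps piling up on disk, and left unchecked it can fill the disk and halt the database. Postgres 13 and later can cap this with `max_slot_wal_keep_size`, at the cost of invalidating the slot once the limit is exceeded (typically forcing a fresh snapshot).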
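Monitoring this mostly means watching how far a slot lags behind the server. A PostgreSQL LSN is a 64-bit WAL byte position printed as two hex halves (`high/low`). The sketch below converts two LSN strings into retained bytes, which is the same arithmetic that `pg_wal_lsn_diff()` does in SQL. The readings and the alert threshold are hypothetical. In practice you would read `pg_current_wal_lsn()` and the slot's `restart_lsn` from `pg_replication_slots`.

```python
def parse_lsn(lsn: str) -> int:
    """Convert a PostgreSQL LSN string like '16/B374D848' to a byte position.

    The part before the slash is the high 32 bits, the part after is the
    low 32 bits, both in hex.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def retained_wal_bytes(current_lsn: str, restart_lsn: str) -> int:
    """WAL the server must keep because the slot may still need it."""
    return parse_lsn(current_lsn) - parse_lsn(restart_lsn)


# Hypothetical readings: pg_current_wal_lsn() vs. the slot's restart_lsn.
lag = retained_wal_bytes("1/0", "0/F0000000")
print(lag)  # 268435456 bytes (256 MiB)

ALERT_THRESHOLD = 1 << 30  # hypothetical 1 GiB alert threshold
print(lag > ALERT_THRESHOLD)  # False
```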

Watch out for

Schema Evolution: If you rename a column in the database, the field names in the CDC events change starting with the next change to that table. Downstream consumers (like Elasticsearch or data warehouses) will break unless they are prepared for the new schema.
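Event Ordering: The CDC stream must preserve the order of changes to each row. Kafka only guarantees order within a partition, and Debezium keys each message by the row's primary key so that all events for a row land in the same partition. If an `INSERT` and a later `DELETE` for the same row reach the search index out of order, the delete is a no-op and the insert then recreates the document. You are left with a ghost record that stays until the index is rebuilt.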
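Routing by key can be modeled with a few in-memory lists standing in for Kafka partitions. The sketch uses `crc32` where Kafka's default partitioner uses murmur2. The partition count, event shape, and `partition_for`/`apply` helpers are all hypothetical. The model shows that per-key routing keeps a row's `INSERT` before its `DELETE` no matter which order the partitions are read in, while reordering those two events leaves a ghost.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical partition count


def partition_for(key: int) -> int:
    """Route by primary key so all changes to one row share a partition.

    Kafka's default partitioner hashes the message key with murmur2;
    crc32 is only a deterministic stand-in here.
    """
    return zlib.crc32(str(key).encode()) % NUM_PARTITIONS


def apply(index: dict, event: dict) -> None:
    if event["op"] == "d":
        index.pop(event["id"], None)  # deleting a missing doc is a no-op
    else:
        index[event["id"]] = event["row"]


# Changes in commit order, for two different rows.
changes = [
    {"op": "c", "id": 42, "row": {"id": 42, "name": "Ada"}},
    {"op": "c", "id": 7, "row": {"id": 7, "name": "Alan"}},
    {"op": "d", "id": 42},
]

# Each partition is append-only, so order is preserved per key.
partitions = defaultdict(list)
for change in changes:
    partitions[partition_for(change["id"])].append(change)

index: dict = {}
for p in sorted(partitions):  # partitions could be consumed in any order
    for change in partitions[p]:
        apply(index, change)
print(sorted(index))  # [7]

# Without key-based routing, the DELETE could be applied before the INSERT:
ghost: dict = {}
for change in (changes[2], changes[0]):
    apply(ghost, change)
print(sorted(ghost))  # [42]  <- the deleted row survives in the index
```

Ordering holds only within a partition. Changing the partition count or the message key for a live topic can therefore still reorder a row's events.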