Top CDC Tools for PostgreSQL

Compare the best CDC tools for PostgreSQL — Debezium, Airbyte, Fivetran, AWS DMS, and more for real-time change data capture.

This post was written by an engineer at QueryPlane. QueryPlane is an app builder for your database: bring your own PostgreSQL database and you can create interactive applications to share with other developers, coworkers, or even your customers.


Change Data Capture (CDC) streams database changes to other systems in real time. Instead of batch ETL jobs that run hourly or daily, CDC captures every insert, update, and delete as it happens. This post covers the top CDC tools for PostgreSQL, from open-source options to managed services.

In this post, we’ll cover:

  • Debezium - The open-source standard for CDC
  • Airbyte - ELT platform with CDC support
  • Fivetran - Enterprise data integration
  • Hightouch - Reverse ETL focused
  • Estuary Flow - Real-time data pipelines
  • Google Cloud Datastream - GCP-native CDC
  • AWS DMS - AWS database migration and replication

How PostgreSQL CDC works

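PostgreSQL CDC relies on logical decoding, introduced in PostgreSQL 9.4. The built-in pgoutput plugin and publish/subscribe logical replication arrived in PostgreSQL 10. The database records every change in its write-ahead log (WAL), and CDC tools decode this log to capture row-level changes.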

To enable CDC, you need to configure PostgreSQL with:

# In postgresql.conf (changing wal_level requires a server restart)
wal_level = logical
max_replication_slots = 4  # or higher
max_wal_senders = 4        # or higher

Most CDC tools create a logical replication slot that holds a position in the WAL. The tool reads changes from this slot, processes them, and sends them to the destination. If the tool falls behind or stops consuming, PostgreSQL retains WAL until the slot advances, which can grow disk usage until the volume fills. Monitor slot lag and drop slots that are no longer in use.
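As a rough illustration of slot monitoring, the sketch below converts PostgreSQL LSNs (log sequence numbers, written like 16/B374D848) into byte positions and flags slots that hold back too much WAL. The function names, the threshold, and the slot names are made up for this example. The SQL string shows the kind of query a monitoring job might run against pg_replication_slots; the Python helpers only do the arithmetic and do not connect to a database.

```python
# Query a monitoring job could run; pg_replication_slots,
# pg_current_wal_lsn() and pg_wal_lsn_diff() exist in PostgreSQL 10+.
SLOT_LAG_SQL = """
SELECT slot_name,
       active,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_bytes
FROM pg_replication_slots
WHERE slot_type = 'logical';
"""


def lsn_to_int(lsn: str) -> int:
    """Convert an LSN such as '16/B374D848' to an absolute byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)


def retained_wal_bytes(current_lsn: str, restart_lsn: str) -> int:
    """Bytes of WAL the server must keep for a slot at restart_lsn."""
    return lsn_to_int(current_lsn) - lsn_to_int(restart_lsn)


def slots_over_threshold(slots: dict, current_lsn: str, limit_bytes: int) -> list:
    """Return the names of slots retaining more than limit_bytes of WAL.

    slots maps a (hypothetical) slot name to its restart_lsn.
    """
    return [
        name
        for name, restart_lsn in slots.items()
        if retained_wal_bytes(current_lsn, restart_lsn) > limit_bytes
    ]
```

In practice you would feed these helpers from the query above, or simply alert on retained_bytes directly in your monitoring system.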

Debezium

Debezium is the most widely used open-source CDC platform. It runs on Kafka Connect and supports PostgreSQL, MySQL, MongoDB, SQL Server, Oracle, and more.

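The PostgreSQL connector uses the pgoutput logical decoding plugin (built into PostgreSQL 10+) or the decoderbufs plugin. The older wal2json plugin was supported in earlier Debezium releases but was removed in Debezium 2.0. The connector captures all row-level changes and produces structured messages (JSON by default) containing the before and after states of each row. With PostgreSQL’s default replica identity, the before state of an update or delete carries at most the primary key columns; set REPLICA IDENTITY FULL on a table if you need the full previous row.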

A typical Debezium deployment involves running a Kafka cluster, deploying Kafka Connect workers, and configuring the Debezium PostgreSQL connector. Changes flow from PostgreSQL to Kafka topics, where consumers can process them for various use cases: syncing to data warehouses, populating search indexes, and triggering downstream services.
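To make the connector configuration step more concrete, here is a minimal sketch that builds the JSON body you would POST to a Kafka Connect worker's /connectors endpoint. The property names are Debezium PostgreSQL connector settings (topic.prefix is the Debezium 2.x name; 1.x used database.server.name). The function name, connector name, host, credentials, and table names are placeholders invented for this example.

```python
import json


def debezium_postgres_connector(name, host, dbname, user, password, tables):
    """Build the request body for registering a Debezium PostgreSQL connector."""
    return {
        "name": name,
        "config": {
            "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
            "plugin.name": "pgoutput",
            "database.hostname": host,
            "database.port": "5432",
            "database.user": user,
            "database.password": password,
            "database.dbname": dbname,
            # Prefix for the Kafka topics the connector writes to.
            "topic.prefix": name,
            # Comma-separated schema.table names to capture.
            "table.include.list": ",".join(tables),
            # Slot names may contain only lowercase letters, digits, underscores.
            "slot.name": f"{name}_slot",
        },
    }


if __name__ == "__main__":
    body = debezium_postgres_connector(
        "inventory", "db.internal", "mydb", "cdc_user", "secret",
        ["public.users", "public.orders"],
    )
    print(json.dumps(body, indent=2))
```

A real deployment would usually keep the password out of the request body, for example through a Kafka Connect config provider.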

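Debezium handles most schema changes automatically. When you alter a table, it detects the change and subsequent events carry the updated schema. It also records its position in the WAL as Kafka Connect offsets and resumes from the last committed offset after a failure. Delivery is at-least-once by default, so consumers should tolerate the occasional duplicate event after a restart.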

{
  "before": null,
  "after": {
    "id": 1,
    "email": "user@example.com",
    "created_at": 1699574400000
  },
  "source": {
    "connector": "postgresql",
    "db": "mydb",
    "table": "users"
  },
  "op": "c",
  "ts_ms": 1699574400123
}
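A consumer typically branches on the op field of an event like the one above. The sketch below applies events to an in-memory dictionary keyed by primary key, which is enough to show the logic. It assumes the unwrapped payload shape shown above, a primary key column called id, and the Debezium op codes c (create), u (update), d (delete), and r (snapshot read). The function name and the dictionary "replica" are illustrative only.

```python
def apply_change(table: dict, event: dict, key: str = "id") -> None:
    """Apply one Debezium-style change event to an in-memory table."""
    op = event["op"]
    if op in ("c", "r", "u"):
        # Inserts, snapshot reads, and updates all carry the new row in "after".
        row = event["after"]
        table[row[key]] = row
    elif op == "d":
        # Deletes carry the old row (at least its key) in "before".
        table.pop(event["before"][key], None)
    else:
        raise ValueError(f"unsupported op: {op!r}")


if __name__ == "__main__":
    users = {}
    apply_change(users, {
        "before": None,
        "after": {"id": 1, "email": "user@example.com", "created_at": 1699574400000},
        "op": "c",
    })
    apply_change(users, {"before": {"id": 1}, "after": None, "op": "d"})
    print(users)
```

Because setting a key and popping a key are both idempotent, replaying the same event twice leaves the table in the same state, which suits at-least-once delivery.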

The tradeoff is operational complexity. You need to run and maintain Kafka, handle connector configuration, and monitor replication lag. For teams without Kafka expertise, the managed alternatives below may be more practical.

Airbyte

Airbyte is an open-source data integration platform that supports CDC for PostgreSQL through its Postgres source connector. Unlike Debezium, Airbyte is a complete ELT platform rather than just a CDC engine.

Airbyte’s CDC implementation uses Debezium under the hood but wraps it in a more accessible interface. You configure the PostgreSQL source through a web UI, select which tables to sync, and choose a destination. Airbyte runs Debezium embedded inside the source connector, so there is no Kafka cluster to operate, and it handles the Debezium configuration and the destination loading for you.

The platform supports hundreds of destinations including data warehouses (Snowflake, BigQuery, Redshift), databases (PostgreSQL, MySQL), and SaaS tools. This breadth makes Airbyte practical for teams that need to move data to multiple destinations without building separate pipelines.

CDC replication in Airbyte runs as scheduled syncs rather than as a continuous stream, which is why latency is measured in minutes. Each incremental sync reads only the changes recorded since the previous sync, reducing load on the source database.

Airbyte offers both a self-hosted open-source version and Airbyte Cloud. The cloud version removes the operational burden of running the infrastructure but costs more than self-hosting.

Fivetran

Fivetran is an enterprise data integration service with strong PostgreSQL CDC support. It’s a fully managed SaaS platform—you don’t run any infrastructure.

The PostgreSQL connector uses logical replication to capture changes with low latency (typically under 5 minutes). Fivetran handles the complexity of managing replication slots, schema evolution, and incremental loading. You configure a source, select tables, and pick a destination warehouse.

Fivetran emphasizes reliability and data integrity. It includes automatic schema migration in the destination warehouse when source schemas change. The platform tracks sync status, alerts on failures, and provides observability into data freshness.

Destination support is focused on data warehouses and lakes: Snowflake, BigQuery, Redshift, Databricks, and others. For reverse sync (warehouse to operational database), you’d pair Fivetran with a tool like Hightouch.

Pricing is based on monthly active rows (rows that had changes synced). This model works well for relatively static data but can get expensive for high-churn tables.
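To see why high-churn tables behave differently under this model, the sketch below counts distinct rows changed per calendar month. It is a simplified approximation of the monthly-active-rows idea, not Fivetran's billing formula, and the function name and input shape are assumptions made for this example.

```python
from collections import defaultdict
from datetime import datetime, timezone


def monthly_active_rows(changes):
    """Count distinct (table, primary key) pairs changed in each month.

    changes is an iterable of (table, primary_key, ts_ms) tuples, where
    ts_ms is a Unix timestamp in milliseconds. Returns {"YYYY-MM": count}.
    """
    seen = defaultdict(set)
    for table, primary_key, ts_ms in changes:
        when = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
        seen[when.strftime("%Y-%m")].add((table, primary_key))
    return {month: len(keys) for month, keys in seen.items()}
```

Under this approximation, a row updated a thousand times in a month counts once, while a table whose rows are each touched once counts every row.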

Hightouch

Hightouch focuses on reverse ETL—moving data from warehouses back to operational systems. While not a traditional CDC tool, it complements CDC pipelines by enabling the reverse direction.

The typical pattern: use CDC (via Debezium, Fivetran, etc.) to replicate PostgreSQL data to a warehouse, transform and enrich it there, then use Hightouch to sync the enriched data to CRM, marketing tools, or other operational systems.

Hightouch connects to your warehouse as a source and syncs data to 200+ destinations including Salesforce, HubSpot, Braze, Intercom, and more. It supports incremental syncs based on changed data in the warehouse.

For PostgreSQL specifically, Hightouch can also read directly from PostgreSQL as a source (without CDC), though the warehouse pattern is more common for complex transformations.

Estuary Flow

Estuary Flow is a real-time data platform built around CDC. It uses an open-source runtime that captures changes and delivers them to destinations with sub-second latency.

The PostgreSQL capture uses logical replication and maintains exactly-once semantics through checkpointing. Flow handles backpressure automatically—if a destination slows down, it buffers changes without losing data.
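The general idea behind exactly-once delivery through checkpointing can be sketched as follows: the sink stores the last applied stream position in the same transaction as the data, so a replayed batch is skipped rather than applied twice. This is a generic illustration using SQLite from the Python standard library, not Estuary Flow's implementation. The table names, the integer positions, and the function names are assumptions made for this example.

```python
import sqlite3


def open_sink() -> sqlite3.Connection:
    """Create an in-memory sink with a data table and a one-row checkpoint."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
    conn.execute("CREATE TABLE checkpoint (id INTEGER PRIMARY KEY, position INTEGER)")
    conn.execute("INSERT INTO checkpoint (id, position) VALUES (1, 0)")
    conn.commit()
    return conn


def apply_batch(conn: sqlite3.Connection, batch) -> int:
    """Apply (position, user_id, email) changes; return how many were applied.

    Data and checkpoint are written in one transaction, so a crash leaves
    either both updated or neither.
    """
    applied = 0
    with conn:  # commits on success, rolls back if an exception is raised
        (last,) = conn.execute(
            "SELECT position FROM checkpoint WHERE id = 1"
        ).fetchone()
        for position, user_id, email in batch:
            if position <= last:
                continue  # already applied before a restart or retry
            conn.execute(
                "INSERT OR REPLACE INTO users (id, email) VALUES (?, ?)",
                (user_id, email),
            )
            last = position
            applied += 1
        conn.execute("UPDATE checkpoint SET position = ? WHERE id = 1", (last,))
    return applied
```

Replaying the same batch after a simulated restart applies nothing, because every position in it is at or below the stored checkpoint.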

A unique feature is Flow’s data transformation capabilities. You can write derivations in SQL or TypeScript that transform change events in-flight before they reach the destination. This enables real-time aggregations, joins across tables, and custom business logic without separate processing infrastructure.

Flow supports materialization to various destinations including other databases, data warehouses, and event streaming systems. The platform is available as a managed cloud service or can be self-hosted.

Pricing is based on data volume (GB/month) rather than row count, which can be more predictable for high-volume workloads.
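For volume-based pricing, a back-of-the-envelope estimate is often enough to compare options. The sketch below multiplies change rate by average event size. The function names are made up, the price per GB is a parameter you would take from the vendor's current price list, and 1 GB is taken as 10^9 bytes.

```python
def monthly_volume_gb(changes_per_second: float, avg_event_bytes: int, days: int = 30) -> float:
    """Estimate the data volume of a change stream in GB per month."""
    seconds = days * 24 * 60 * 60
    return changes_per_second * avg_event_bytes * seconds / 1e9


def monthly_cost(volume_gb: float, price_per_gb: float) -> float:
    """Estimate monthly cost for a given volume and a per-GB price."""
    return volume_gb * price_per_gb
```

For example, 100 changes per second at roughly 1,000 bytes per event works out to about 259 GB over a 30-day month.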

Google Cloud Datastream

Cloud Datastream is Google’s managed CDC service. It captures changes from PostgreSQL (and MySQL, Oracle) and delivers them to BigQuery, Cloud Storage, or other GCP services.

Datastream uses logical replication and handles the infrastructure automatically. You configure a source connection, select tables, and pick a destination. Google manages the replication slots, scaling, and fault tolerance.

The BigQuery integration is particularly smooth. Datastream can write changes directly to BigQuery tables with automatic schema updates. You can then query near-real-time data in BigQuery without managing a separate ingestion pipeline.

For PostgreSQL on Cloud SQL, setup is straightforward—the services integrate natively. For self-hosted or other cloud PostgreSQL, you need to ensure network connectivity and proper replication configuration.
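The replication configuration for a self-managed source typically comes down to a user with the replication attribute, a publication for the selected tables, and a logical replication slot. The sketch below renders those statements so they can be reviewed before running them. The function and object names are hypothetical, and managed PostgreSQL services may require a different way of granting replication rights, so check the provider's documentation.

```python
import re

# Unquoted lowercase identifier, optionally schema-qualified.
_IDENT = re.compile(r"[a-z_][a-z0-9_]*(\.[a-z_][a-z0-9_]*)?")


def _check(name: str) -> str:
    """Reject names that are not plain identifiers, since they are interpolated."""
    if not _IDENT.fullmatch(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def replication_setup_sql(publication: str, slot: str, tables: list, user: str) -> list:
    """Return the SQL statements that prepare a database for logical CDC."""
    table_list = ", ".join(_check(t) for t in tables)
    return [
        f"ALTER USER {_check(user)} WITH REPLICATION;",
        f"GRANT SELECT ON {table_list} TO {_check(user)};",
        f"CREATE PUBLICATION {_check(publication)} FOR TABLE {table_list};",
        f"SELECT pg_create_logical_replication_slot('{_check(slot)}', 'pgoutput');",
    ]


if __name__ == "__main__":
    for statement in replication_setup_sql(
        "cdc_pub", "cdc_slot", ["public.users", "public.orders"], "cdc_user"
    ):
        print(statement)
```

The slot starts retaining WAL as soon as it is created, so create it shortly before the CDC tool starts reading from it.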

Pricing is based on data processed (GB) with no per-row charges. Because cost scales with bytes rather than row count, this model favors workloads with many small changes; frequent updates to wide rows cost more.

AWS Database Migration Service (DMS)

AWS DMS handles database migration and ongoing replication, including CDC for PostgreSQL. While originally designed for migrations, many teams use it for continuous replication between databases or to data lakes.

DMS supports PostgreSQL as both a source and target. For CDC, it uses logical replication to capture changes and applies them to the target with configurable latency. It handles initial full load plus ongoing CDC, managing the transition automatically.
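The handoff from full load to ongoing CDC can be illustrated with a small sketch: note the log position at which the snapshot was taken, load the snapshot, then apply only the changes recorded after that position. This is a generic illustration of the pattern, not a description of DMS internals. The function name, the integer positions, and the "upsert"/"delete" operation labels are assumptions made for this example.

```python
def full_load_then_cdc(snapshot_rows, snapshot_position, changes):
    """Build the target state from a snapshot plus later changes.

    snapshot_rows: rows (dicts with an "id" key) as of snapshot_position.
    changes: ordered list of (position, op, row), op is "upsert" or "delete".
    """
    target = {row["id"]: row for row in snapshot_rows}
    for position, op, row in changes:
        if position <= snapshot_position:
            continue  # this change is already reflected in the snapshot
        if op == "delete":
            target.pop(row["id"], None)
        elif op == "upsert":
            target[row["id"]] = row
        else:
            raise ValueError(f"unsupported op: {op!r}")
    return target
```

Skipping changes at or before the snapshot position is what keeps rows from being applied twice during the transition.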

Supported targets include other RDS databases, Redshift, S3, Kinesis, and Kafka. The S3 and Kinesis targets are commonly used for building data lakes or feeding real-time analytics.

DMS runs on replication instances that you provision. These instances handle the extraction, transformation, and loading of data. You choose instance sizes based on workload—higher throughput needs larger instances.

AWS also provides the Schema Conversion Tool (SCT) for heterogeneous migrations, though it isn’t typically needed for CDC between PostgreSQL databases.

Pricing is based on replication instance hours plus storage. A dms.t3.medium instance runs about $70/month. Data transfer costs apply when moving data across regions or out of AWS.

Comparison

Tool            Type                   Latency      Best for
Debezium        Open-source            Real-time    Kafka-based architectures
Airbyte         Open-source/Cloud      Minutes      Multi-destination ELT
Fivetran        Managed SaaS           Minutes      Enterprise warehouse loading
Hightouch       Managed SaaS           Minutes      Reverse ETL to SaaS tools
Estuary Flow    Managed/Self-hosted    Seconds      Real-time transformations
Datastream      GCP managed            Minutes      GCP-native pipelines
AWS DMS         AWS managed            Minutes      AWS migrations and replication

Choosing a CDC tool

Choose Debezium if you’re comfortable with Kafka and want maximum flexibility. It’s the most powerful option but requires the most operational investment.

Choose Airbyte if you need to replicate to multiple destinations and want an open-source option with a friendly UI. The managed cloud version reduces the operational burden.

Choose Fivetran if you need enterprise-grade reliability for warehouse loading and can budget for a managed service. It’s the most polished option for data teams.

Choose Hightouch if your primary need is reverse ETL—getting warehouse data into operational tools. Pair it with another CDC tool for the forward direction.

Choose Estuary Flow if you need sub-second latency and want to transform data in-flight without separate infrastructure.

Choose Datastream or DMS if you’re committed to GCP or AWS respectively and want native integration with your cloud provider’s ecosystem.

Wrapping up

PostgreSQL’s logical replication makes CDC possible, and the tools above make it practical. The right choice depends on your latency requirements, destination systems, operational capacity, and cloud provider. For most teams, starting with a managed service (Fivetran, Airbyte Cloud, or cloud-native options) reduces the initial complexity, with the option to move to self-hosted Debezium later if needed.