Menu
Blog Documentation Community Pricing Demo Call Sign Up
Sign Up

Postgres Logical Replication in Practice

How Postgres logical replication works: publications, subscriptions, replication slots, schema migrations, and the pitfalls that bite in production.

Postgres

This post was written by an engineer at QueryPlane. QueryPlane is an app builder for your database: bring your own postgres db and you can create interactive applications to share with other developers, coworkers or even your customers. If you’re interested in trying it out, get started here.


Logical replication is how you stream row-level changes from one Postgres database to another without copying the entire on-disk format. It is the right primitive for zero-downtime major-version upgrades, RDS-to-self-hosted migrations, multi-region read replicas with selective tables, and the ingestion side of every modern CDC pipeline. It is also one of the easier features to misconfigure in a way that fills your primary’s disk overnight, so the operational details matter.

This post walks through how logical replication actually works in Postgres, how to set up a publication and subscription correctly, the differences between Postgres 14 and 17 in this area, and the pitfalls that show up only under real traffic.

In this post, we’ll cover:

  • What logical replication is - the mental model and how it differs from physical streaming
  • Publications and subscriptions - the API for declaring what to ship and where
  • Replication slots and the WAL - the underlying mechanism and why it can break things
  • Schema and DDL - what the protocol replicates and what it silently does not
  • Migrations and major-version upgrades - the canonical use case
  • Postgres 16 and 17 improvements - parallel apply, slot failover, replication from standbys
  • Pitfalls and gotchas - the failure modes you only hit in production

What Postgres logical replication actually is

Physical replication ships the raw bytes of the write-ahead log from a primary to a standby. The standby ends up byte-identical to the primary — same Postgres version, same file format, same everything. It is fast, reliable, and inflexible: you replicate the whole cluster or nothing.

Logical replication ships decoded changes instead. The primary runs a logical decoding plugin (the built-in pgoutput for native logical replication, wal2json for JSON output, decoderbufs for protobuf) that walks the WAL and produces a stream of INSERT, UPDATE, DELETE, and TRUNCATE events tagged with table name, primary key, and column values. A subscriber consumes that stream and applies the changes as ordinary SQL on its end.

That difference unlocks everything logical replication is good for:

  • The publisher and subscriber can run different Postgres major versions, so it works for upgrades.
  • You can publish a subset of tables instead of the whole cluster, so it works for partial multi-region replicas and for handing one team’s tables off to a separate cluster.
  • The subscriber is a fully writable Postgres database, so it can have its own indexes, its own additional tables, and its own load.
  • The decoded stream is the same primitive Debezium, Materialize, ClickHouse PeerDB, and most other CDC systems consume.

The trade-off is that the apply path on the subscriber is real SQL, so it is bound by the subscriber’s write throughput and by index maintenance overhead. A logically replicated database does not necessarily keep up with a high-write primary.

Publications and subscriptions

The two declarative pieces are the publication on the source and the subscription on the destination. A publication is a named set of tables on the publisher whose changes get exposed:

-- on the publisher
CREATE PUBLICATION app_pub FOR TABLE users, orders, order_items;

-- or every table in a schema (Postgres 15+)
CREATE PUBLICATION app_pub FOR TABLES IN SCHEMA app;

-- or every table in the database
CREATE PUBLICATION app_pub FOR ALL TABLES;

The publication does not, by itself, cause any work to happen — it is just a declaration. Decoding does not start until a subscription connects.

On the subscriber, you point at the publisher with a connection string and pick a publication to consume:

-- on the subscriber
CREATE SUBSCRIPTION app_sub
  CONNECTION 'host=primary.internal dbname=app user=replicator password=...'
  PUBLICATION app_pub;

CREATE SUBSCRIPTION does several things in one statement: it creates a replication slot on the publisher (named after the subscription unless you specify slot_name), runs an initial COPY of every table in the publication, then starts streaming WAL changes that arrive after the copy finishes. By default it does all of that synchronously, which is fine for small tables and a problem for large ones — large tables almost always want (create_slot = false) plus a manual pg_create_logical_replication_slot call so you can stage the initial copy out of band.

A few connection-string requirements that bite people on first setup. The user in the connection string needs the REPLICATION attribute (ALTER ROLE replicator REPLICATION), the publisher’s pg_hba.conf needs an entry that matches the actual database name in the subscription (e.g. host app replicator <subscriber-ip>/32 scram-sha-256) — the special replication keyword in pg_hba.conf matches only physical streaming replication, not logical subscriptions — and wal_level on the publisher must be logical, not the default replica. Changing wal_level requires a restart.

Replication slots and the WAL

The piece that makes logical replication operational, not just functional, is the replication slot. A slot is a server-side bookmark on the publisher that tracks the oldest WAL position the subscriber has not yet confirmed. The publisher will not reuse or delete WAL files past that position, no matter what wal_keep_size or archiving says.

This is the feature that makes logical replication reliable across subscriber outages — if your subscriber goes offline for an hour, the publisher dutifully keeps every byte of WAL produced during that hour and replays it when the subscriber reconnects. It is also the failure mode that takes down primaries.

A subscriber that goes offline and never comes back leaves a slot pinning WAL forever. The publisher’s pg_wal directory grows, the disk fills, and the database stops accepting writes. This is the single most common operational failure with logical replication, and it is silent until the disk is full. The defenses are:

  • Always set max_slot_wal_keep_size on the publisher (Postgres 13+). When a slot’s retention exceeds the limit, the slot is invalidated rather than allowed to fill the disk. Better to lose a subscriber than lose the primary.
  • Monitor pg_replication_slots.confirmed_flush_lsn and the lag in bytes between that and pg_current_wal_lsn(). Alert at a threshold that gives you operational time before the disk fills.
  • Drop slots aggressively when a subscriber is decommissioned. DROP SUBSCRIPTION on the subscriber tries to drop the slot on the publisher; if the connection is dead, the local subscription drops but the slot stays. The fix is ALTER SUBSCRIPTION ... DISABLE and ALTER SUBSCRIPTION ... SET (slot_name = NONE) before dropping, then drop the slot manually on the publisher with pg_drop_replication_slot.

The slot also tracks the oldest transaction visible to the subscriber, which means the publisher cannot vacuum dead tuples newer than that transaction. A slow subscriber pins not just WAL but also bloat — the same defense (max_slot_wal_keep_size plus monitoring) handles both.

See what QueryPlane can build for you

Connect to your database, write SQL with AI, and build shareable apps — all from your browser.

What logical replication does and doesn’t replicate

Logical replication replicates DML — INSERT, UPDATE, DELETE, and TRUNCATE. It does not replicate DDL. This is the detail that surprises people most often, because everything else about logical replication is so smooth that it feels like physical replication minus the version lock.

In practice, that means schema changes are a coordinated dance:

  1. Apply the additive change (ADD COLUMN, CREATE INDEX) on the subscriber first.
  2. Apply the same change on the publisher.
  3. Restart the subscription if the change touched a published table’s column list.
  4. For destructive changes (DROP COLUMN, ALTER TYPE that rewrites the table), reverse the order — publisher first, then subscriber.

The reason is that the publisher’s pgoutput plugin emits change events using the publisher’s current schema. If the subscriber’s schema is missing a column, the apply worker errors and the subscription stalls. If the subscriber has an extra column, the apply uses the column’s default and continues — which is why the additive direction is “subscriber first.”

A few more things logical replication does not replicate:

  • Sequences. The current value of users_id_seq is not synchronized. After failover, you must setval on the new primary to skip past any ids the old primary had reserved. Postgres 16 partially fixed this by replicating sequence values during the initial sync, but ongoing changes are still not replicated.
  • Large objects (the lo_* API). Use bytea instead.
  • Roles, grants, and tablespaces. These are global to the cluster and have to be recreated on the subscriber out of band.
  • Tables without a primary key or REPLICA IDENTITY FULL. Updates and deletes need a row identity to apply; without one, the publisher errors on the first UPDATE to a published table. REPLICA IDENTITY FULL works but is expensive — the publisher has to log the entire old row, and the subscriber has to do a full row match, which without an index is quadratic.

The canonical use case: zero-downtime upgrades

The most common reason to set up logical replication is a major-version upgrade or a move between providers. The pattern is:

  1. Stand up the target Postgres on the new version (or on the new provider).
  2. pg_dump --schema-only from the old database, apply on the new.
  3. Create a publication on the old database for every table.
  4. Create a subscription on the new database. The subscription’s initial COPY does the bulk load; ongoing replication keeps the new database current.
  5. Wait for replication lag to approach zero and stay there under load.
  6. At the cutover window, stop writes to the old database, wait for the subscriber to drain, repoint the application, advance sequences with setval, and disable the subscription.

The total downtime is the cutover window — typically seconds to a minute, dominated by the application restart and the sequence advance. Compared to pg_dump / pg_restore (minutes to hours of downtime depending on database size), this is dramatic.

Two things that go wrong on real upgrades. First, large tables can take many hours to do their initial COPY, during which the publisher’s slot accumulates WAL — max_slot_wal_keep_size must be generous enough to cover the copy duration plus a buffer, or the upgrade fails partway through. Second, tables without a primary key (yes, you have at least one) need either an added primary key, a unique-not-null index marked REPLICA IDENTITY USING INDEX, or REPLICA IDENTITY FULL before they can be replicated under UPDATE or DELETE. Audit the publisher’s tables with this query before starting:

SELECT n.nspname, c.relname, c.relreplident
FROM pg_class c JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND c.relreplident = 'd'        -- 'd' means default, which is primary key
  AND NOT EXISTS (
    SELECT 1 FROM pg_index i
    WHERE i.indrelid = c.oid AND i.indisprimary
  );

Any row this returns is a published table that will fail on the first UPDATE.

Postgres 16 and 17 improvements

Logical replication has been the area of most active development in recent Postgres releases, and the operational story in 17 is meaningfully better than in 14.

Postgres 16 added two changes that matter for production. The first is logical replication from standbys: a subscription can connect to a physical standby instead of the primary, which lets you offload the decoding cost of a busy primary. The second is parallel apply: large transactions on the publisher can be applied in parallel on the subscriber via streaming = parallel, which dramatically improves apply throughput on write-heavy workloads.

Postgres 17 added slot failover to physical standbys. Previously, if your publisher failed over to its physical standby, every logical subscription was broken — the new primary did not have the slot. With Postgres 17, slots can be marked failover = true and are synchronized to the standby, so a physical failover does not blow up logical replication. This is the single biggest reliability improvement to logical replication in years and is the reason most production deployments are moving to 17 quickly. Postgres 17 also added a pg_createsubscriber binary that converts a physical standby into a logical replica in place, removing the initial-copy bottleneck for upgrades of very large databases.

If you are starting a new logical replication setup today, you want to be on 16 or 17. The slot-failover support alone removes most of the operational pain.

Pitfalls and gotchas

A few failure modes that are not obvious from the docs.

The slot fills the publisher’s disk

Already covered above, but worth restating because it is the most common production failure. Always set max_slot_wal_keep_size and monitor slot lag. The default is unlimited, which is the wrong choice for almost every deployment.

Apply lag from a missing index

Every UPDATE and DELETE event arrives at the subscriber as a WHERE primary_key = X (or with REPLICA IDENTITY FULL, as a row match). If the corresponding index on the subscriber’s copy of the table does not exist, every applied row does a sequential scan. This shows up as ever-growing replication lag and a CPU-bound apply worker on the subscriber. The fix is to ensure the subscriber has the same indexes as the publisher — pg_dump --schema-only does this for you on initial setup but not for indexes added later.

Toasted columns and REPLICA IDENTITY DEFAULT

When an UPDATE does not touch a TOASTed column (a value too large for an inline page), the publisher does not log the column’s new value — only changed columns are logged. This is correct under REPLICA IDENTITY DEFAULT because the subscriber holds the previous value. But if the subscriber’s row went out of sync for any reason (say, a migration ran on the subscriber that you forgot about), the unchanged column on the subscriber stays wrong forever, because no future UPDATE will retransmit it. This is a strong argument for treating the subscriber as read-only at the application level.

TRUNCATE does not respect WHERE filters

Row-level filters on publications (Postgres 15+) only apply to INSERT, UPDATE, and DELETE. A TRUNCATE on the publisher’s table is replicated as a full table truncate on the subscriber, regardless of any WHERE filter you set. If your subscription is meant to hold a subset of a table, do not allow TRUNCATE on the publisher’s published tables — or exclude TRUNCATE from the publication with WITH (publish = 'insert,update,delete').

Initial copy holds a snapshot for as long as it runs

The initial COPY takes a REPEATABLE READ snapshot on the publisher. Vacuum cannot remove dead tuples newer than that snapshot for the duration of the copy. On a busy table that takes hours to copy, this can produce significant bloat on the publisher. For very large tables, consider pg_create_logical_replication_slot plus pg_dump --snapshot=... plus CREATE SUBSCRIPTION ... WITH (copy_data = false, create_slot = false) to do the bulk load out of band.

Cascading replication is not supported by pgoutput

The native pgoutput plugin only replicates changes that originated on the publisher itself. Changes that the publisher received from another upstream subscription are not re-emitted. If you need cascading replication (A → B → C), either use a non-native decoder (pglogical supports this) or have C subscribe directly to A.

Try it on your own database

You can run the slot inspection and replica-identity audit queries above against any Postgres database in QueryPlane’s SQL playground — no account or local install needed. The PostgreSQL syntax checker catches the most common mistakes in CREATE PUBLICATION and CREATE SUBSCRIPTION statements before you run them against a primary.

For day-to-day operations against a database that has logical replication configured — checking slot lag, monitoring apply workers, running ad-hoc queries against the subscriber for read traffic — connect QueryPlane to your Postgres. The schema explorer gives you a fast view of which tables have a primary key, which makes the pre-replication audit obvious instead of a SQL exercise.

Frequently asked questions

What is the difference between logical and physical replication in Postgres? Physical replication ships raw WAL bytes from a primary to a byte-identical standby on the same Postgres major version. Logical replication ships decoded row-level changes (INSERT, UPDATE, DELETE, TRUNCATE) and applies them as ordinary SQL on the subscriber. Physical is faster and simpler; logical is flexible — different versions, partial schemas, writable subscribers, and the foundation for CDC pipelines.

Does Postgres logical replication support DDL? No. Logical replication replicates DML only. Schema changes (ALTER TABLE, CREATE INDEX, etc.) must be applied separately on both sides, with the order depending on whether the change is additive or destructive. This is the most common surprise when teams first deploy it. Tools like Bucardo and Postgres extensions are working on DDL replication, but it is not in the core protocol as of Postgres 17.

How do I set up Postgres logical replication? On the publisher, set wal_level = logical (requires restart), grant a replication user, allow it in pg_hba.conf, and run CREATE PUBLICATION pub FOR ALL TABLES (or for specific tables). On the subscriber, run CREATE SUBSCRIPTION sub CONNECTION '...' PUBLICATION pub. The subscription handles the initial COPY and ongoing streaming automatically. Tables without a primary key need REPLICA IDENTITY FULL or a unique-not-null index marked as the replica identity.

Why is my Postgres replication slot growing the WAL forever? A logical replication slot pins every WAL byte from its confirmed_flush_lsn to the current WAL position until the subscriber acknowledges. If the subscriber is offline, slow, or has been deleted without dropping the slot, WAL accumulates and can fill the disk. Set max_slot_wal_keep_size on the publisher (Postgres 13+) so misbehaving slots are invalidated rather than allowed to take down the primary.

Can I use logical replication to upgrade Postgres major versions? Yes — this is the canonical use case. Stand up the new version, copy the schema over, create a publication on the old database and a subscription on the new one, wait for replication lag to drain, then cut traffic over during a brief window. Total downtime is typically seconds to a minute. Postgres 17’s pg_createsubscriber makes this even faster for very large databases by converting an existing physical standby into a logical replica in place.

What is the difference between pgoutput, wal2json, and Debezium? pgoutput is the native Postgres logical decoding plugin, used by CREATE SUBSCRIPTION. wal2json is a third-party plugin that emits the same changes as JSON, useful when the consumer is not another Postgres. Debezium is a CDC framework that uses either pgoutput or its own decoderbufs plugin under the hood and lands the changes in Kafka, Kinesis, or another streaming system. All three read from the same underlying logical decoding API; they differ in output format and in what the consumer is expected to do with the stream.

Does logical replication work on AWS RDS, Cloud SQL, and Supabase? Yes, with provider-specific setup. RDS exposes wal_level via the parameter group (rds.logical_replication = 1), and the rds_replication role grants the necessary permissions. Cloud SQL has a similar flag (cloudsql.logical_decoding). Supabase exposes logical replication via its dashboard and is the substrate for Supabase Realtime. None of these provide raw superuser access, so you cannot install non-native plugins like wal2json — you are restricted to pgoutput and whatever plugins the provider ships.

Wrapping up

Postgres logical replication is the right tool when you need cross-version replication, partial-schema replication, a writable subscriber, or a CDC stream into another system. The mental model is small — publications declare what to ship, subscriptions consume it, slots make it durable across outages — but the operational details (slot retention, replica identity, schema coordination) are where production deployments live or die.

The biggest single improvement you can make if you are running an older setup is to upgrade to Postgres 17 and turn on slot failover. The biggest single mistake you can make is leaving max_slot_wal_keep_size unset and trusting that no subscriber will ever go offline for long enough to fill the publisher’s disk. Set it, monitor slot lag, audit replica identity before you publish, and treat the subscriber as read-only — and logical replication is one of the most reliable tools Postgres ships with.