CDC Failover support v5.8.0
Availability
This is a PGD 5.7 and later feature. It is not supported on earlier versions of PGD.
Background
Earlier versions of PGD allowed the creation of logical replication slots on nodes to provide a feed of the logical changes happening to the data in the database. These logical replication slots were local to the node and not replicated. Apart from only replicating changes from that particular node, this behavior presented challenges when faced with node failover in the cluster. In that scenario, a consumer of logical replication from a node that fails has no replica of the slot on another node to continue consuming from.
While solutions to this can be engineered using a subscriber-only node as an intermediary, it significantly raises the cost of logical replication.
CDC Failover support
To address this need, PGD 5.7 introduces CDC Failover support. This is an optionally enabled feature that activates automatic logical slot replication across the cluster. This, in turn, allows a consumer of a logical slot’s replication to receive change data from any node when a failure occurs.
How CDC Failover works
When a logical slot is created on a node with CDC Failover support enabled, the slot is replicated across the cluster. This means that the slot is available for consumption on any node in the cluster. When a node fails, the slot can be consumed from another node in the cluster. This allows for continuing the logical replication stream without interruption.
If, though, the consumer of the slot connects to a different node in the cluster, PGD closes the consumer's previous connection. This behavior ensures that the slot isn't consumed from multiple nodes at the same time. In the background, PGD uses its Raft consensus protocol to ensure that the slot is consumed from only one node at a time. Because this relies on consensus, the guarantee that the slot is consumed from only one node at a time doesn't hold in split-brain scenarios.
Currently, CDC Failover support is a global option that's controlled by a top-group option. The failover_slot_scope top-group option can currently be set to (and defaults to) local, which disables replication of logical slots, or global. The global setting enables the replication of all non-temporary logical slots created in the PGD database.
Temporary logical slots aren't replicated, as they have a lifetime scoped to the session that created them and will go away when that session ends.
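For example, a slot created with the temporary parameter of pg_create_logical_replication_slot stays scoped to the creating session and isn't replicated (the slot name here is illustrative):
SELECT pg_create_logical_replication_slot('my_temp_slot', 'test_decoding', temporary => true);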
At-least-once delivery guarantees
CDC Failover support takes steps to ensure that the consumer receives all changes at least once. This is done by holding back slots until delivery has been confirmed, at which point the slot is advanced on all nodes asynchronously. If the node where the slot was being consumed fails, the slot is held until the consumer connects to another node in the cluster. The slot can then progress.
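For illustration, you can observe a slot's held-back position on a node by querying the standard pg_replication_slots view (the slot name here is illustrative):
SELECT slot_name, restart_lsn, confirmed_flush_lsn FROM pg_replication_slots WHERE slot_name = 'myslot';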
Important
If a consuming application disconnects and doesn’t reconnect, the slot will remain held back on every node in the cluster. As this consumes disk and memory, it's essential to avoid this situation. Applications that consume slots must return to consuming as soon as possible.
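One way to watch for this situation is to check how much WAL each slot is retaining, for example with standard Postgres functions (a sketch, not PGD-specific):
SELECT slot_name, pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal FROM pg_replication_slots;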
Exactly-once delivery
Currently, there's no way to ensure exactly-once delivery, and we expect consuming applications to manage the discarding of previously completed transactions.
Enabling CDC Failover support
To enable CDC Failover support, call the bdr.alter_node_group_option function with the following parameters:
select bdr.alter_node_group_option(<top-level group name>, 'failover_slot_scope', 'global');
Replace <top-level group name> with the name of your cluster's top-level group. If you don't know the name, it's the group with a node_group_parent_id equal to 0 in bdr.node_group. You can also use:
SELECT bdr.alter_node_group_option( node_group_name, 'failover_slot_scope', 'global') from bdr.node_group where node_group_parent_id=0;
This command ensures you're setting the correct top-level group’s option.
Once CDC Failover is enabled, to create a new globally replicated slot, you can use:
SELECT pg_create_logical_replication_slot('myslot', 'test_decoding');
Logical replication slots created before the option was set to global aren't replicated. Only new slots are replicated.
Failover slots can also be created with the CREATE_REPLICATION_SLOT command on a replication connection.
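For example, pg_recvlogical issues this command over a replication connection when it creates a slot (the database and slot names here are illustrative):
pg_recvlogical -d bdrdb --slot myslot2 --create-slot --plugin test_decoding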
The status of failover slots is tracked in the bdr.failover_replication_slots table.
CDC Failover support with Postgres 17+
For Postgres 17 and later, Postgres added its own failover support that allows logical replication to resume from a promoted standby, controlled by an option in pg_create_logical_replication_slot named failover. On these versions, no matter what the setting of failover_slot_scope, you must also set failover to true.
SELECT pg_create_logical_replication_slot('myslot', 'test_decoding', failover=>true);
Obtaining an initial consistent snapshot
When a logical replication slot is created, a consistent snapshot is exported by Postgres. This snapshot can be used to obtain a consistent initial copy of the data. PGD's failover slot mechanism follows the same procedure, but the consumer must obtain the snapshot from the same node where the slot was originally created and must also start the initial replication from that node. Once the consumer has received enough changes over the replication stream, the failover slot is marked as failover_safe. Once the slot is marked as failover_safe, the consumer can safely fail over to another node in the PGD cluster (other considerations apply, though; see below).
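As a sketch of that procedure: when the slot is created with CREATE_REPLICATION_SLOT on a replication connection, the command returns a snapshot name, and that snapshot can be used in a regular session on the same node to take the initial copy while the replication connection stays open (the snapshot identifier and table name here are illustrative):
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SET TRANSACTION SNAPSHOT '00000003-0000001B-1';
COPY public.pgbench_accounts TO STDOUT;
COMMIT;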
To check whether the slot is failover_safe, query the bdr.failover_replication_slots catalog and check the value of the failover_safe column for the given slot.
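For example, list the catalog and inspect the failover_safe column for the slot in question:
SELECT * FROM bdr.failover_replication_slots;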
If the consumer connects to some other PGD node and attempts to start replication before the slot is marked failover_safe, an appropriate error will be raised by PGD.
Failing over to newly joined nodes
When a new node joins the PGD cluster, it may not be immediately ready to serve as a decoding target for a CDC failover slot. The newly joined node may not have all the WAL files needed to decode the changes that the consumer hasn't yet consumed. Consuming from such a node may result in data loss. PGD detects and prevents such situations by internally tracking the replication progress and preventing a new node from being a failover target until it's safe to do so. If the consumer tries to connect to a node that isn't yet ready to serve as a decoding target, an appropriate error will be raised.
Tracking per-origin progress
Transactions can originate from any node in the PGD cluster. When a consumer connects to a PGD node and starts decoding transactions, it may receive changes for the transactions that originated on that node as well as transactions replicated from other nodes in the cluster. The consumer is expected to track replication progress across all such PGD nodes, or origins, and ensure that duplicate transactions are handled correctly. To facilitate this, the test_decoding plugin in EDB Postgres Extended and EDB Postgres Advanced Server has been enhanced to include the origin information of the transactions. Consumers can opt to receive origin information by setting the include-origin option to on when starting logical replication.
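For example, with pg_recvlogical the plugin option can be passed on the command line (the database and slot names here are illustrative):
pg_recvlogical -d bdrdb --slot myslot --start -o include-origin=on -f -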
A sample output of the test_decoding plugin with the origin information is shown below.
BEGIN 1723654 (origin 2) (origin_name bdr_bdrdemo_bdrgroup_node2) (origin_lsn 0/1D948910)
table public.pgbench_accounts: UPDATE: old-key: aid[integer]:39958 bid[integer]:1 abalance[integer]:0 filler[character]:' ' new-tuple: aid[integer]:39958 bid[integer]:1 abalance[integer]:-1783 filler[character]:' '
table public.pgbench_tellers: UPDATE: old-key: tid[integer]:6 bid[integer]:1 tbalance[integer]:0 new-tuple: tid[integer]:6 bid[integer]:1 tbalance[integer]:-1783 filler[character]:null
table public.pgbench_branches: UPDATE: old-key: bid[integer]:1 bbalance[integer]:0 new-tuple: bid[integer]:1 bbalance[integer]:-1783 filler[character]:null
table public.pgbench_history: INSERT: tid[integer]:6 bid[integer]:1 aid[integer]:39958 delta[integer]:-1783 mtime[timestamp without time zone]:'2025-01-31 16:51:21.511571' filler[character]:null
COMMIT 1723654 (origin 2) (origin_name bdr_bdrdemo_bdrgroup_node2) (origin_lsn 0/1D948910)
Consumers can make use of this information to track per-origin progress.
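As a minimal sketch of such tracking (the table, column names, and values here are hypothetical and not part of PGD), a consumer could persist the highest origin_lsn it has applied per origin in its own database and skip any transaction at or below that position:
CREATE TABLE IF NOT EXISTS consumer_progress (origin_name text PRIMARY KEY, last_applied_lsn pg_lsn NOT NULL);
-- After applying a transaction, record its origin_lsn:
INSERT INTO consumer_progress (origin_name, last_applied_lsn)
VALUES ('bdr_bdrdemo_bdrgroup_node2', '0/1D948910')
ON CONFLICT (origin_name) DO UPDATE
SET last_applied_lsn = EXCLUDED.last_applied_lsn
WHERE consumer_progress.last_applied_lsn < EXCLUDED.last_applied_lsn;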
PGD also records replication progress across all nodes in the bdr.logical_checkpoints catalog. The consumer can receive decoded changes for that catalog and use that information to determine the replication progress.
BEGIN 65720
id[name]:'370098259-0-6056978' origin_node[oid]:370098259 origin_lsn[pg_lsn]:'0/6056978' local_node[oid]:370098259 local_lsn[pg_lsn]:'0/6056978' peer_count[integer]:2 peer_nodes[oid[]]:'{2228531844,4052927809}' peer_lsns[pg_lsn[]]:'{0/4836758,0/67F0AF8}'
COMMIT 65720
In this example, the node 370098259 is reporting the replication progress. When the consumer receives this change record, it can be sure of having received everything up to 0/4836758 from node 2228531844 and up to 0/67F0AF8 from node 4052927809.
Important
Currently, PGD reports node information as OIDs stored in the bdr.node catalog. This will change in the near future, and the information will be replaced by UUIDs.
Limitations
The CDC Failover Slot support comes with certain limitations:
- CDC Failover slot support requires the latest versions of EDB Postgres Distributed (PGD) 5.7+ and the latest minor releases of Postgres Extended or EDB Postgres Advanced Server (available Feb 2025).
- CDC Failover support is a global option and can't be set on a per-slot basis. Because changing the enabled status of CDC Failover doesn't affect previously provisioned slots, it's possible to enable it (set to global), create a replicated slot, then disable it (set to local) to create a singular replicated slot.
- CDC Failover support isn't supported on temporary slots.
- CDC Failover support isn't supported on slots created with the failover option set to false.
- CDC Failover support works with EDB Postgres Advanced Server and EDB Postgres Extended Server only. It isn't supported on community Postgres installations.
- Existing slots aren't converted into failover slots when the option is enabled.
- While Postgres's built-in functions such as pg_logical_slot_get_changes() can be used, they won't ensure that the slot isn't being decoded anywhere else and can't update replication progress accurately across the cluster. Therefore, we recommend that you don't rely on these functions to receive decoded changes.