Backup and recovery v6.1.0
PGD is designed to be a distributed, highly available system. If one or more nodes of a cluster are lost, the best way to replace them is to clone new nodes directly from the remaining nodes.
The role of backup and recovery in PGD is to provide for disaster recovery (DR), such as in the following situations:
- Loss of all nodes in the cluster
- Significant, uncorrectable data corruption across multiple nodes as a result of data corruption, application error, or security breach
Logical backup and restore
You can use pg_dump, sometimes referred to as logical backup, normally with PGD.
Temporary postgresql.conf settings
First, temporarily set the following settings in postgresql.conf:
# Increase from the default of `1GB` to something large, but still a # fraction of your disk space since the non-WAL data must also fit. # This decreases the frequency of checkpoints. max_wal_size = 100GB # Increase the amount of memory for building indexes. Default is # 64MB. For example, 1GB assuming 128GB total RAM. maintenance_work_mem = 1GB # Increase the receiver and sender timeout from 1 minute to 1hr to # allow large transactions through. wal_receiver_timeout = 1h wal_sender_timeout = 1h # Increase the number of writers to make better use of parallel # apply. Default is 2. Make sure this isn't overriden lower by the # node group config num_writers setting. bdr.writers_per_subscription = 5 # Increase Raft-related election timeouts with default values of 6s # and 3s. bdr.raft_global_election_timeout = 20s bdr.raft_group_election_timeout = 10s # Increase the size of the shared memory queue used by the receiver to # send data to the writer process from the default 1MB. bdr.writer_input_queue_size = 32MB
Additionally:
- Make sure the default bdr.streaming_mode = 'auto' is not overridden so that transactions are streamed.
- Make sure any session or postgresql.conf settings listed above are not overriden by node group-level settings in general.
Now continue with pg_dump and pg_restore.
pg_dump / pg_restore
In order to reduce the risk of global lock timeouts, we recommend dumping pre-data, data, and post-data separately. For example:
pg_dump -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -v --exclude-schema='"bdr"' --exclude-extension='"bdr"' --section=pre-data -Fc -f pgd-pre-data.dump
pg_dump -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -v --exclude-schema='"bdr"' --exclude-extension='"bdr"' --section=data -Fc -f pgd-data.dump
pg_dump -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -v --exclude-schema='"bdr"' --exclude-extension='"bdr"' --section=post-data -Fc -f pgd-post-data.dumpAnd restore by directly executing these SQL files on a node (do not run these on the connection manager port):
PGOPTIONS="-cbdr.commit_scope=local" pg_restore -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB --section=pre-data -f pgd-pre-data.dump
PGOPTIONS="-cbdr.commit_scope=local" pg_restore -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB --section=data -f pgd-data.dump
psql -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -c 'SELECT bdr.wait_slot_confirm_lsn(NULL, NULL)'
PGOPTIONS="-cbdr.commit_scope=local" pg_restore -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB --section=post-data -f pgd-post-data.dump
psql -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -c 'SELECT bdr.wait_slot_confirm_lsn(NULL, NULL)'After which point the dump will be restored on all nodes in the cluster.
In contrast if you do not split sections out with a naive pg_dump and pg_restore, the restore will likely fail with a global lock timeout.
If you still get global lock timeouts with pg_restore,
add -cbdr.ddl_locking=off to PGOPTIONS.
If you choose to run pg_restore with -j/--jobs you will need to
increase max_worker_processes and max_parallel_maintenance_workers
by the same amount.
Prefer restoring to a single node cluster
Especially when initially setting up a cluster from a Postgres dump,
we recommend you restore to a cluster with a single PGD node. Then run
pgd node setup for each node you want in the cluster which will do a
physical join that uses bdr_init_physical under the hood.
Sequences
pg_dump dumps both local and global sequences as if
they were local sequences. This behavior is intentional, to allow a PGD
schema to be dumped and ported to other PostgreSQL databases.
This means that sequence-kind metadata is lost at the time of dump,
so a restore effectively resets all sequence kinds to
the value of bdr.default_sequence_kind at time of restore.
To create a post-restore script to reset the precise sequence kind for each sequence, you might want to use a SQL script like this:
SELECT 'SELECT bdr.alter_sequence_set_kind('''|| nspname||'.'||relname||''','''||seqkind||''');' FROM bdr.sequences WHERE seqkind != 'local';
If you run pg_dump using bdr.crdt_raw_value = on, then you can reload the
dump only with bdr.crdt_raw_value = on.
Technical Support recommends the use of physical backup techniques for backup and recovery of PGD.
Physical backup and restore
You can take physical backups of a node in an EDB Postgres Distributed cluster using standard PostgreSQL software, such as Barman.
You can perform a physical backup of a PGD node using the same procedure that applies to any PostgreSQL node. A PGD node is just a PostgreSQL node running the BDR extension.
Consider these specific points when applying PostgreSQL backup techniques to PGD:
- PGD operates at the level of a single database, while a physical backup includes all the databases in the instance. Plan your databases to allow them to be easily backed up and restored. 
- Backups make a copy of just one node. In the simplest case, every node has a copy of all data, so you need to back up only one node to capture all data. However, the goal of PGD isn't met if the site containing that single copy goes down, so the minimum is at least one node backup per site (with many copies, and so on). 
- However, each node might have unreplicated local data, or the definition of replication sets might be complex so that all nodes don't subscribe to all replication sets. In these cases, backup planning must also include plans for how to back up any unreplicated local data and a backup of at least one node that subscribes to each replication set. 
Restore
While you can take a physical backup with the same procedure as a standard PostgreSQL node, it's slightly more complex to restore the physical backup of a PGD node.
EDB Postgres Distributed cluster failure or seeding a new cluster from a backup
The most common use case for restoring a physical backup involves the failure or replacement of all the PGD nodes in a cluster, for instance in the event of a data center failure.
You might also want to perform this procedure to clone the current contents of a EDB Postgres Distributed cluster to seed a QA or development instance.
In that case, you can restore PGD capabilities based on a physical backup of a single PGD node, optionally plus WAL archives:
- If you still have some PGD nodes live and running, fence off the host you restored the PGD node to, so it can't connect to any surviving PGD nodes. This practice ensures that the new node doesn't confuse the existing cluster.
- Restore a single PostgreSQL node from a physical backup of one of the PGD nodes.
- If you have WAL archives associated with the backup, create a suitable
postgresql.conf, and start PostgreSQL in recovery to replay up to the latest state. You can specify an alternativerecovery_targethere if needed.
- Start the restored node, or promote it to read/write if it was in standby recovery. Keep it fenced from any surviving nodes!
- Clean up any leftover PGD metadata that was included in the physical backup.
- Fully stop and restart the PostgreSQL instance.
- Add further PGD nodes with the standard procedure based on the
bdr.join_node_group()function call.
Cleanup of PGD metadata
To clean up leftover PGD metadata:
- Drop the PGD node using bdr.drop_node.
- Fully stop and restart PostgreSQL (important!).
Cleanup of replication origins
You must explicitly remove replication origins with a separate step because they're recorded persistently in a system catalog. They're therefore included in the backup and in the restored instance. They aren't removed automatically when dropping the BDR extension because they aren't explicitly recorded as its dependencies.
To track progress of incoming replication in a crash-safe way, PGD creates one replication origin for each remote master node. Therefore, for each node in the previous cluster run this once:
SELECT pg_replication_origin_drop('bdr_dbname_grpname_nodename');You can list replication origins as follows:
SELECT * FROM pg_replication_origin;
Those created by PGD are easily recognized by their name.
Cleanup of replication slots
If a physical backup was created with pg_basebackup, replication slots
are omitted from the backup.
Some other backup methods might preserve replications slots, likely in outdated or invalid states. Once you restore the backup, use these commands to drop all replication slots:
SELECT pg_drop_replication_slot(slot_name) FROM pg_replication_slots;
If you have a reason to preserve some slots,
you can add a WHERE slot_name LIKE 'bdr%' clause, but this is rarely
useful.
Warning
Never use these commands to drop replication slots on a live PGD node
Eventual consistency
The nodes in an EDB Postgres Distributed cluster are eventually consistent but not entirely consistent. A physical backup of a given node provides point-in-time recovery capabilities limited to the states actually assumed by that node.
The following example shows how two nodes in the same EDB Postgres Distributed cluster might not (and usually don't) go through the same sequence of states.
Consider a cluster with two nodes, N1 and N2, that's initially in
state S. If transaction W1 is applied to node N1, and at the same
time a non-conflicting transaction W2 is applied to node N2, then
node N1 goes through the following states:
(N1) S --> S + W1 --> S + W1 + W2
Node N2 goes through the following states:
(N2) S --> S + W2 --> S + W1 + W2
That is, node N1 never assumes state S + W2, and node N2
likewise never assumes state S + W1. However, both nodes end up
in the same state S + W1 + W2. Considering this situation might affect how
you decide on your backup strategy.
Point-in-time recovery (PITR)
The previous example showed that the changes are also inconsistent in time.
W1 and W2 both occur at time T1, but the change W1 isn't
applied to N2 until T2.
PostgreSQL PITR is designed around the assumption of changes arriving
from a single master in COMMIT order. Thus, PITR is possible by
scanning through changes until one particular point in time (PIT) is reached.
With this scheme, you can restore one node to a single PIT
from its viewpoint, for example, T1. However, that state doesn't include other
data from other nodes that committed near that time but had not yet
arrived on the node. As a result, the recovery might be considered to
be partially inconsistent, or at least consistent for only one
replication origin.
With PostgreSQL PITR, you can use the standard syntax:
recovery_target_time = T1
PGD allows for changes from multiple masters, all recorded in the WAL log for one node, separately identified using replication origin identifiers.
PGD allows PITR of all or some replication origins to a specific point in time, providing a fully consistent viewpoint across all subsets of nodes.
Thus for multi-origins, you can view the WAL stream as containing multiple streams all mixed up into one larger stream. There's still just one PIT, but that's reached as different points for each origin separately.
The WAL stream is read until requested origins have found their PIT. All changes are applied up until that point, except that any transaction records aren't marked as committed for an origin after the PIT on that origin is reached.
You end up with one LSN "stopping point" in WAL, but you also have one single timestamp applied consistently, just as you do with single-origin PITR.
Once you reach the defined PIT, a later one might also be set to allow the recovery to continue, as needed.
After the desired stopping point is reached, if the recovered server
will be promoted, shut it down first. Move the LSN forward to an LSN value higher
than used on any timeline on this server using pg_resetwal.
This approach ensures that there are no duplicate LSNs produced by logical
decoding.
In the specific example shown, N1 is restored to T1. It
also includes changes from other nodes that were committed
by T1, even though they weren't applied on N1 until later.
To request multi-origin PITR, use the standard syntax in
the postgresql.conf file:
recovery_target_time = T1
You need to specify the list of replication origins that are restored to T1 in one of two ways.
You can use a separate multi_recovery.conf file by way of
a new parameter, recovery_target_origins:
recovery_target_origins = '*'
Or you can specify the origin subset as a list in recovery_target_origins:
recovery_target_origins = '1,3'
The local WAL activity recovery to the specified
recovery_target_time is always performed implicitly. For origins
that aren't specified in recovery_target_origins, recovery can
stop at any point, depending on when the target for the list
mentioned in recovery_target_origins is achieved.
In the absence of the multi_recovery.conf file, the recovery defaults
to the original PostgreSQL PITR behavior that's designed around the assumption
of changes arriving from a single master in COMMIT order.
Note
This feature is available only with EDB Postgres Extended.
Barman doesn't create a multi_recovery.conf file.
Monitoring
Use the following queries to check on the progress of the restore process.
SELECT pg_size_pretty(pg_database_size('bdrdb'));
The above query shows the database size on the restoring node. The size should grow as the restore makes progress and approaches the size of the original node. However, due to bloat, logical restores are always a little smaller than the original.
SELECT * FROM bdr.node_replication_rates;
The above query shows the rate of replication. However, the progress info can be misleading for big transactions; lag and progress will appear to stair-step.
SELECT application_name, state, wait_event_type, wait_event, now() - state_change AS state_change_ago FROM pg_stat_activity WHERE application_name LIKE '%pg_restore%';
The above query shows information on what pg_restore is doing, if it's blocked/waiting (on what is waiting) or working, and changing its status continuously.
Check the following views to see issues with replication slots, accumulated lag, broken replication, etc.
- pg_catalog.pg_stat_replication_slots
- pg_catalog.pg_replication_slots
- bdr.node_slots
And use bdr.stat_subscription to see statistics for each
subscription, for example to check on parallel apply or transaction
stream.