Migrating to Patroni
This page describes how to migrate an existing Postgres cluster to a Patroni-managed cluster.
Overview of steps
- Disable the current HA solution, if any.
- Perform the following steps on all nodes, starting with the primary and then proceeding with the standbys:
  - Disable the Postgres systemd unit.
  - Create a YAML configuration file for Patroni. If you use replication slots in the existing cluster, make sure to configure permanent slots so Patroni does not delete them while migrating.
  - Enable the Patroni systemd unit.
  - Start the Patroni systemd unit.
- Let Patroni take over the "startup procedure" of Postgres:
  - Restart all standby nodes.
  - Schedule a restart of the primary node during a maintenance window.
- Clean up the slots from the original cluster, if any:
  - Once Postgres has consumed all WALs between the original slots and the new slots created by Patroni, remove the permanent slots configuration. This allows Patroni to drop the original slots.
Note: The above procedure was designed for minimal downtime. We strongly recommend that you conduct these steps during a scheduled maintenance window, and test at least a switchover/switchback operation after all the above steps have been performed.
More detailed example
In this section we cover an example with more detailed steps for migrating an existing Postgres cluster to a Patroni-managed cluster.
For the sake of simplicity, where the steps differ depending on the existing HA solution (none, EFM, or repmgr), we add notes rather than repeating similar steps.
For the example, the following assumptions are made:
- You already have a working `etcd` cluster — refer to Installing and configuring etcd to install and set up `etcd` to be used with Patroni. The `etcd` cluster here is composed of the following nodes:
  - `etcd-node-1`
  - `etcd-node-2`
  - `etcd-node-3`
- The Postgres cluster that needs to be migrated has the following nodes:
  - `postgres-node-1` — the primary node.
  - `postgres-node-2` — a standby of `postgres-node-1`.
  - `postgres-node-3` — a standby of `postgres-node-1`.
- The existing Postgres cluster uses replication slots to control replication from primary to standbys, and they are named:
  - `slot_for_postgres_node_2` — slot used by replica node `postgres-node-2`.
  - `slot_for_postgres_node_3` — slot used by replica node `postgres-node-3`.
- You already have the following Postgres users in the cluster:
  - `postgres` with password `postgres_password` — Postgres superuser.
  - `replicator` with password `replicator_password` — used by replication connections.
  - `rewind` with password `rewind_password` — user with rights for executing `pg_rewind`.
- You are running PostgreSQL with the following characteristics:
  - Postgres is running on port `5432`.
  - Postgres data directory is `/var/lib/postgres/data`.
  - Postgres binaries directory is `/usr/postgres/bin`.
- You have already installed Patroni on all the Postgres nodes.
- The systemd unit names are:
  - `postgres` — for Postgres.
  - `patroni` — for Patroni.
  - `edb-efm` — in case EFM is being used in the existing cluster.
  - `repmgrd` — in case repmgr is being used in the existing cluster.
- The configuration files are:
  - `/etc/patroni.yml` — for Patroni.
  - `/etc/repmgr.conf` — in case repmgr is being used in the existing cluster.
- The cluster name is `test_cluster`. This is relevant only for EFM and Patroni.
Notes:
- Please adjust the example assumptions according to your environment.
- Setting up watchdog and HAProxy is out of the scope of this example. However, as they are part of the recommended architecture, you should read the related documentation.
- This example uses PostgreSQL. If you are running EDB Postgres Extended or EDB Postgres Advanced Server, we recommend you also read the related documentation.
In the following subsections you will find the actual walkthrough.
1. Disable current HA solution
If you have an HA solution managing your current cluster, you need to disable it. If you do not have any HA solution, you can skip this section.
If you are using EFM:
- Stop all EFM agents. On any Postgres node:
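For example, using the EFM CLI (the location of the `efm` binary depends on your EFM version and installation):

```bash
# Stops all EFM agents in the cluster without triggering a failover;
# the Postgres instances keep running
efm stop-cluster test_cluster
```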
- Disable the EFM systemd unit. This is required so nothing other than Patroni will try to start Postgres up. On all Postgres nodes:
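```bash
sudo systemctl disable edb-efm
```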
If you are using repmgr:
- Stop the repmgr daemon. On all Postgres nodes:
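For example, through the `repmgrd` systemd unit:

```bash
sudo systemctl stop repmgrd
```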
- Disable the repmgr systemd unit. This is required so nothing other than Patroni will try to start Postgres up. On all Postgres nodes:
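```bash
sudo systemctl disable repmgrd
```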
2. Set up Patroni
You need to perform the following steps on all Postgres nodes. Perform all steps on one node before proceeding with the next node. Start with the primary node (`postgres-node-1`), then proceed with each standby node (`postgres-node-2` and `postgres-node-3`).
- Disable Postgres systemd unit. This is required so nothing other than Patroni will try to start Postgres. Run:
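```bash
sudo systemctl disable postgres
```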
- Create a YAML configuration file for Patroni (`/etc/patroni.yml`). You can find below a very basic template for the configuration file:
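The template below is a minimal sketch built from the assumptions of this example (node names, credentials, directories, and slot names). The REST API port `8008` and the `etcd` client port `2379` are common defaults and may differ in your environment:

```yaml
scope: test_cluster
name: postgres-node-1

restapi:
  # Assumed REST API address; adjust for your network
  listen: 0.0.0.0:8008
  connect_address: postgres-node-1:8008

etcd3:
  hosts:
    - etcd-node-1:2379
    - etcd-node-2:2379
    - etcd-node-3:2379

bootstrap:
  dcs:
    # Declare the original slots as permanent so Patroni does not drop them
    slots:
      slot_for_postgres_node_2:
        type: physical
      slot_for_postgres_node_3:
        type: physical
    postgresql:
      use_pg_rewind: true
  # Very permissive rules, for illustration only; tighten for production
  pg_hba:
    - host all all 0.0.0.0/0 md5
    - host replication replicator 0.0.0.0/0 md5

postgresql:
  listen: 0.0.0.0:5432
  connect_address: postgres-node-1:5432
  data_dir: /var/lib/postgres/data
  bin_dir: /usr/postgres/bin
  authentication:
    superuser:
      username: postgres
      password: postgres_password
    replication:
      username: replicator
      password: replicator_password
    rewind:
      username: rewind
      password: rewind_password
```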
Please keep in mind the following suggestions related to the configuration file above:

- This is just an example template, and you will need to change it according to your setup.
- It was created as an example configuration file for node `postgres-node-1`. Adjust it accordingly for the other nodes (`postgres-node-2` and `postgres-node-3`).
- Some key points about the proposed settings:
  - `scope`: the name of the Patroni cluster.
  - `name`: the name of the managed Postgres node.
  - `etcd3.hosts`: the list of addresses used to reach the nodes of the `etcd` cluster.
  - `bootstrap.dcs.slots`: the list of slots that should be treated as permanent, thus not dropped by Patroni when it starts running. We fill this in to guarantee WALs remain available for the replicas until they reach the LSN of the slots created by Patroni.
  - `bootstrap.pg_hba`: a few basic HBA rules that Patroni writes to the `pg_hba.conf` file so users are able to connect. Adapt them to your needs; the rules in this example are very permissive.
- We strongly recommend reading the Patroni documentation about Patroni configuration. You will likely need to specify at least some settings through the `bootstrap.dcs.postgresql.parameters` (Postgres settings that apply to all nodes, used once to populate the Patroni DCS when it is initialised) and `postgresql.parameters` (Postgres settings that apply to the local node) sections of the configuration file to make the Postgres configuration compatible with what you have in your existing cluster.
- Create or adjust the Patroni systemd unit. You can find instructions about that in the quick start on RHEL 8 or Debian 11.
- Enable the Patroni systemd unit. This is done so Patroni will automatically start up on boot and start Postgres. Run:
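```bash
sudo systemctl enable patroni
```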
- Start the Patroni systemd unit. This is done so you start Patroni right now. Run:
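```bash
sudo systemctl start patroni
```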
3. Hand over Postgres startup to Patroni
At this point Patroni is already monitoring the Postgres instances on each node. However, Postgres is still running through its own systemd unit (`postgres`). Ideally, we should make Patroni start Postgres through its own means and stop using the Postgres systemd unit.
We recommend starting with the standbys, and then proceeding with the primary. Also, we assume you can immediately restart the standbys, and that you want to restart the primary during a maintenance window. The steps:
- Immediately restart the standby `postgres-node-2`. On any Postgres node run:
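For example, with `patronictl`:

```bash
patronictl -c /etc/patroni.yml restart test_cluster postgres-node-2
```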
- Immediately restart the standby `postgres-node-3`. On any Postgres node run:
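```bash
patronictl -c /etc/patroni.yml restart test_cluster postgres-node-3
```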
- Schedule a restart of the primary `postgres-node-1` to occur during a maintenance window. On any Postgres node:
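For example, using the `--scheduled` option of `patronictl restart`:

```bash
patronictl -c /etc/patroni.yml restart test_cluster postgres-node-1 --scheduled SOME_SCHEDULE
```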
Note: `SOME_SCHEDULE` should be filled with the desired timestamp in an unambiguous format, preferably with a numeric time zone offset (e.g. `2023-07-25T20:09+00`).
Once all Postgres instances have been restarted, they will be running through Patroni instead of through the Postgres systemd unit.
4. Clean up slots from original cluster
At this point we assume all Postgres instances are already running through Patroni, as described in the previous section.
As we intend to clean up the slots from the original cluster, before proceeding with the next steps you should confirm that both Postgres standby nodes have already received (and potentially consumed) all WAL files that exist between the replication slots of the original cluster and the replication slots created by Patroni. In other words, make sure that:
- `postgres-node-2` has received all WALs between the LSNs reported by the following replication slots in `postgres-node-1`:
  - `slot_for_postgres_node_2` — previously existing slot in the original cluster.
  - `postgres_node_2` — slot automatically created by Patroni based on the node name.
- `postgres-node-3` has received all WALs between the LSNs reported by the following replication slots in `postgres-node-1`:
  - `slot_for_postgres_node_3` — previously existing slot in the original cluster.
  - `postgres_node_3` — slot automatically created by Patroni based on the node name.
Note: You can use a query like the one below to check the required LSNs of these slots. You need to make sure that the `restart_lsn` of the slots created by Patroni is `>=` the `restart_lsn` of the original slots. On node `postgres-node-1`, through a `psql` session:
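For example (slot names as per the assumptions of this example):

```sql
-- Compare the restart_lsn of the Patroni-created slots with the original ones
SELECT slot_name, restart_lsn
FROM pg_replication_slots
WHERE slot_name IN ('slot_for_postgres_node_2', 'postgres_node_2',
                    'slot_for_postgres_node_3', 'postgres_node_3');
```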
Once the above conditions are met, we are able to let Patroni drop the original slots (`slot_for_postgres_node_2` and `slot_for_postgres_node_3`). To do that, assuming you have not configured any permanent slots other than `slot_for_postgres_node_2` and `slot_for_postgres_node_3`, follow these steps:
- Remove the permanent slots from the dynamic configuration. On any Postgres node:
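One way of doing this is to edit the dynamic configuration interactively and remove the whole `slots` section:

```bash
# Opens the cluster's dynamic configuration in an editor; delete the
# `slots` section, save, and confirm the change
patronictl -c /etc/patroni.yml edit-config test_cluster
```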
- Wait for about 10 seconds, then check that the replication slots `slot_for_postgres_node_2` and `slot_for_postgres_node_3` have been dropped from `postgres-node-1` (query `pg_replication_slots` to check this).
5. Test switchover and switchback
At this point your cluster is completely managed by Patroni, so it is important to perform switchover and switchback operations during a maintenance window to ensure Patroni is operating correctly.
As mentioned, you should perform the following steps during a maintenance window. As an example, we will perform a switchover and switchback between `postgres-node-1` and `postgres-node-2`. The steps:
- Check replication lag between nodes. On any Postgres node:
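```bash
# The "Lag in MB" column should show 0 for streaming replicas
patronictl -c /etc/patroni.yml list test_cluster
```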
Note: Do not proceed with the next steps if replica nodes are lagging.
- Perform a switchover between `postgres-node-1` and `postgres-node-2`. On any Postgres node:
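For example (older Patroni versions name the `--leader` option `--master`):

```bash
patronictl -c /etc/patroni.yml switchover test_cluster \
    --leader postgres-node-1 --candidate postgres-node-2
```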
- After a few seconds, check again the list of members. On any Postgres node:
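```bash
patronictl -c /etc/patroni.yml list test_cluster
```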
Note: Do not proceed with the next steps if replica nodes are lagging or not streaming.
- Perform a switchback between `postgres-node-1` and `postgres-node-2`. On any Postgres node:
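```bash
patronictl -c /etc/patroni.yml switchover test_cluster \
    --leader postgres-node-2 --candidate postgres-node-1
```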
- After a few seconds, check again the list of members. On any Postgres node:
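```bash
patronictl -c /etc/patroni.yml list test_cluster
```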
Note: Replica nodes should be streaming and not lagging.
If you got here and saw the expected outputs, both the switchover and the switchback were successful. Congratulations, your Patroni cluster is now fully operational.