Configure PGD data tiering Innovation Release
Manage the data lifecycle of your Advanced High Availability or Distributed High Availability (PGD) cluster by moving Postgres data to object storage, using one of three available methods. Configure PGD data tiering either through the Hybrid Manager (HM) console or manually from your PGD cluster's command line.
Prerequisites
- Active HM instance, version 2025.11 or later.
- Provisioned PGD cluster, version 6.1 or later, with the PGFS extension and PGAA extension version 1.4.0 or later.
- Catalog service (optional): HM-managed Lakekeeper or external REST-compatible catalog.
- Object storage: S3-compatible with credentials.
- User permissions: the database user must have CREATE/ALTER/EXECUTE privileges for the PGD and PGAA functions.
- Lakehouse Cluster (recommended) for querying offloaded data.
Note
To check versions, run SELECT bdr.bdr_version(); for PGD and SELECT pgaa.pgaa_version(); for the PGAA extension.
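The user-permissions prerequisite can be granted with standard Postgres GRANT statements. A minimal sketch, assuming a hypothetical tiering_admin role (adjust role, database, and schema names to your environment):

```sql
-- tiering_admin and your_database are placeholder names.
GRANT CREATE ON DATABASE your_database TO tiering_admin;
GRANT USAGE ON SCHEMA pgaa, pgfs TO tiering_admin;
GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA pgaa TO tiering_admin;
GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA pgfs TO tiering_admin;
```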
Initial configuration
Configure your PGD cluster to point to object storage for analytics and, if you are using one, to an external catalog to write metadata to.
When you deploy your PGD cluster, HM automatically provisions a write catalog replication target as the default write location within your configured object storage.
Enable statistics for the node group disks in order to view the impact of tiered storage:
- Find your PGD cluster in the HM console and select Quick Actions.
- Select Edit Cluster.
- Navigate to Data Groups and select your data group.
- Select Edit, then DB configuration.
- Find the configuration parameter edb.collect_table_statistics and change its value to all.
- Save the changes.
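Once the change has rolled out, you can confirm the new value from any connection with the standard SHOW command:

```sql
SHOW edb.collect_table_statistics;
-- Should now report: all
```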
Configure the default write location for your PGD cluster, which can either be a storage location, or a write catalog replication target.
Option 1: Point to storage.
Create a storage location using the PGFS extension. See Creating a storage location for details.
SELECT bdr.replicate_ddl_command($$
    SELECT pgfs.create_storage_location(
        'storage_location_name',
        'protocol://your-bucket-name/path/',
        '{"region": "region-name"}',
        '{"access_key_id": "...", "secret_access_key": "..."}'
    );
$$);
Point PGD to the defined storage location:
SELECT bdr.alter_node_group_option('your_pgd_group', 'analytics_storage_location', 'storage_location_name');
Option 2: Point to a catalog. Configure a connection to an external catalog:
SELECT bdr.replicate_ddl_command($$
    SELECT pgaa.add_catalog(
        'write-catalog-name',
        'iceberg-rest',
        '{"url": "https://your-catalog-rest-endpoint.com", "token": "YOUR_AUTH_TOKEN"}'
    );
$$);
Set the catalog as the analytics write destination:
SELECT bdr.alter_node_group_option('your_pgd_group', 'analytics_write_catalog', 'write-catalog-name');
Convert to Tiered Table
Convert an existing heap table into a tiered table.
Configure a tiered table:
Find your non-partitioned heap table in the Tables tab.
Select Convert to Tiered Table.
Enter the following information:
- Partition Column must be a DATE or TIMESTAMP for RANGE partitioning. It must also be included in the primary key.
- Initial Lower Bound sets the chronological starting point for the first partition.
- Analytics Offload Period is the threshold age beyond which the data moves into "cold" storage.
- Partition Increment refers to the time interval between partitions.
- Minimum Advance Partitions: establishes the minimum count of partitions that the system must proactively create and maintain ahead of the current date.
- Maximum Advance Partitions: sets the number of partitions to create if the number of partitions falls below the value of Minimum Advance Partitions.
- Enable real-time analytics replication triggers replication for data newer than Analytics Offload Period ("hot" data).
Note
Tables with sequences or foreign key constraints are not supported with tiered tables. Drop the sequence or constraint before converting to tiered table, then recreate it after conversion.
- Once the process completes, the table Type changes from Heap to Tiered.
- Expand the table to view the individual partitions: their Type shows as Heap or HTAP (if you enabled real-time analytics replication) for the "hot" tier, and Iceberg or Delta for the "cold" tier.
- Compare the table values for Disk Rows, Object Storage Rows, Disk Size, and Object Storage Size.
Find your non-partitioned heap table and use the pgaa.convert_to_tiered_table function to convert it to a tiered table:
CALL pgaa.convert_to_tiered_table(
    'public.transactional_table',
    range_partition_column := 'your_timestamp_column',
    partition_increment := '1 year',
    analytics_offload_period := '1 year',
    initial_lower_bound := '2010-01-01',
    enable_replication := true
);
Where:
- relation is the name of the table. This function uses Autopartition to configure the table data into partitions.
- range_partition_column specifies the column to partition by. It must be DATE or TIMESTAMP, and it must also be included in the primary key.
- initial_lower_bound specifies the chronological starting point for the first partition.
- partition_increment defines the time interval between partitions.
- analytics_offload_period defines the age threshold that automatically triggers the migration of all older partitions to object storage, marking the data as "cold".
- enable_replication, when set to FALSE, means the table's data is only offloaded once partitions exceed the "cold" data threshold. If set to TRUE, it triggers immediate real-time replication.
- minimum_advance_partitions establishes the minimum count of partitions that the system must proactively create and maintain ahead of the current date.
This function converts the heap table into a tiered table and initiates the automated process to offload partitions older than analytics_offload_period to the configured object storage. If you set enable_replication to true, it will also convert the non-offloaded partitions into HTAP tables and replicate them to object storage.
Note
Tables with sequences or foreign key constraints are not supported with tiered tables. Drop the sequence or constraint before converting to tiered table, then recreate it after conversion.
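The workaround in the note can be scripted around the conversion call. A hedged sketch, assuming a hypothetical foreign key orders_customer_fk on column customer_id (look up the real constraint in pg_constraint first):

```sql
-- Drop the unsupported constraint before converting.
ALTER TABLE public.transactional_table DROP CONSTRAINT orders_customer_fk;

CALL pgaa.convert_to_tiered_table(
    'public.transactional_table',
    range_partition_column := 'your_timestamp_column',
    partition_increment := '1 year',
    analytics_offload_period := '1 year',
    initial_lower_bound := '2010-01-01',
    enable_replication := true
);

-- Recreate the constraint once the conversion completes.
ALTER TABLE public.transactional_table
    ADD CONSTRAINT orders_customer_fk
    FOREIGN KEY (customer_id) REFERENCES public.customers (id);
```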
Enable Replication
Configure an existing heap table to stream data to object storage.
To enable replication:
- Find your table in the Tables tab. Your table must have a primary key.
- Select Actions then Enable Replication.
- Confirm the operation and select Enable Replication.
- The table Type changes from Heap to HTAP and Replication shows as Enabled.
- Compare the table values for Disk Rows, Object Storage Rows, Disk Size, and Object Storage Size.
To disable replication:
- Find your table in the Tables tab.
- Select Actions then Disable Replication.
- Confirm the operation and select Disable Replication.
- The table Type changes from HTAP to Heap and Replication shows as Disabled.
- Compare the table values for Disk Rows, Object Storage Rows, Disk Size, and Object Storage Size.
Use the pgaa.enable_analytics_replication function to enable continuous replication of a heap table. Note that your table must have a primary key.
CALL pgaa.enable_analytics_replication('public.transactional_table');
The function converts the heap table to an analytical HTAP table and starts the logical replication worker.
To disable replication:
CALL pgaa.disable_analytics_replication('public.transactional_table');
The function stops the analytics replication worker process from capturing and streaming any further DML changes to the external object storage. The table is no longer contributing data to the analytics platform, though the data that was already offloaded remains in object storage.
Offload Table
Move the contents of an existing HTAP table to "cold" storage, truncate the local copy, and stop replication.
To offload the table:
- Find your HTAP table in the Tables tab.
- Select Actions then Offload Table.
- Confirm the operation and select Offload Table.
- The table Type changes from HTAP to Iceberg or Delta and Replication shows as Disabled.
- Compare the table values for Disk Rows, Object Storage Rows, Disk Size, and Object Storage Size.
To restore the table:
- Find your Iceberg or Delta table in the Tables tab.
- Select Actions then Restore Table.
- Confirm the operation and select Restore Table.
- The table Type changes from Iceberg or Delta to Heap and Replication shows as Disabled.
- Compare the table values for Disk Rows, Object Storage Rows, Disk Size, and Object Storage Size.
Use the pgaa.convert_to_analytics function to convert a synchronized HTAP table into a "cold", pure-analytics table and remove the data from local disk.
CALL pgaa.convert_to_analytics('public.transactional_table');
The function removes the local data and redefines the table's access method to point solely to the external Iceberg or Delta files.
To restore the table use the pgaa.restore_from_analytics function to bring the data stored in the external analytics tier back into the Postgres transactional storage as a heap table.
CALL pgaa.restore_from_analytics('public.transactional_table');
The function checks the table's configuration, identifies its external Iceberg/Delta path in object storage, and rewrites the data back into the table's local storage structure. Once the data is locally present, the function sets the table's access method back to the local transactional endpoint again.
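A simple sanity check after a restore is to count the rows now held locally and compare the result against the Object Storage Rows value the console reported before the restore:

```sql
-- The local row count should match the previously offloaded row count.
SELECT count(*) FROM public.transactional_table;
```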
Other operations
Trigger compaction
Perform compaction to improve the performance and storage efficiency of your external analytics tables residing in object storage.
- Find your HTAP or Iceberg or Delta table in the Tables tab.
- Select Actions then Trigger Compaction.
- Confirm the operation and select Trigger Compaction.
- Observe the new value of Object Storage Size for the table.
Use the function pgaa.execute_compaction to trigger the compaction job.
CALL pgaa.execute_compaction('public.transactional_table');
Read tables from an external catalog
Attach as many external catalogs as desired to access table data offloaded to object storage by other PGD clusters.
- Select your cluster and navigate to Catalogs, then select Add Catalog.
- Find the catalog you want to attach by typing the cluster ID of the origin PGD cluster, select the catalog, and select Add Managed Catalog.
- Locate the newly added catalog under Catalogs, select Actions, Import, and confirm.
- This operation imports the metadata of all the tables available in this catalog. You may select Sync to continuously synchronize with the source catalog.
- Examine the newly imported external tables under the Tables section, listed as Iceberg tables.
- You may restore these tables to your local Postgres storage by selecting the Restore option.
Register the external catalog's connection details:
SELECT bdr.replicate_ddl_command($$
    SELECT pgaa.add_catalog(
        'read-catalog-name',
        'iceberg-rest',
        '{"url": "https://your-catalog-rest-endpoint.com", "token": "YOUR_READ_ONLY_TOKEN"}'
    );
$$);
Attach the catalog to the session:
SELECT pgaa.attach_catalog('read-catalog-name');
Once attached, find and query the tables in the external object storage. Use the three-part naming convention catalog_name.schema_name.table_name (catalog names containing hyphens must be double-quoted):

SELECT * FROM pgaa.catalog_tables('read-catalog-name');
SELECT COUNT(*) FROM "read-catalog-name".schema_name.table_name;