EDB Docs - EDB Postgres Distributed (PGD) v6.4.0

Tiered storage in PGD is powered by the Postgres Analytics Accelerator (PGAA) extension. PGAA provides the vectorized query engine and cloud-native storage handlers that allow PGD to offload data partitions to object storage and query them in place.

Before you can configure tiered tables, replicate to analytics, or offload data, you must install PGAA on every node in your cluster and define the object storage destination. Configuring the storage target at the node-group level ensures that every node writes to and reads from the same location.

Prerequisites

Cluster: PGD version 6.1 or later with PGAA and PGFS extensions installed.
Storage locations: Local, S3, GCP, or Azure storage using Iceberg or Delta Lake formats.
Catalog: If using an external catalog service, only Iceberg REST catalogs are supported.
Permissions: The database user must have CREATE, ALTER, and EXECUTE privileges for the PGD and PGAA functions.

Note

The credentials associated with a storage location or a catalog service must have both read and write permissions for the destination bucket. You can verify permissions by running the following functions, which return NULL if successful.

For a storage location:

SELECT pgaa.test_storage_location ('my_storage_location', true);

For a catalog service:

SELECT pgaa.test_catalog('my_iceberg_catalog', test_writes := FALSE);

Downloading and installing the PGAA extension

Install the PGAA extension on every node in your PGD cluster.

Download the package from the EDB repository:

Debian/Ubuntu
CentOS/RHEL

export EDB_SUBSCRIPTION_TOKEN=<your-token>
export EDB_SUBSCRIPTION_PLAN=<your-subscription-plan>
curl -1sSLf "https://downloads.enterprisedb.com/$EDB_SUBSCRIPTION_TOKEN/$EDB_SUBSCRIPTION_PLAN/setup.deb.sh" | sudo -E bash
sudo apt-get install -y <postgresql-distribution>-pgaa

export EDB_SUBSCRIPTION_TOKEN=<your-token>
EDB_SUBSCRIPTION_PLAN=<your-subscription-plan>
curl -1sSLf "https://downloads.enterprisedb.com/$EDB_SUBSCRIPTION_TOKEN/$EDB_SUBSCRIPTION_PLAN/setup.rpm.sh" | sudo -E bash
sudo dnf install -y <postgresql-distribution>-pgaa

Where:

$EDB_SUBSCRIPTION_TOKEN is the token you received when you registered for the EDB subscription.
$EDB_SUBSCRIPTION_PLAN is either standard (for PGE or Community Postgres) or enterprise (for EPAS).
<postgresql-distribution> specifies your Postgres distribution (edb-postgresextended, edb-as, or postgres), and version.

Update your postgresql.conf file. Adding pgaa to shared_preload_libraries allows Postgres to manage the life cycle of the Seafowl query engine as a background worker.
```
shared_preload_libraries = 'pgaa'
pgaa.autostart_seafowl = on
```
Restart your Postgres service to load the new library.

Pointing to object storage

Define the destination for the cold data. Because you are working within a distributed PGD cluster, you must wrap these configuration calls in bdr.replicate_ddl_command to ensure the settings are propagated to all nodes in the group.

Tip

To view your node group, use SELECT node_group_id FROM bdr.node_group;.

Using a storage location
Using a catalog service

Warning

Local filesystem storage locations aren't supported in multi-node PGD deployments. On write leader failover, the new node can't access the previous node's filesystem and replication restarts from zero. Use S3, GCS, or Azure instead.

Create a storage location using the PGFS extension and set it as the analytics storage for your node group.

SELECT bdr.replicate_ddl_command($$
  SELECT pgfs.create_storage_location(
  'my-storage-location',
  'protocol://my-bucket-name/path/',
  '{"region": "region-name"}',
  '{"access_key_id": "...", "secret_access_key": "..."}'
  )
$$);

Clear analytics_write_catalog. Its value must be NULL before an analytics_storage_location can be assigned:
```
SELECT bdr.alter_node_group_option('my-pgd-group', 'analytics_write_catalog', NULL);
```

Set the analytics storage location for your node group:

SELECT bdr.alter_node_group_option('my-pgd-group', 'analytics_storage_location', 'my-storage-location');

Configure a connection to an external catalog:

SELECT bdr.replicate_ddl_command($$
  SELECT pgaa.add_catalog(
    'my-catalog-name',
    'iceberg-rest',
    '{
      "url": "https://my-catalog-rest-endpoint.com",
      "token": "MY_AUTH_TOKEN",
     }'
  )
);

Clear analytics_storage_location. Its value must be NULL before an analytics_write_catalog can be assigned:
```
SELECT bdr.alter_node_group_option('my-pgd-group', 'analytics_storage_location', NULL);
```

Set the catalog as the default analytics write location:

SELECT bdr.replicate_ddl_command($$  
  SELECT bdr.alter_node_group_option('my-pgd-group', 'analytics_write_catalog', 'my-catalog-name')  
);

Next steps

Once you have defined the storage target for your PGD cluster, determine how your tables interact with that storage based on your requirements for performance and capacity:

Implement tiered tables: Establish a zero-touch data lifecycle by automatically transitioning partitions from local heap storage to your analytics storage targed based on an age threshold.
Replicate to analytics: Maintain a local transactional heap table while simultaneously synchronizing a columnar copy to your storage target for heavy analytical processing.
Offload to analytics: Reclaim heap disk space from existing tables which are being replicated to analytics.

Setting up PGAA v6.4.0

Prerequisites

Note

Downloading and installing the PGAA extension

Pointing to object storage

Tip

Warning

Next steps

← Prev

↑ Up

Next →