Setting up PGAA v6.3.1

Tiered storage in PGD is powered by the Postgres Analytics Accelerator (PGAA) extension. PGAA provides the vectorized query engine and cloud-native storage handlers that allow PGD to offload data partitions to object storage and query them in place.

Before you can configure tiered tables, replicate to analytics, or offload data, you must install PGAA on every node in your cluster and define the object storage destination. Configuring the storage target at the node-group level ensures that every node writes to and reads from the same location.

Prerequisites

  • Cluster: PGD version 6.1 or later with PGAA and PGFS extensions installed.
  • Storage locations: Local, S3, GCP, or Azure storage using Iceberg or Delta Lake formats.
  • Catalog: If using an external catalog service, only Iceberg REST catalogs are supported.
  • Permissions: The database user must have CREATE, ALTER, and EXECUTE privileges for the PGD and PGAA functions.
Note

The credentials associated with a storage location or a catalog service must have both read and write permissions for the destination bucket. You can verify permissions by running the following functions, which return NULL if successful.

  • For a storage location:
    SELECT pgaa.test_storage_location ('my_storage_location', true);
  • For a catalog service:
    SELECT pgaa.test_catalog('my_iceberg_catalog', test_writes := FALSE);

Downloading and installing the PGAA extension

Install the PGAA extension on every node in your PGD cluster.

  1. Download the package from the EDB repository:

    Where:

    • $EDB_SUBSCRIPTION_TOKEN is the token you received when you registered for the EDB subscription.
    • $EDB_SUBSCRIPTION_PLAN is either standard (for PGE or Community Postgres) or enterprise (for EPAS).
    • <postgresql-distribution> specifies your Postgres distribution (edb-postgresextended, edb-as, or postgres), and version.
  2. Update your postgresql.conf file. Adding pgaa to shared_preload_libraries allows Postgres to manage the life cycle of the Seafowl query engine as a background worker.

    shared_preload_libraries = 'pgaa'
    pgaa.autostart_seafowl = on
  3. Restart your Postgres service to load the new library.

Pointing to object storage

Define the destination for the cold data. Because you are working within a distributed PGD cluster, you must wrap these configuration calls in bdr.replicate_ddl_command to ensure the settings are propagated to all nodes in the group.

Tip

To view your node group, use SELECT node_group_id FROM bdr.node_group;.

Next steps

Once you have defined the storage target for your PGD cluster, determine how your tables interact with that storage based on your requirements for performance and capacity:

  • Implement tiered tables: Establish a zero-touch data lifecycle by automatically transitioning partitions from local heap storage to your analytics storage targed based on an age threshold.

  • Replicate to analytics: Maintain a local transactional heap table while simultaneously synchronizing a columnar copy to your storage target for heavy analytical processing.

  • Offload to analytics: Reclaim heap disk space from existing tables which are being replicated to analytics.