Offload PGD data to Apache Iceberg

This guide explains how and when to offload PGD data to Apache Iceberg in Hybrid Manager (HM), and what this unlocks for your analytics architecture.

Why offload PGD data to Iceberg?

  • Cost-efficient tiered storage: Move older PGD data ("cold data") to object storage to lower primary database storage costs.
  • Faster PGD performance: Keep PGD operational tables smaller and faster for transactional workloads.
  • Enable Lakehouse analytics: Query offloaded data in Iceberg format using scalable Lakehouse clusters (vectorized queries).
  • Broader data integration: Make PGD data available to other tools (Spark, Trino, Flink) via open Iceberg format.

When to use this:

  • When implementing Tiered Tables (primary use case)
  • When building an enterprise lakehouse architecture with unified PGD + external analytics
  • When archiving historical PGD data with ongoing query access needs

For foundational concepts, see Tiered Tables in Hybrid Manager.


How PGD offload works in Hybrid Manager

Offloading PGD data to Iceberg involves three layers:

| Layer | Action |
| --- | --- |
| Storage | Define a PGFS storage location and/or Iceberg catalog (where offloaded data goes). |
| Node group | Configure your PGD node group to use this storage target. |
| Table level | Enable analytics replication for each PGD table you want to offload. |

Prerequisites

  • A PGD cluster running in Hybrid Manager.
  • A PGFS storage location or Iceberg catalog defined for the offloaded data (the Storage layer above).

Steps to offload PGD data

Step 1: Ensure your node group is configured for offload (required)

  • If using PGFS only: set analytics_storage_location.
  • If using catalog: set analytics_write_catalog.

Reference: Configure PGD node group for analytics offload
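As a hedged sketch, the node-group settings above can typically be applied with PGD's bdr.alter_node_group_option() function; the group and target names here are illustrative:

```sql
-- Illustrative names; run on the PGD cluster.
-- PGFS-only offload: point the node group at a storage location.
SELECT bdr.alter_node_group_option(
  node_group_name := 'analytics_group',
  config_key      := 'analytics_storage_location',
  config_value    := 'my_pgfs_location'
);

-- Catalog-based offload: write through an Iceberg catalog instead.
SELECT bdr.alter_node_group_option(
  node_group_name := 'analytics_group',
  config_key      := 'analytics_write_catalog',
  config_value    := 'my_catalog'
);
```

See the referenced configuration guide for the exact option names supported by your PGD version.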


Step 2: Enable analytics replication on your table

To start offloading a specific PGD table:

ALTER TABLE public.operational_logs SET (pgd.replicate_to_analytics = TRUE);

Result:

  • PGD automatically replicates this table’s data to Iceberg:
      • immediately, if full-table offload is used;
      • partition by partition, if combined with Tiered Tables and BDR AutoPartition.
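When combined with Tiered Tables, the offload boundary is set through BDR AutoPartition. A minimal sketch, assuming an analytics_offload_period argument to bdr.autopartition() (the increment and period values are illustrative):

```sql
-- Illustrative values: monthly partitions, with partitions older than
-- six months offloaded to Iceberg automatically.
SELECT bdr.autopartition(
  relation                 := 'public.operational_logs',
  partition_increment      := '1 month',
  analytics_offload_period := '6 months'
);
```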

Accessing offloaded data

If using Iceberg catalog

  • Attach the catalog on your Lakehouse cluster:

    SELECT pgaa.attach_catalog('your_catalog_alias');
  • Query the offloaded table directly by its catalog-qualified name.
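Putting the two steps together, a hedged sketch (the qualification scheme is illustrative and depends on how the catalog exposes its namespaces):

```sql
-- Attach the Iceberg catalog (alias assumed to be configured already).
SELECT pgaa.attach_catalog('your_catalog_alias');

-- Query the offloaded table by its catalog-qualified name;
-- the exact qualification depends on your catalog's namespace layout.
SELECT count(*) FROM your_catalog_alias.public.operational_logs;
```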

If using PGFS only (filesystem offload)

  • Define a PGAA reader table pointing to the Iceberg path in object storage:

    CREATE TABLE public.operational_logs_offloaded (...)
    USING PGAA
    WITH (pgaa.path = 's3://your/path', pgaa.format = 'iceberg');
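Once defined, the reader table behaves like any other queryable relation; a minimal check, using the table name from the example above:

```sql
SELECT count(*) FROM public.operational_logs_offloaded;
```

Note that the reader table's column definitions must match the schema of the offloaded Iceberg data.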

Notes

  • This mechanism powers Tiered Tables: older partitions offload automatically when using BDR AutoPartition with analytics_offload_period.
  • You can enable/disable offload per table dynamically.
  • Offloaded data remains queryable indefinitely in Iceberg.
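As noted above, offload can be enabled or disabled per table at any time; disabling it reuses the same storage parameter shown in Step 2:

```sql
-- Stop replicating this table to Iceberg; already-offloaded data
-- remains queryable in the Iceberg tier.
ALTER TABLE public.operational_logs SET (pgd.replicate_to_analytics = FALSE);
```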

What this unlocks

  • Seamless tiered storage for large PGD tables
  • Fast lakehouse analytics on historical PGD data
  • Open-format (Iceberg) interoperability
  • Reduced PGD operational cost and footprint


Next steps

Now that your PGD data is offloading to Iceberg, you can query it from Lakehouse clusters, combine it with Tiered Tables for automatic partition offload, or integrate it with external engines such as Spark, Trino, and Flink.

For a broader architectural view, see Analytics in Hybrid Manager.
