Offload PGD data to Apache Iceberg
This guide explains how and when to offload PGD data to Apache Iceberg in Hybrid Manager (HM), and what this unlocks for your analytics architecture.
Why offload PGD data to Iceberg?
- Cost-efficient tiered storage: Move older PGD data ("cold data") to object storage to lower primary database storage costs.
- Faster PGD performance: Keep PGD operational tables smaller and faster for transactional workloads.
- Enable Lakehouse analytics: Query offloaded data in Iceberg format using scalable Lakehouse clusters with vectorized query execution.
- Broader data integration: Make PGD data available to other tools (Spark, Trino, Flink) via the open Iceberg format.
When to use this:
- When implementing Tiered Tables (primary use case)
- When building an enterprise lakehouse architecture that unifies PGD and external analytics
- When archiving historical PGD data that still needs to be queryable
For foundational concepts, see Tiered Tables in Hybrid Manager.
How PGD offload works in Hybrid Manager
Offloading PGD data to Iceberg involves 3 layers:
Layer | Action |
---|---|
Storage | Define a PGFS storage location and/or Iceberg catalog (where offloaded data goes). |
Node group | Configure your PGD node group to use this storage target. |
Table level | Enable analytics replication for each PGD table you want to offload. |
Prerequisites
- An HM-managed PGD cluster.
- A PGFS storage location defined in PGD (S3-compatible object store).
- (Optional, recommended) An Iceberg REST catalog connection configured and attached.
- A PGD node group configured for analytics offload. See Configure PGD node group for analytics offload.
Steps to offload PGD data
Step 1: Ensure the node group is configured for offload (required)
- If using PGFS only: set analytics_storage_location.
- If using a catalog: set analytics_write_catalog.
Reference: Configure PGD node group for analytics offload
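As a rough illustration of Step 1, the two options might be set as shown below. This is a sketch under assumptions, not the confirmed procedure: it assumes the options are applied with PGD's bdr.alter_node_group_option function, and the names my_node_group, my_storage_location, and my_catalog are hypothetical placeholders. Check the referenced guide for the exact call in your PGD version.

```sql
-- Sketch only. 'my_node_group', 'my_storage_location' and 'my_catalog'
-- are hypothetical names; substitute the ones defined in your cluster.

-- PGFS only: point the node group at an existing PGFS storage location.
SELECT bdr.alter_node_group_option(
    'my_node_group',
    'analytics_storage_location',
    'my_storage_location'
);

-- Catalog: point the node group at an attached Iceberg catalog instead.
SELECT bdr.alter_node_group_option(
    'my_node_group',
    'analytics_write_catalog',
    'my_catalog'
);
```

Only one of the two statements is needed, depending on which storage target you chose.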
Step 2: Enable analytics replication on your table
To start offloading a specific PGD table:
ALTER TABLE public.operational_logs SET (pgd.replicate_to_analytics = TRUE);
Result:
- PGD automatically replicates this table’s data to Iceberg:
  - Immediately, if full table offload is used.
  - Partition by partition, if combined with Tiered Tables and BDR AutoPartition.
Accessing offloaded data
If using an Iceberg catalog
Attach the catalog on the Lakehouse cluster:
SELECT pgaa.attach_catalog('your_catalog_alias');
Then query the offloaded table directly through the catalog name.
If using PGFS only (filesystem offload)
Define a PGAA reader table that points to the Iceberg path in object storage:
CREATE TABLE public.operational_logs_offloaded (...) USING PGAA WITH (pgaa.path = 's3://your/path', pgaa.format = 'iceberg');
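Once the reader table exists, it should be queryable like any other table on the Lakehouse cluster. A minimal check, assuming the reader table from the statement above was created successfully, could look like this:

```sql
-- Count the rows visible through the PGAA reader table defined above.
SELECT count(*) AS offloaded_rows
FROM public.operational_logs_offloaded;
```

A non-zero count suggests the Iceberg data at the configured path is readable; compare it against the source table in PGD if you need to confirm completeness.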
Notes
- This mechanism powers Tiered Tables: older partitions are offloaded automatically when BDR AutoPartition is configured with analytics_offload_period.
- You can enable or disable offload per table at any time.
- Offloaded data remains queryable indefinitely in Iceberg.
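The notes above can be sketched as follows. The first statement mirrors the Step 2 command with the value flipped, on the assumption that setting the option to FALSE is how offload is turned off. The second is a hedged sketch of an AutoPartition call: it assumes public.operational_logs is range-partitioned on a timestamp column, and the interval values are hypothetical examples rather than recommendations.

```sql
-- Stop offloading a table (assumption: FALSE reverses the Step 2 setting).
ALTER TABLE public.operational_logs SET (pgd.replicate_to_analytics = FALSE);

-- Tiered Tables sketch: create monthly partitions and offload partitions
-- older than three months. Both intervals are illustrative values only.
SELECT bdr.autopartition(
    relation := 'public.operational_logs',
    partition_increment := '1 month',
    analytics_offload_period := '3 months'
);
```

See Tiered Tables in Hybrid Manager for the supported parameters and their exact semantics.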
What this unlocks
- Seamless tiered storage for large PGD tables
- Fast lakehouse analytics on historical PGD data
- Open format (Iceberg) interoperability
- Reduced PGD operational cost and footprint
Next steps
Now that your PGD data is offloading to Iceberg, you can:
- Monitor Tiered Tables status and storage savings
- Query Tiered Tables from PGD and Lakehouse
- Query existing Iceberg tables from Lakehouse
For a broader architectural view, see Analytics in Hybrid Manager.