Apache Iceberg® in Hybrid Manager v1.3.5
Apache Iceberg® is an open table format for large analytical datasets stored in object storage.
Hybrid Manager (HM) integrates Iceberg capabilities into EDB Postgres deployments, enabling:
- Efficient querying of object storage via Lakehouse clusters
- Structured data offloading from PGD clusters to Iceberg format
- Centralized catalog management for Iceberg tables
- Interoperability with external analytics engines (Spark, Trino, Flink, etc.)
Why use Iceberg with Hybrid Manager
Apache Iceberg® provides an open, reliable, and performant foundation for analytics on large datasets. Using it through Hybrid Manager allows you to:
- Query Iceberg tables with high performance from Lakehouse clusters using vectorized execution.
- Offload transactional PGD data into Iceberg format for cost-efficient tiered storage and long-term analytics.
- Manage Iceberg catalogs centrally through HM or connect to external catalogs.
- Share data seamlessly between Postgres and other tools (Spark, Trino, Flink) using the Iceberg format.
Key terms and architecture overview
When should I use Iceberg with Hybrid Manager?
Use Iceberg with Hybrid Manager when you want to:
- Archive PGD data cost-effectively and still query it with Postgres or other tools.
- Unify a data lakehouse architecture around an HM-managed or external Iceberg catalog.
- Enable ad-hoc analytics on large object storage datasets without ETL.
- Implement Tiered Tables to manage large time-series datasets and storage lifecycle in PGD.
- Integrate Postgres and external data processing tools using the Iceberg format.
Key capabilities of Iceberg in Hybrid Manager
Querying existing Iceberg tables
What: Run SQL queries on Iceberg tables already stored in object storage.
Why: Reuse data created by other tools (Spark, Trino, Flink, PGD offload) without ETL or duplication.
How: Use Lakehouse clusters to define PGAA external tables pointing to Iceberg data.
Where: Iceberg tables in S3-compatible object storage, either file-based or catalog-managed.
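As a rough illustration of the How step above, the sketch below builds the kind of DDL a client might send to a Lakehouse cluster to define a PGAA table over existing Iceberg files. The USING PGAA clause and the option names pgaa.storage_location_name, pgaa.path, and pgaa.format are assumptions about PGAA syntax and are not taken from this page. The schema, table, location, and path values are hypothetical. Verify the syntax against the PGAA reference for your HM version before running it.

```python
def quote_literal(value: str) -> str:
    """Render a SQL string literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def quote_ident(name: str) -> str:
    """Render a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def pgaa_table_ddl(schema: str, table: str, storage_location: str,
                   path: str, fmt: str = "iceberg") -> str:
    """Build a CREATE TABLE statement for a PGAA table over object storage.

    ASSUMPTION: the option names below follow the PGAA documentation as
    the editor recalls it; they do not appear in this page.
    """
    options = {
        "pgaa.storage_location_name": storage_location,  # assumed option name
        "pgaa.path": path,                                # assumed option name
        "pgaa.format": fmt,                               # assumed option name
    }
    with_clause = ", ".join(
        f"{key} = {quote_literal(value)}" for key, value in options.items()
    )
    # The empty column list reflects the assumption that the schema is
    # read from the Iceberg metadata rather than declared by hand.
    return (
        f"CREATE TABLE {quote_ident(schema)}.{quote_ident(table)} () "
        f"USING PGAA WITH ({with_clause});"
    )


# Hypothetical names, for illustration only.
ddl = pgaa_table_ddl("public", "sales_iceberg", "my_lake", "warehouse/sales")
print(ddl)
```

Once such a table exists, it can be queried with ordinary SELECT statements from any Postgres client connected to the Lakehouse cluster.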
Iceberg catalog integration
What: Connect Lakehouse and PGD nodes to Iceberg catalogs (HM-managed or external).
Why: Manage Iceberg table metadata centrally and query with consistent semantics.
How: Use PGAA functions (pgaa.add_catalog, pgaa.attach_catalog) to connect and import catalogs.
Where: HM-managed Lakekeeper catalog, or third-party catalogs (Glue, Nessie, Polaris).
Note: When using Tiered Tables or PGD offload, use catalog-managed Iceberg tables wherever possible. A catalog gives other engines a single place to discover the tables and simplifies lifecycle management.
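The catalog connection step can be sketched as two SQL calls: one to register the catalog and one to attach it. The function names pgaa.add_catalog and pgaa.attach_catalog come from this page, but the argument order shown here (name, catalog type, JSON options), the 'iceberg-rest' type string, and the option keys are assumptions. The catalog name and URL are hypothetical.

```python
import json


def quote_literal(value: str) -> str:
    """Render a SQL string literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def add_catalog_sql(name: str, catalog_type: str, options: dict) -> str:
    """Build a call to pgaa.add_catalog.

    ASSUMPTION: the (name, type, options-as-JSON) signature is the editor's
    recollection of the PGAA API, not something stated in this page.
    """
    options_json = json.dumps(options, sort_keys=True)
    return (
        f"SELECT pgaa.add_catalog({quote_literal(name)}, "
        f"{quote_literal(catalog_type)}, {quote_literal(options_json)});"
    )


def attach_catalog_sql(name: str) -> str:
    """Build a call to pgaa.attach_catalog (single-argument form assumed)."""
    return f"SELECT pgaa.attach_catalog({quote_literal(name)});"


# Hypothetical catalog name, endpoint, and warehouse.
statements = [
    add_catalog_sql(
        "lake_catalog",
        "iceberg-rest",  # assumed type string for a REST catalog
        {"url": "https://catalog.example.com", "warehouse": "analytics"},
    ),
    attach_catalog_sql("lake_catalog"),
]
for statement in statements:
    print(statement)
```

Real connections normally also need credentials. They are left out here because the expected option keys are not given in this page.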
Offloading PGD data to Iceberg
What: Offload PGD transactional data into Iceberg format for analytical storage.
Why: Optimize storage costs and reduce PGD operational load while keeping data queryable.
How: Configure PGFS storage locations, an Iceberg catalog, or both, then enable analytics replication on the PGD tables you want to offload.
Where: Offloaded Iceberg tables in object storage, queryable by Lakehouse or other tools.
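The offload sequence described above (configure a storage location, then enable analytics replication per table) can be sketched as an ordered list of statements. Everything PGD-specific here is an assumption: the pgfs.create_storage_location call with a (name, URL) signature and the pgd.replicate_to_analytics table option are recalled from EDB documentation rather than taken from this page, and any further settings your version requires are omitted. The location, bucket, and table names are hypothetical.

```python
def quote_literal(value: str) -> str:
    """Render a SQL string literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def quote_ident(name: str) -> str:
    """Render a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def offload_plan(location_name: str, bucket_url: str,
                 tables: list[tuple[str, str]]) -> list[str]:
    """Return the statements for a minimal offload setup, in run order.

    ASSUMPTION: function and option names are unverified; treat this as
    an outline of the sequence, not a tested procedure.
    """
    plan = [
        # Step 1: register the object storage location (assumed signature).
        f"SELECT pgfs.create_storage_location({quote_literal(location_name)}, "
        f"{quote_literal(bucket_url)});"
    ]
    # Step 2: enable analytics replication table by table (assumed option).
    for schema, table in tables:
        plan.append(
            f"ALTER TABLE {quote_ident(schema)}.{quote_ident(table)} "
            "SET (pgd.replicate_to_analytics = TRUE);"
        )
    return plan


# Hypothetical location, bucket, and tables.
plan = offload_plan(
    "offload_lake",
    "s3://example-bucket/offload",
    [("public", "orders"), ("public", "order_items")],
)
for statement in plan:
    print(statement)
```

Generating the statements as a list keeps the order explicit: the storage location has to exist before any table starts replicating to it.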
Use cases for Iceberg in Hybrid Manager
- Archiving PGD data in Iceberg format while maintaining query access
- Centralized data lakehouse architecture with an HM-managed catalog
- Ad-hoc analytics on large data volumes through Lakehouse clusters
- Implementing tiered storage and lifecycle management for PGD datasets
- Exposing PGD offloaded data to Spark, Trino, and Flink pipelines using the Iceberg format
Getting started with Iceberg in Hybrid Manager
To begin using Iceberg with Hybrid Manager:
- Provision a Lakehouse cluster.
- Configure an Iceberg catalog connection (HM-managed or external).
- Learn Tiered Tables concepts if integrating with PGD.
- Read Iceberg or Delta tables with or without a catalog.
- Use standard Postgres clients to run analytical queries.