Apache Iceberg in Hybrid Manager
Apache Iceberg is an open table format for large analytical datasets stored in object storage.
Hybrid Manager (HM) integrates Iceberg capabilities into EDB Postgres deployments, enabling:
- Efficient querying of object storage via Lakehouse clusters
- Structured data offloading from PGD clusters to Iceberg format
- Centralized catalog management for Iceberg tables
- Interoperability with external analytics engines (Spark, Trino, Flink, etc.)
For a general overview of Iceberg, see Apache Iceberg with EDB Solutions.
Why use Iceberg with Hybrid Manager
Apache Iceberg provides an open, reliable, and performant foundation for analytics on large datasets. Using it through Hybrid Manager allows you to:
- Query Iceberg tables with high performance from Lakehouse clusters using vectorized execution.
- Offload transactional PGD data into Iceberg format for cost-efficient tiered storage and long-term analytics.
- Manage Iceberg catalogs centrally through HM or connect to external catalogs.
- Share data seamlessly between Postgres and other tools (Spark, Trino, Flink) using the Iceberg format.
Key terms and architecture overview
For definitions of core analytics terms used in Hybrid Manager—such as PGFS, PGAA, Lakehouse Cluster, Catalog, AutoPartition, and Analytics Offload—see Analytics Concepts in Hybrid Manager.
When should I use Iceberg with Hybrid Manager?
Use Iceberg with Hybrid Manager when you want to:
- Archive PGD data cost-effectively and still query it with Postgres or other tools.
- Unify a data lakehouse architecture with HM-managed or external Iceberg catalog.
- Enable ad-hoc analytics on large object storage datasets without ETL.
- Implement Tiered Tables to manage large time-series datasets and storage lifecycle in PGD.
- Integrate Postgres and external data processing tools using the Iceberg format.
Key capabilities of Iceberg in Hybrid Manager
Querying existing Iceberg tables
What: Run SQL queries on Iceberg tables already stored in object storage.
Why: Reuse data created by other tools (Spark, Trino, Flink, PGD offload) without ETL or duplication.
How: Use Lakehouse clusters to define PGAA external tables pointing to Iceberg data.
Where: Iceberg tables in S3-compatible object storage, either file-based or catalog-managed.
How-To: Query existing Iceberg tables
Iceberg catalog integration
What: Connect Lakehouse and PGD nodes to Iceberg catalogs (HM-managed or external).
Why: Manage Iceberg table metadata centrally and query with consistent semantics.
How: Use PGAA functions (pgaa.add_catalog
, pgaa.attach_catalog
) to connect and import catalogs.
Where: HM-managed Lakekeeper catalog, or third-party catalogs (Glue, Nessie, Polaris).
How-To: Configure an Iceberg REST catalog connection
Note: When using Tiered Tables or PGD offload, it is strongly recommended to use a catalog-managed Iceberg table for interoperability and lifecycle management.
Offloading PGD data to Iceberg
What: Offload PGD transactional data into Iceberg format for analytical storage.
Why: Optimize storage costs and reduce PGD operational load while keeping data queryable.
How: Configure PGFS locations and/or Iceberg catalog, then enable analytics replication on PGD tables.
Where: Offloaded Iceberg tables in object storage, queryable by Lakehouse or other tools.
How-To: Offload PGD data to Apache Iceberg
Use cases for Iceberg in Hybrid Manager
- Archiving PGD data in Iceberg format while maintaining query access
- Centralized data lakehouse architecture with HM-managed catalog
- Ad-hoc analytics on large data volumes through Lakehouse clusters
- Implementing tiered storage and lifecycle management for PGD datasets
- Exposing PGD offloaded data to Spark, Trino, Flink pipelines using Iceberg format
How-To: Iceberg use cases in Hybrid Manager
Getting started with Iceberg in Hybrid Manager
To begin using Iceberg with Hybrid Manager:
- Provision a Lakehouse Cluster.
- Configure an Iceberg catalog (HM-managed or external).
- If offloading PGD data, Offload PGD data to Apache Iceberg.
- If querying existing Iceberg tables, Query existing Iceberg tables.
- Use standard Postgres clients to run analytical queries.
Related How-Tos
- Configure an Iceberg REST catalog connection
- Query existing Iceberg tables
- Offload PGD data to Apache Iceberg
Next topic
← Prev
Delta Lake in Hybrid Manager
↑ Up
Analytics in Hybrid Manager
Next →
EDB Postgres Lakehouse Clusters on Hybrid Manager
Could this page be better? Report a problem or suggest an addition!