Apache Iceberg in Hybrid Manager

Apache Iceberg is an open table format for large analytical datasets stored in object storage.

Hybrid Manager (HM) integrates Iceberg capabilities into EDB Postgres deployments, enabling:

Efficient querying of object storage via Lakehouse clusters
Structured data offloading from PGD clusters to Iceberg format
Centralized catalog management for Iceberg tables
Interoperability with external analytics engines (Spark, Trino, Flink, etc.)

For a general overview of Iceberg, see Apache Iceberg with EDB Solutions.

Why use Iceberg with Hybrid Manager

Apache Iceberg provides an open, reliable, and performant foundation for analytics on large datasets. Using it through Hybrid Manager allows you to:

Query Iceberg tables with high performance from Lakehouse clusters using vectorized execution.
Offload transactional PGD data into Iceberg format for cost-efficient tiered storage and long-term analytics.
Manage Iceberg catalogs centrally through HM or connect to external catalogs.
Share data seamlessly between Postgres and other tools (Spark, Trino, Flink) using the Iceberg format.

Key terms and architecture overview

For definitions of core analytics terms used in Hybrid Manager—such as PGFS, PGAA, Lakehouse Cluster, Catalog, AutoPartition, and Analytics Offload—see Analytics Concepts in Hybrid Manager.

When should I use Iceberg with Hybrid Manager?

Use Iceberg with Hybrid Manager when you want to:

Archive PGD data cost-effectively and still query it with Postgres or other tools.
Unify a data lakehouse architecture with HM-managed or external Iceberg catalog.
Enable ad-hoc analytics on large object storage datasets without ETL.
Implement Tiered Tables to manage large time-series datasets and storage lifecycle in PGD.
Integrate Postgres and external data processing tools using the Iceberg format.

Key capabilities of Iceberg in Hybrid Manager

Querying existing Iceberg tables

What: Run SQL queries on Iceberg tables already stored in object storage.

Why: Reuse data created by other tools (Spark, Trino, Flink, PGD offload) without ETL or duplication.

How: Use Lakehouse clusters to define PGAA external tables pointing to Iceberg data.

Where: Iceberg tables in S3-compatible object storage, either file-based or catalog-managed.

How-To: Query existing Iceberg tables

Iceberg catalog integration

What: Connect Lakehouse and PGD nodes to Iceberg catalogs (HM-managed or external).

Why: Manage Iceberg table metadata centrally and query with consistent semantics.

How: Use PGAA functions (pgaa.add_catalog, pgaa.attach_catalog) to connect and import catalogs.

Where: HM-managed Lakekeeper catalog, or third-party catalogs (Glue, Nessie, Polaris).

How-To: Configure an Iceberg REST catalog connection

Note: When using Tiered Tables or PGD offload, it is strongly recommended to use a catalog-managed Iceberg table for interoperability and lifecycle management.

Offloading PGD data to Iceberg

What: Offload PGD transactional data into Iceberg format for analytical storage.

Why: Optimize storage costs and reduce PGD operational load while keeping data queryable.

How: Configure PGFS locations and/or Iceberg catalog, then enable analytics replication on PGD tables.

Where: Offloaded Iceberg tables in object storage, queryable by Lakehouse or other tools.

How-To: Offload PGD data to Apache Iceberg

Use cases for Iceberg in Hybrid Manager

Archiving PGD data in Iceberg format while maintaining query access
Centralized data lakehouse architecture with HM-managed catalog
Ad-hoc analytics on large data volumes through Lakehouse clusters
Implementing tiered storage and lifecycle management for PGD datasets
Exposing PGD offloaded data to Spark, Trino, Flink pipelines using Iceberg format

How-To: Iceberg use cases in Hybrid Manager