Apache Iceberg® in Hybrid Manager

This documentation covers the current Innovation Release of EDB Postgres AI. See also:

Hybrid Manager dual release strategy
Documentation for the current Long-term support release

Apache Iceberg® is an open table format for large analytical datasets stored in object storage.

Hybrid Manager (HM) integrates Iceberg capabilities into EDB Postgres deployments, enabling:

Efficient querying of object storage via Lakehouse clusters
Structured data offloading from PGD clusters to Iceberg format
Centralized catalog management for Iceberg tables
Interoperability with external analytics engines (Spark, Trino, Flink, etc.)

Why use Iceberg with Hybrid Manager

Apache Iceberg® provides an open, reliable, and performant foundation for analytics on large datasets. Using it through Hybrid Manager allows you to:

Query Iceberg tables with high performance from Lakehouse clusters using vectorized execution.
Offload transactional PGD data into Iceberg format for cost-efficient tiered storage and long-term analytics.
Manage Iceberg catalogs centrally through HM or connect to external catalogs.
Share data seamlessly between Postgres and other tools (Spark, Trino, Flink) using the Iceberg format.

For definitions and a component overview, see Key terms and architecture overview.

When should I use Iceberg with Hybrid Manager?

Use Iceberg with Hybrid Manager when you want to:

Archive PGD data cost-effectively and still query it with Postgres or other tools.
Unify a data lakehouse architecture around an HM-managed or external Iceberg catalog.
Enable ad-hoc analytics on large object storage datasets without ETL.
Implement Tiered Tables to manage large time-series datasets and storage lifecycle in PGD.
Integrate Postgres and external data processing tools using the Iceberg format.

Key capabilities of Iceberg in Hybrid Manager

Querying existing Iceberg tables

What: Run SQL queries on Iceberg tables already stored in object storage.

Why: Reuse data created by other tools (Spark, Trino, Flink, PGD offload) without ETL or duplication.

How: Use Lakehouse clusters to define PGAA external tables pointing to Iceberg data.

Where: Iceberg tables in S3-compatible object storage, either file-based or catalog-managed.
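
As a rough illustration of the "How" step above, the following sketch renders the kind of DDL a Lakehouse cluster might use to expose Iceberg data as a PGAA table. The USING PGAA clause and the pgaa.storage_location, pgaa.path, and pgaa.format option names are assumptions made for this example, as are the schema, table, location, and path values; confirm the exact syntax in the PGAA reference before running anything.

```python
def quote_literal(value: str) -> str:
    """Quote a string as a SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def quote_ident(name: str) -> str:
    """Quote a SQL identifier, doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def pgaa_table_ddl(schema: str, table: str, location: str, path: str) -> str:
    """Render DDL for a PGAA table that points at Iceberg data.

    ASSUMPTION: the USING PGAA clause and the pgaa.* option names are
    illustrative and not confirmed by this page.
    """
    options = {
        "pgaa.storage_location": location,  # hypothetical storage location name
        "pgaa.path": path,                  # hypothetical path under that location
        "pgaa.format": "iceberg",
    }
    rendered = ", ".join(
        f"{key} = {quote_literal(val)}" for key, val in options.items()
    )
    return (
        f"CREATE TABLE {quote_ident(schema)}.{quote_ident(table)} () "
        f"USING PGAA WITH ({rendered});"
    )


# Example values are placeholders, not real resources.
ddl = pgaa_table_ddl("public", "sales", "lake-bucket", "warehouse/sales")
print(ddl)
```

The printed statement can be pasted into any Postgres client connected to the Lakehouse cluster. Generating the text rather than hard-coding it keeps the quoting of names and paths in one place.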

Iceberg catalog integration

What: Connect Lakehouse and PGD nodes to Iceberg catalogs (HM-managed or external).

Why: Manage Iceberg table metadata centrally and query with consistent semantics.

How: Use PGAA functions (pgaa.add_catalog, pgaa.attach_catalog) to connect and import catalogs.

Where: HM-managed Lakekeeper catalog, or third-party catalogs (Glue, Nessie, Polaris).
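
To make the catalog workflow more concrete, here is a minimal sketch that renders the two calls named above as SQL text. The function names pgaa.add_catalog and pgaa.attach_catalog come from this page. The argument shape (catalog name, catalog type, JSON options) is an assumption, as are the 'iceberg-rest' type string, the option keys, the catalog name, and the URL.

```python
import json


def quote_literal(value: str) -> str:
    """Quote a string as a SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def catalog_statements(name: str, catalog_type: str, options: dict) -> list:
    """Render SQL that registers a catalog and then attaches it.

    ASSUMPTION: pgaa.add_catalog takes (name, type, JSON options) and
    pgaa.attach_catalog takes (name). Check the PGAA reference for the
    actual signatures.
    """
    options_json = json.dumps(options, sort_keys=True)
    return [
        "SELECT pgaa.add_catalog("
        f"{quote_literal(name)}, {quote_literal(catalog_type)}, "
        f"{quote_literal(options_json)});",
        f"SELECT pgaa.attach_catalog({quote_literal(name)});",
    ]


# All values below are placeholders for illustration only.
statements = catalog_statements(
    "lake_catalog",
    "iceberg-rest",
    {"url": "https://catalog.example.com", "warehouse": "analytics"},
)
for statement in statements:
    print(statement)
```

Running the statements in order reflects the connect-then-import sequence described in the "How" step. Credentials would normally be part of the options and are left out here on purpose.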

Note: When using Tiered Tables or PGD offload, we strongly recommend catalog-managed Iceberg tables, because the catalog gives other engines a consistent view of table metadata and simplifies lifecycle management.

Offloading PGD data to Iceberg

What: Offload PGD transactional data into Iceberg format for analytical storage.

Why: Optimize storage costs and reduce PGD operational load while keeping data queryable.

How: Configure PGFS storage locations, an Iceberg catalog, or both, then enable analytics replication on the PGD tables you want to offload.

Where: Offloaded Iceberg tables in object storage, queryable by Lakehouse or other tools.

Use cases for Iceberg in Hybrid Manager

Archiving PGD data in Iceberg format while maintaining query access
Centralized data lakehouse architecture with an HM-managed catalog
Ad-hoc analytics on large data volumes through Lakehouse clusters
Implementing tiered storage and lifecycle management for PGD datasets
Exposing offloaded PGD data to Spark, Trino, and Flink pipelines using the Iceberg format

Getting started with Iceberg in Hybrid Manager

To begin using Iceberg with Hybrid Manager:

Provision a Lakehouse cluster.
Configure an Iceberg catalog connection.
Learn Tiered Tables concepts if integrating with PGD.
Read Iceberg or Delta tables with or without a catalog.
Use standard Postgres clients to run analytical queries.
