Apache Iceberg® in Hybrid Manager v1.3.5

Apache Iceberg® is an open table format for large analytical datasets stored in object storage.

Hybrid Manager (HM) integrates Iceberg capabilities into EDB Postgres deployments, enabling:

  • Efficient querying of object storage via Lakehouse clusters
  • Structured data offloading from PGD clusters to Iceberg format
  • Centralized catalog management for Iceberg tables
  • Interoperability with external analytics engines (Spark, Trino, Flink, etc.)

Why use Iceberg with Hybrid Manager

Apache Iceberg® provides an open, reliable, and performant foundation for analytics on large datasets. Using it through Hybrid Manager allows you to:

  • Query Iceberg tables with high performance from Lakehouse clusters using vectorized execution.
  • Offload transactional PGD data into Iceberg format for cost-efficient tiered storage and long-term analytics.
  • Manage Iceberg catalogs centrally through HM or connect to external catalogs.
  • Share data seamlessly between Postgres and other tools (Spark, Trino, Flink) using the Iceberg format.

Key terms and architecture overview

When should I use Iceberg with Hybrid Manager?

Use Iceberg with Hybrid Manager when you want to:

  • Archive PGD data cost-effectively and still query it with Postgres or other tools.
  • Build a unified data lakehouse architecture around an HM-managed or external Iceberg catalog.
  • Run ad hoc analytics on large datasets in object storage without ETL.
  • Implement Tiered Tables to manage large time-series datasets and their storage lifecycle in PGD.
  • Integrate Postgres and external data processing tools using the Iceberg format.

Key capabilities of Iceberg in Hybrid Manager

Querying existing Iceberg tables

What: Run SQL queries on Iceberg tables already stored in object storage.

Why: Reuse data created by other tools (Spark, Trino, Flink, PGD offload) without ETL or duplication.

How: Use Lakehouse clusters to define PGAA external tables pointing to Iceberg data.

Where: Iceberg tables in S3-compatible object storage, either file-based or catalog-managed.
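A file-based table has no catalog to record which metadata file is current, so a reader has to infer it from the files in the table's metadata/ directory. The Python sketch below illustrates that general Iceberg convention. It assumes the directory listing is already available as a list of names and that the writer used one of two common naming patterns: vN.metadata.json (Hadoop-style tables) or NNNNN-<uuid>.metadata.json (catalog-backed writers). The function names and the example-bucket paths are hypothetical, and this is not a description of how PGAA resolves tables.

```python
import re
from typing import Iterable, Optional

# Hadoop-style tables name metadata files "v<version>.metadata.json".
_HADOOP_STYLE = re.compile(r"^v(\d+)\.metadata\.json$")
# Catalog-backed writers typically use "<zero-padded version>-<uuid>.metadata.json".
_SEQUENCED_STYLE = re.compile(r"^(\d+)-.+\.metadata\.json$")


def metadata_version(file_name: str) -> Optional[int]:
    """Return the version encoded in an Iceberg metadata file name, or None."""
    name = file_name.rsplit("/", 1)[-1]  # accept full object paths or bare names
    for pattern in (_HADOOP_STYLE, _SEQUENCED_STYLE):
        match = pattern.match(name)
        if match:
            return int(match.group(1))
    return None  # not a metadata file (manifests, version-hint.text, ...)


def latest_metadata_file(listing: Iterable[str]) -> Optional[str]:
    """Pick the metadata file with the highest numeric version from a listing."""
    best_version, best_file = -1, None
    for file_name in listing:
        version = metadata_version(file_name)
        if version is not None and version > best_version:
            best_version, best_file = version, file_name
    return best_file


if __name__ == "__main__":
    listing = [
        "s3://example-bucket/warehouse/sales/metadata/v1.metadata.json",
        "s3://example-bucket/warehouse/sales/metadata/v2.metadata.json",
        "s3://example-bucket/warehouse/sales/metadata/v10.metadata.json",
        "s3://example-bucket/warehouse/sales/metadata/version-hint.text",
    ]
    # Numeric comparison picks v10, where a plain string sort would pick v2.
    print(latest_metadata_file(listing))
```

On object storage without atomic rename, concurrent writers can make a listing-based guess like this unreliable, which is one reason catalog-managed tables are the safer choice.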

Iceberg catalog integration

What: Connect Lakehouse and PGD nodes to Iceberg catalogs (HM-managed or external).

Why: Manage Iceberg table metadata centrally and query with consistent semantics.

How: Use PGAA functions (pgaa.add_catalog, pgaa.attach_catalog) to connect and import catalogs.

Where: HM-managed Lakekeeper catalog, or third-party catalogs (Glue, Nessie, Polaris).

Note: When using Tiered Tables or PGD offload, we strongly recommend catalog-managed Iceberg tables for interoperability and lifecycle management.
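In the Iceberg model, a catalog gives consistent semantics because it holds a single pointer from each table name to that table's current metadata file, and a commit succeeds only if the pointer has not moved since the writer read it. The toy in-memory Python sketch below shows that compare-and-swap rule. ToyCatalog and CommitConflict are hypothetical names for illustration; they do not correspond to Lakekeeper, Glue, Nessie, Polaris, or any PGAA function.

```python
import threading
from typing import Dict


class CommitConflict(Exception):
    """Hypothetical error: another writer committed first, so the caller must retry."""


class ToyCatalog:
    """In-memory stand-in for an Iceberg catalog: table name -> current metadata location."""

    def __init__(self) -> None:
        self._pointers: Dict[str, str] = {}
        self._lock = threading.Lock()  # makes check-and-swap atomic within this process

    def create_table(self, name: str, metadata_location: str) -> None:
        with self._lock:
            if name in self._pointers:
                raise ValueError(f"table {name!r} already exists")
            self._pointers[name] = metadata_location

    def current_metadata(self, name: str) -> str:
        with self._lock:
            return self._pointers[name]

    def commit(self, name: str, expected: str, new_location: str) -> None:
        """Swap the pointer to new_location only if it still equals expected."""
        with self._lock:
            if self._pointers.get(name) != expected:
                raise CommitConflict(f"{name!r} changed since it was read; retry")
            self._pointers[name] = new_location


if __name__ == "__main__":
    catalog = ToyCatalog()
    catalog.create_table("analytics.orders", "orders/metadata/00000-a.metadata.json")

    # Two writers read the same version, then both try to commit on top of it.
    base = catalog.current_metadata("analytics.orders")
    catalog.commit("analytics.orders", base, "orders/metadata/00001-b.metadata.json")
    try:
        catalog.commit("analytics.orders", base, "orders/metadata/00001-c.metadata.json")
    except CommitConflict as err:
        print("second writer must retry:", err)
```

A production catalog also persists this state durably and may check further requirements before accepting a commit; the sketch shows only the compare-and-swap step.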

Offloading PGD data to Iceberg

What: Offload PGD transactional data into Iceberg format for analytical storage.

Why: Optimize storage costs and reduce PGD operational load while keeping data queryable.

How: Configure PGFS storage locations, an Iceberg catalog, or both, then enable analytics replication on the PGD tables.

Where: Offloaded Iceberg tables in object storage, queryable by Lakehouse or other tools.
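Conceptually, offloading for tiered storage keeps recent data in PGD and moves older data to Iceberg in object storage. The Python sketch below models that decision for a time-partitioned table, assuming an age-based policy that moves whole partitions at a time. The Partition type, the plan_offload function, and the 90-day window are hypothetical; the actual offload policy for PGD Tiered Tables is configured in PGD and HM, and its options may differ.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Tuple


@dataclass(frozen=True)
class Partition:
    """Hypothetical time-range partition covering [start, end)."""
    name: str
    start: date
    end: date


def plan_offload(partitions: List[Partition], today: date, hot_days: int) -> Tuple[List[str], List[str]]:
    """Split partitions into (keep_in_pgd, offload_to_iceberg).

    A partition is offloaded only when its whole range ends on or before the
    cutoff, so a partition that can still hold recent rows stays in PGD.
    """
    cutoff = today - timedelta(days=hot_days)
    keep = [p.name for p in partitions if p.end > cutoff]
    offload = [p.name for p in partitions if p.end <= cutoff]
    return keep, offload


if __name__ == "__main__":
    # Monthly partitions for 2024: each covers [first of month, first of next month).
    months = [date(2024, m, 1) for m in range(1, 13)] + [date(2025, 1, 1)]
    parts = [Partition(f"events_{s:%Y_%m}", s, e) for s, e in zip(months, months[1:])]
    keep, offload = plan_offload(parts, today=date(2024, 12, 15), hot_days=90)
    print("keep in PGD:", keep)  # events_2024_09 through events_2024_12
    print("offload to Iceberg:", offload)  # events_2024_01 through events_2024_08
```

Working per partition rather than per row keeps each move a bulk operation and avoids splitting a partition between PGD and object storage.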

Use cases for Iceberg in Hybrid Manager

  • Archiving PGD data in Iceberg format while maintaining query access
  • Centralized data lakehouse architecture with HM-managed catalog
  • Ad-hoc analytics on large data volumes through Lakehouse clusters
  • Implementing tiered storage and lifecycle management for PGD datasets
  • Exposing PGD offloaded data to Spark, Trino, Flink pipelines using Iceberg format

Getting started with Iceberg in Hybrid Manager

To begin using Iceberg with Hybrid Manager:

  1. Provision a Lakehouse cluster.
  2. Configure an Iceberg catalog connection (HM-managed or external).
  3. If you're integrating with PGD, learn the Tiered Tables concepts.
  4. Read Iceberg or Delta Lake tables, with or without a catalog.
  5. Use standard Postgres clients to run analytical queries.