Query existing Apache Iceberg tables
This guide explains how to query existing Apache Iceberg tables from a Hybrid Manager (HM) Lakehouse cluster.
You can connect Lakehouse clusters to Iceberg tables stored in object storage and run SQL queries using PGAA.
For background on Iceberg integration, see Apache Iceberg in Hybrid Manager.
Goals
After completing this guide, you will:
✅ Be able to define PGFS storage locations pointing to Iceberg data
✅ Be able to create PGAA reader tables that expose Iceberg tables to Postgres SQL
✅ Unlock SQL-based analytics and BI access to existing Iceberg tables
✅ Enable Postgres tools and users to query Iceberg data created by other systems
When to use this
You should follow this guide when:
- You already have Iceberg tables written by Spark, Trino, Flink, PGD offload, or other systems
- You want to query this data using Lakehouse clusters in Hybrid Manager
- You want to connect BI tools or Postgres clients to your Iceberg data
- You are testing performance or validating Lakehouse integration with existing Iceberg datasets
Prerequisites
- An HM-provisioned EDB Postgres Lakehouse cluster
- Existing Apache Iceberg tables in S3-compatible object storage
- Storage credentials or permissions that allow the cluster to read the data (if required)
- The path to the Iceberg table root, containing either a version-hint.text file or a vX.metadata.json metadata file
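For orientation, a file-based Iceberg table written by an engine such as Spark typically has a layout like the following (paths are illustrative):

```
s3://your-bucket/path/to/iceberg/sales_records/iceberg_table_root/
├── metadata/
│   ├── version-hint.text
│   ├── v1.metadata.json
│   ├── v2.metadata.json
│   └── ... manifest lists and manifest files ...
└── data/
    └── ... Parquet data files ...
```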
Steps
1. Define a PGFS storage location
Connect to your Lakehouse cluster and define a PGFS storage location.
Example:
```sql
SELECT pgfs.create_storage_location(
    name        => 'my_s3_iceberg_data',
    url         => 's3://your-bucket/path/to/iceberg',
    options     => '{}',
    credentials => '{"access_key_id": "...", "secret_access_key": "..."}'
);
```
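To double-check that the storage location was registered, you can list the locations PGFS knows about. This assumes your PGFS release provides pgfs.list_storage_locations(); check the PGFS reference for your version:

```sql
-- Assumption: pgfs.list_storage_locations() is available in this PGFS release
SELECT * FROM pgfs.list_storage_locations();
```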
2. Create a PGAA external table for Iceberg
Create a reader table that maps the Iceberg table into Postgres. You can declare the column list explicitly:

```sql
CREATE TABLE public.my_sales_iceberg_data (
    sales_id     INT,
    sales_date   DATE,
    sales_region TEXT,
    sales_amount NUMERIC
)
USING PGAA
WITH (
    pgaa.storage_location = 'my_s3_iceberg_data',
    pgaa.path = 'sales_records/iceberg_table_root',
    pgaa.format = 'iceberg'
);
```

Alternatively, leave the column list empty and let PGAA derive the schema from the Iceberg metadata. Either form exposes your Iceberg table to Postgres SQL:

```sql
CREATE TABLE public.my_sales_iceberg_data ()
USING PGAA
WITH (
    pgaa.storage_location = 'my_s3_iceberg_data',
    pgaa.path = 'sales_records/iceberg_table_root',
    pgaa.format = 'iceberg'
);
```

Note: For Iceberg tables not managed by a catalog, pgaa.path typically points to a directory containing a version-hint.text file or directly to a vX.metadata.json file.
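Building on the note above, if you want a reader table bound to one specific metadata version rather than the table root, you can point pgaa.path directly at a metadata file. The file name below is hypothetical; substitute a vX.metadata.json that exists in your table's metadata directory:

```sql
-- Hypothetical example: bind the reader table to a single metadata file
CREATE TABLE public.my_sales_iceberg_v3 ()
USING PGAA
WITH (
    pgaa.storage_location = 'my_s3_iceberg_data',
    pgaa.path = 'sales_records/iceberg_table_root/metadata/v3.metadata.json',
    pgaa.format = 'iceberg'
);
```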
3. Query the Iceberg table
Run queries using standard SQL.
Example:
```sql
SELECT *
FROM public.my_sales_iceberg_data
WHERE sales_region = 'North America'
LIMIT 100;
```
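Because the reader table behaves like any other Postgres relation, you can also run analytical SQL against it. For example, using the columns defined in step 2:

```sql
-- Monthly sales totals per region
SELECT sales_region,
       date_trunc('month', sales_date) AS sales_month,
       sum(sales_amount)               AS total_sales
FROM public.my_sales_iceberg_data
GROUP BY sales_region, date_trunc('month', sales_date)
ORDER BY sales_month, sales_region;
```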
Notes
- Lakehouse clusters use a vectorized query engine for fast Iceberg queries
- You do not need a catalog connection to query file-based Iceberg tables
- If your Iceberg tables are catalog-managed, see Configure an Iceberg REST catalog connection for an alternative flow
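To inspect how the Lakehouse cluster plans and executes a query against the reader table, standard Postgres EXPLAIN works; the exact plan nodes shown depend on your PGAA version:

```sql
-- Show the execution plan and runtime statistics for a scan of the reader table
EXPLAIN (ANALYZE, VERBOSE)
SELECT count(*) FROM public.my_sales_iceberg_data;
```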
Next steps
Now that you have created an Iceberg external table and run queries:
- Configure an Iceberg REST catalog connection to query catalog-managed tables
- Offload PGD data to Apache Iceberg to add more data to your Iceberg lakehouse
- Query Tiered Tables from PGD and Lakehouse to integrate with operational data
For an architecture overview, see Analytics in Hybrid Manager.