Query existing Apache Iceberg tables
This guide explains how to query existing Apache Iceberg tables from a Hybrid Manager (HM) Lakehouse cluster.
You can connect Lakehouse clusters to Iceberg tables stored in object storage and run SQL queries using PGAA.
For background on Iceberg integration, see Apache Iceberg in Hybrid Manager.
Goals
After completing this guide, you will:
✅ Be able to define PGFS storage locations pointing to Iceberg data
✅ Be able to create PGAA reader tables that expose Iceberg tables to Postgres SQL
✅ Unlock SQL-based analytics and BI access to existing Iceberg tables
✅ Enable Postgres tools and users to query Iceberg data created by other systems
When to use this
You should follow this guide when:
- You already have Iceberg tables written by Spark, Trino, Flink, PGD offload, or other systems
- You want to query this data using Lakehouse clusters in Hybrid Manager
- You want to connect BI tools or Postgres clients to your Iceberg data
- You are testing performance or validating Lakehouse integration with existing Iceberg datasets
Prerequisites
- An HM-provisioned EDB Postgres Lakehouse cluster
- Existing Apache Iceberg tables in S3-compatible object storage
- Storage credentials or permissions that allow the cluster to read the data (if required)
- The path to the Iceberg table root, containing either a version-hint.text file or a vX.metadata.json metadata file
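For orientation, a file-based Iceberg table written by an engine such as Spark typically has a layout like the following (paths are illustrative):

```
s3://your-bucket/path/to/iceberg/sales_records/iceberg_table_root/
├── metadata/
│   ├── version-hint.text
│   ├── v1.metadata.json
│   ├── v2.metadata.json
│   └── ... manifest lists and manifest files ...
└── data/
    └── ... Parquet data files ...
```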
Steps
1. Define a PGFS storage location
Connect to your Lakehouse cluster and define a PGFS storage location.
Example:
```sql
SELECT pgfs.create_storage_location(
    name        => 'my_s3_iceberg_data',
    url         => 's3://your-bucket/path/to/iceberg',
    options     => '{}',
    credentials => '{"access_key_id": "...", "secret_access_key": "..."}'
);
```
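To double-check that the storage location was registered, you can list the locations PGFS knows about. This assumes your PGFS release provides pgfs.list_storage_locations(); check the PGFS reference for your version:

```sql
-- Assumption: pgfs.list_storage_locations() is available in this PGFS release
SELECT * FROM pgfs.list_storage_locations();
```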
2. Create a PGAA external table for Iceberg
Create a reader table that maps the Iceberg table into Postgres. You can declare the column list explicitly:

```sql
CREATE TABLE public.my_sales_iceberg_data (
    sales_id     INT,
    sales_date   DATE,
    sales_region TEXT,
    sales_amount NUMERIC
)
USING PGAA
WITH (
    pgaa.storage_location = 'my_s3_iceberg_data',
    pgaa.path = 'sales_records/iceberg_table_root',
    pgaa.format = 'iceberg'
);
```

Alternatively, leave the column list empty and let PGAA derive the schema from the Iceberg metadata. Either form exposes your Iceberg table to Postgres SQL:

```sql
CREATE TABLE public.my_sales_iceberg_data ()
USING PGAA
WITH (
    pgaa.storage_location = 'my_s3_iceberg_data',
    pgaa.path = 'sales_records/iceberg_table_root',
    pgaa.format = 'iceberg'
);
```

Note: For Iceberg tables not managed by a catalog, pgaa.path typically points to a directory containing a version-hint.text file or directly to a vX.metadata.json file.
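Building on the note above, if you want a reader table bound to one specific metadata version rather than the table root, you can point pgaa.path directly at a metadata file. The file name below is hypothetical; substitute a vX.metadata.json that exists in your table's metadata directory:

```sql
-- Hypothetical example: bind the reader table to a single metadata file
CREATE TABLE public.my_sales_iceberg_v3 ()
USING PGAA
WITH (
    pgaa.storage_location = 'my_s3_iceberg_data',
    pgaa.path = 'sales_records/iceberg_table_root/metadata/v3.metadata.json',
    pgaa.format = 'iceberg'
);
```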
3. Query the Iceberg table
Run queries using standard SQL.
Example:
```sql
SELECT *
FROM public.my_sales_iceberg_data
WHERE sales_region = 'North America'
LIMIT 100;
```
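Because the reader table behaves like any other Postgres relation, you can also run analytical SQL against it. For example, using the columns defined in step 2:

```sql
-- Monthly sales totals per region
SELECT sales_region,
       date_trunc('month', sales_date) AS sales_month,
       sum(sales_amount)               AS total_sales
FROM public.my_sales_iceberg_data
GROUP BY sales_region, date_trunc('month', sales_date)
ORDER BY sales_month, sales_region;
```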
Notes
- Lakehouse clusters use a vectorized query engine for fast Iceberg queries
- You do not need a catalog connection to query file-based Iceberg tables
- If your Iceberg tables are catalog-managed, see Configure an Iceberg REST catalog connection for an alternative flow
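To inspect how the Lakehouse cluster plans and executes a query against the reader table, standard Postgres EXPLAIN works; the exact plan nodes shown depend on your PGAA version:

```sql
-- Show the execution plan and runtime statistics for a scan of the reader table
EXPLAIN (ANALYZE, VERBOSE)
SELECT count(*) FROM public.my_sales_iceberg_data;
```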
Next steps
Now that you have created an Iceberg external table and run queries:
- Configure an Iceberg REST catalog connection to query catalog-managed tables
- Offload PGD data to Apache Iceberg to add more data to your Iceberg lakehouse
- Query Tiered Tables from PGD and Lakehouse to integrate with operational data
For an architecture overview, see Analytics in Hybrid Manager.