Query existing Apache Iceberg tables

This guide explains how to query existing Apache Iceberg tables from a Hybrid Manager (HM) Lakehouse cluster.

You can connect Lakehouse clusters to Iceberg tables stored in object storage and run SQL queries using PGAA.

For background on Iceberg integration, see Apache Iceberg in Hybrid Manager.


Goals

After completing this guide, you will be able to:

✅ Define PGFS storage locations pointing to Iceberg data
✅ Create PGAA reader tables that expose Iceberg tables to Postgres SQL
✅ Run SQL-based analytics and BI queries against existing Iceberg tables
✅ Query Iceberg data created by other systems from Postgres tools and clients


When to use this

You should follow this guide when:

  • You already have Iceberg tables written by Spark, Trino, Flink, PGD offload, or other systems
  • You want to query this data using Lakehouse clusters in Hybrid Manager
  • You want to connect BI tools or Postgres clients to your Iceberg data
  • You are testing performance or validating Lakehouse integration with existing Iceberg datasets

Prerequisites

  • An HM-provisioned EDB Postgres Lakehouse cluster
  • Existing Apache Iceberg tables in S3-compatible object storage
  • Credentials with read access to the object store (if required)
  • Path to the Iceberg table root, containing either a version-hint.text file or a vX.metadata.json file (see the example layout after this list)
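
For orientation, here is an illustrative layout for a file-based Iceberg table. The bucket name, prefixes, and version numbers are placeholders, not required names:

s3://your-bucket/path/to/iceberg/sales_records/iceberg_table_root/
├── metadata/
│   ├── version-hint.text
│   ├── v1.metadata.json
│   ├── v2.metadata.json
│   └── ... manifest lists and manifests ...
└── data/
    └── ... Parquet data files ...

In the examples below, pgaa.path is resolved relative to the storage location URL: the storage location covers the shared prefix, and the table path supplies the rest.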

Steps

1. Define a PGFS storage location

Connect to your Lakehouse cluster and define a PGFS storage location that points to the bucket and prefix containing your Iceberg data.

Example:

SELECT pgfs.create_storage_location(
    name => 'my_s3_iceberg_data',
    url => 's3://your-bucket/path/to/iceberg',
    options => '{}',
    credentials => '{"access_key_id": "...", "secret_access_key": "..."}'
);
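
As a quick sanity check before creating tables, you can confirm the location was registered. This sketch assumes your PGFS version provides the pgfs.list_storage_locations() helper; consult the PGFS reference for your release if it does not:

SELECT * FROM pgfs.list_storage_locations();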

2. Create a PGAA external table for Iceberg

Create a reader table that maps the Iceberg table into Postgres. The declared columns should match the Iceberg table's schema:

CREATE TABLE public.my_sales_iceberg_data (
    sales_id INT,
    sales_date DATE,
    sales_amount NUMERIC
)
USING PGAA
WITH (
    pgaa.storage_location = 'my_s3_iceberg_data',
    pgaa.path = 'sales_records/iceberg_table_root',
    pgaa.format = 'iceberg'
);

Alternatively, omit the column definitions and PGAA derives the schema from the Iceberg table's metadata. Use this form instead of the explicit column list above if you prefer schema inference:

CREATE TABLE public.my_sales_iceberg_data ()
USING PGAA
WITH (
    pgaa.storage_location = 'my_s3_iceberg_data',
    pgaa.path = 'sales_records/iceberg_table_root',
    pgaa.format = 'iceberg'
);
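
Either way, you can verify the resulting column list from Postgres. This standard information_schema query assumes the table name used above:

SELECT column_name, data_type
FROM information_schema.columns
WHERE table_schema = 'public'
  AND table_name = 'my_sales_iceberg_data';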

Note: For Iceberg tables not managed by a catalog, pgaa.path typically points to a directory with a version-hint.text file or directly to a vX.metadata.json file.
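
As a sketch of the second case, you can point pgaa.path directly at a specific metadata file. The table name my_sales_pinned and the v3.metadata.json file name are placeholders for your own, and this assumes the standard layout where metadata files live under metadata/:

CREATE TABLE public.my_sales_pinned ()
USING PGAA
WITH (
    pgaa.storage_location = 'my_s3_iceberg_data',
    pgaa.path = 'sales_records/iceberg_table_root/metadata/v3.metadata.json',
    pgaa.format = 'iceberg'
);

Reading from a fixed metadata file effectively queries the table as of that metadata version, since later writers publish new vX.metadata.json files.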


3. Query the Iceberg table

Run queries using standard SQL.

Example:

SELECT *
FROM public.my_sales_iceberg_data
WHERE sales_date >= DATE '2024-01-01'
LIMIT 100;
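
Aggregations and other standard SQL work the same way. For example, a monthly rollup over the columns defined in step 2:

SELECT date_trunc('month', sales_date) AS sales_month,
       sum(sales_amount) AS total_sales
FROM public.my_sales_iceberg_data
GROUP BY 1
ORDER BY 1;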

Notes

  • Lakehouse clusters use a vectorized query engine for fast Iceberg queries
  • You do not need a catalog connection to query file-based Iceberg tables
  • If your Iceberg tables are catalog-managed, see Configure an Iceberg REST catalog connection for an alternative flow

Next steps

Now that you have created an Iceberg external table and run queries, see Analytics in Hybrid Manager for an architecture overview.

