Configure PGFS for Delta Lake

This guide explains how to configure a PGFS (EDB Postgres File System) storage location in a Hybrid Manager (HM) Lakehouse cluster, to enable querying Delta Lake tables stored in object storage.

You must configure a PGFS location before creating PGAA external tables that point to Delta Lake tables.

For background on Delta Lake integration in HM, see: Delta Lake in Hybrid Manager

Prerequisites

Before configuring PGFS for Delta Lake:

  • You have provisioned a Lakehouse cluster in Hybrid Manager. If not yet done, see: Create a Lakehouse Cluster

  • You have existing Delta Lake tables stored in an S3-compatible object storage bucket. The bucket must contain both:

  • Parquet data files

  • _delta_log directory with Delta Lake transaction logs

  • You have access credentials for the bucket (if private).

  • You have a SQL client connected to your Lakehouse cluster:

  • psql

  • DBeaver

  • pgAdmin

  • Any Postgres-compatible client

Steps

Step 1: Create PGFS storage location

Run pgfs.create_storage_location() to register the object storage location.

You can create as many PGFS locations as needed, one per bucket or per logical group of Delta tables.

Example: Public bucket

SELECT pgfs.create_storage_location( name => 'my_public_delta_lake_store', url => 's3://my-public-delta-data/', options => '{"aws_skip_signature": "true"}', credentials => '{}' );

Example: Private bucket (AWS S3)

SELECT pgfs.create_storage_location( name => 'my_private_delta_lake_assets', url => 's3://my-private-delta-data/', options => '{}', credentials => '{"access_key_id": "YOUR_AWS_ACCESS_KEY_ID", "secret_access_key": "YOUR_AWS_SECRET_ACCESS_KEY"}' );

Step 2: Validate PGFS location

Run the following to list all PGFS storage locations:

SELECT * FROM pgfs.list_storage_locations();

Verify that your new location (such as my_private_delta_lake_assets) appears in the list.

Notes

  • PGFS storage locations can be used by any PGAA external table definition.

  • You can define multiple PGFS locations — useful if Delta Lake tables are distributed across buckets.

  • Storage locations can point to any S3-compatible object store:

  • AWS S3

  • Google Cloud Storage

  • Azure Storage

  • MinIO

  • Other compatible systems

  • If you are configuring PGFS for Tiered Tables, see: Configure PGFS for Tiered Tables

Next steps

Now that you have configured a PGFS storage location:


Could this page be better? Report a problem or suggest an addition!