Configure PGFS for Delta Lake
This guide explains how to configure a PGFS (EDB Postgres File System) storage location in a Hybrid Manager (HM) Lakehouse cluster, to enable querying Delta Lake tables stored in object storage.
You must configure a PGFS location before creating PGAA external tables that point to Delta Lake tables.
For background on Delta Lake integration in HM, see: Delta Lake in Hybrid Manager
Prerequisites
Before configuring PGFS for Delta Lake:
You have provisioned a Lakehouse cluster in Hybrid Manager. If not yet done, see: Create a Lakehouse Cluster
You have existing Delta Lake tables stored in an S3-compatible object storage bucket. The bucket must contain both:
Parquet data files
_delta_log
directory with Delta Lake transaction logsYou have access credentials for the bucket (if private).
You have a SQL client connected to your Lakehouse cluster:
psql
DBeaver
pgAdmin
Any Postgres-compatible client
Steps
Step 1: Create PGFS storage location
Run pgfs.create_storage_location()
to register the object storage location.
You can create as many PGFS locations as needed, one per bucket or per logical group of Delta tables.
Example: Public bucket
SELECT pgfs.create_storage_location( name => 'my_public_delta_lake_store', url => 's3://my-public-delta-data/', options => '{"aws_skip_signature": "true"}', credentials => '{}' );
Example: Private bucket (AWS S3)
SELECT pgfs.create_storage_location( name => 'my_private_delta_lake_assets', url => 's3://my-private-delta-data/', options => '{}', credentials => '{"access_key_id": "YOUR_AWS_ACCESS_KEY_ID", "secret_access_key": "YOUR_AWS_SECRET_ACCESS_KEY"}' );
Step 2: Validate PGFS location
Run the following to list all PGFS storage locations:
SELECT * FROM pgfs.list_storage_locations();
Verify that your new location (such as my_private_delta_lake_assets
) appears in the list.
Notes
PGFS storage locations can be used by any PGAA external table definition.
You can define multiple PGFS locations — useful if Delta Lake tables are distributed across buckets.
Storage locations can point to any S3-compatible object store:
AWS S3
Google Cloud Storage
Azure Storage
MinIO
Other compatible systems
If you are configuring PGFS for Tiered Tables, see: Configure PGFS for Tiered Tables
Next steps
Now that you have configured a PGFS storage location:
You can query existing Delta Lake tables from your Lakehouse cluster using PGAA reader tables. Query Delta Lake Tables
You can tune query performance on large Delta Lake tables to maximize vectorized query efficiency. Performance Considerations for Delta Lake Queries
You can review Delta Lake concepts and architecture in Hybrid Manager to understand how this fits in your analytics stack. Delta Lake in Hybrid Manager
- On this page
- Prerequisites
- Steps
- Notes
- Next steps
← Prev
How-To Guides for Analytics in Hybrid Manager
↑ Up
How-To Guides for Analytics in Hybrid Manager
Next →
Configure an Iceberg REST catalog connection
Could this page be better? Report a problem or suggest an addition!