Delta Lake in Hybrid Manager

Delta Lake is an open-source table format that brings ACID transactions and reliability to data lakes.

Hybrid Manager (HM) integrates Delta Lake capabilities into EDB Postgres deployments, enabling Lakehouse clusters to query Delta Lake tables stored in object storage.

For a general overview of Delta Lake, see Understanding Delta Lake with EDB Solutions.

Why use Delta Lake with Hybrid Manager

Query existing Delta Lake tables: Run fast Postgres SQL against data lakes already built on the Delta Lake format.
Simplify analytics pipelines: Avoid unnecessary ETL by querying Delta Lake tables in place from Postgres.
Integrate with a broader ecosystem: Connect HM-managed Postgres to Delta Lake data produced by Spark, Trino, Flink, and other tools.
Build a cost-effective lakehouse architecture: Store large datasets in object storage and query them through Lakehouse clusters.

Key terms and architecture overview

For definitions of core analytics terms used in Hybrid Manager, such as PGFS, PGAA, Lakehouse Cluster, and Analytics Offload, see Analytics Concepts in Hybrid Manager.

When should I use Delta Lake in Hybrid Manager?

Use Delta Lake with Hybrid Manager when you want to:

Query existing data lakes built on the Delta Lake format, without ETL or data duplication.
Integrate the Postgres and data lake ecosystems by querying Delta Lake tables from Postgres SQL clients.
Enable unified analytics across operational Postgres data and data lake data.
Support BI tools and ad hoc queries on Delta Lake content using familiar Postgres tools.
Use Lakehouse clusters for scalable, fast SQL on large Delta Lake datasets.

Important: PGAA currently supports read-only queries on Delta Lake tables. Writing or updating Delta tables via PGAA is not supported.

Key capabilities of Delta Lake in Hybrid Manager

Querying existing Delta Lake tables

What: Run SQL queries on Delta Lake tables stored in object storage.

Why: Enable BI tools and Postgres users to query existing Delta Lake data without duplicating or moving it.

How: Define PGFS storage locations and PGAA external tables in Lakehouse clusters.

Where: S3-compatible object storage holding tables in Delta Lake format (a _delta_log directory plus Parquet data files).
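
The layout named above can be recognized from an object listing alone. The following is a minimal sketch, not part of Hybrid Manager or PGAA: the function describe_delta_layout and the sample keys are hypothetical. It relies only on the Delta Lake convention that commit files sit directly under _delta_log/ and are named with a 20-digit, zero-padded version number followed by .json.

```python
import re

# Delta Lake commit files are named <20-digit zero-padded version>.json
# and live directly under the table's _delta_log/ directory.
COMMIT_FILE = re.compile(r"^_delta_log/(\d{20})\.json$")


def describe_delta_layout(table_prefix, object_keys):
    """Summarize whether the keys under table_prefix look like a Delta Lake table.

    table_prefix and object_keys are hypothetical inputs, for example the
    result of listing a bucket with any S3-compatible client.
    """
    prefix = table_prefix.rstrip("/") + "/"
    versions = []
    parquet_files = []
    for key in object_keys:
        if not key.startswith(prefix):
            continue  # object belongs to some other path
        relative = key[len(prefix):]
        match = COMMIT_FILE.match(relative)
        if match:
            versions.append(int(match.group(1)))
        elif relative.endswith(".parquet") and not relative.startswith("_delta_log/"):
            # Data file; checkpoint Parquet files inside _delta_log/ are skipped.
            parquet_files.append(relative)
    return {
        "is_delta_table": bool(versions),
        "latest_version": max(versions) if versions else None,
        "parquet_file_count": len(parquet_files),
    }


# Hypothetical listing, for illustration only.
sample_keys = [
    "sales/orders/_delta_log/00000000000000000000.json",
    "sales/orders/_delta_log/00000000000000000001.json",
    "sales/orders/part-00000-a1b2.snappy.parquet",
    "sales/orders/part-00001-c3d4.snappy.parquet",
    "sales/other/readme.txt",
]
summary = describe_delta_layout("sales/orders", sample_keys)
print(summary)
```

The Parquet count covers every data file present under the path, including files that later commits may have logically removed, so treat it as a sanity check rather than the table's current contents.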

How-To: Query Delta Lake Tables

Simplifying Postgres + Delta Lake integration

What: Connect HM Lakehouse clusters to Delta Lake tables created by Spark or other tools.

Why: Build unified reporting and analytics across your operational and data lake systems.

How: Create PGFS storage locations and PGAA reader tables pointing to Delta Lake paths.

Where: Shared object storage locations used by Delta Lake pipelines.

How-To: Configure PGFS for Delta Lake

Supporting unified SQL-based analytics

What: Enable Postgres SQL queries over both operational data and Delta Lake data.

Why: Empower application developers, data scientists, and BI users to query data lake content without complex tooling.

How: Use PGAA reader tables in Lakehouse clusters; optionally join with Postgres data.

Where: Delta Lake tables in object storage, together with Postgres tables in the Lakehouse cluster or reached through a foreign data wrapper (FDW) or dblink.

How-To: Query Delta Lake Tables

Getting started with Delta Lake in Hybrid Manager

To begin using Delta Lake with Hybrid Manager:

Provision a Lakehouse cluster.
Configure PGFS for Delta Lake, pointing it to your Delta Lake object storage.
Enable the pgaa extension on the Lakehouse cluster.
Create PGAA reader tables for your Delta Lake paths.
Query the Delta Lake tables using standard Postgres clients.
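
The SQL side of the steps above (everything after provisioning the cluster) can be sketched as an ordered list of statements. This is a hedged illustration, not a definitive procedure: the helper delta_setup_statements is hypothetical, and both the pgfs.create_storage_location call and the USING PGAA table options (pgaa.storage_location, pgaa.path, pgaa.format) are assumptions about PGFS and PGAA syntax that should be verified against the how-to pages. Bucket, location, and table names are placeholders, and storage credentials are omitted.

```python
import re

# Accepts a plain or schema-qualified identifier such as "public.orders".
IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?$")


def sql_literal(value):
    """Quote a Python string as a SQL string literal, doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def delta_setup_statements(location_name, bucket_url, table_name, table_path):
    """Return SQL for the steps after provisioning, in the order listed above.

    ASSUMPTION: the pgfs and pgaa syntax below is illustrative and unverified.
    """
    if not IDENTIFIER.match(table_name):
        raise ValueError(f"unsupported table name: {table_name!r}")
    return [
        # Configure PGFS: register the object storage location (assumed call).
        "SELECT pgfs.create_storage_location("
        f"{sql_literal(location_name)}, {sql_literal(bucket_url)});",
        # Enable the pgaa extension (standard Postgres CREATE EXTENSION).
        "CREATE EXTENSION IF NOT EXISTS pgaa;",
        # Create a PGAA reader table over the Delta Lake path (assumed options).
        f"CREATE TABLE {table_name} () USING PGAA WITH ("
        f"pgaa.storage_location = {sql_literal(location_name)}, "
        f"pgaa.path = {sql_literal(table_path)}, "
        "pgaa.format = 'delta');",
        # Query the table like any other Postgres table (read-only).
        f"SELECT count(*) FROM {table_name};",
    ]


# Placeholder values, for illustration only.
statements = delta_setup_statements(
    "delta_lake", "s3://example-bucket", "public.orders", "sales/orders"
)
for statement in statements:
    print(statement)
```

The statements can be run from psql or any other standard Postgres client connected to the Lakehouse cluster. Generating them from one function keeps the storage location name consistent between the PGFS step and the reader table definition.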
