Delta Lake

Delta Lake is an open-source table format and transaction layer that enhances modern data lakes with ACID guarantees, schema enforcement, time travel, and scalable performance.

EDB provides first-class support for querying Delta Lake tables as part of the Analytics Accelerator, so that Postgres SQL can operate on large-scale lakehouse data.

For details on how Delta Lake is used and managed within Hybrid Manager (HM), see Working with Delta Lake in Hybrid Manager.

What is Delta Lake

Delta Lake adds database-like reliability and consistency to object storage systems such as S3, GCS, and Azure Data Lake Storage.

Key characteristics:

ACID transactions with strong data consistency
Schema enforcement and schema evolution support
Time travel for querying historical versions of data
Batch and streaming support
Open format built on Apache Parquet with a transaction log (_delta_log)
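
To make the transaction log and time travel characteristics more concrete, the following Python sketch replays a toy _delta_log directory. It is a simplified illustration under stated assumptions, not how the Analytics Accelerator or any Delta Lake engine is implemented: it reads only the JSON commit files, handles only the add and remove actions, and ignores checkpoints and all other action types. The function names active_files and write_commit, and the Parquet file names, are hypothetical.

```python
import json
import re
import tempfile
from pathlib import Path


def write_commit(table_path, version, actions):
    """Write one toy commit file (hypothetical helper, used only by this demo)."""
    log_dir = Path(table_path) / "_delta_log"
    log_dir.mkdir(parents=True, exist_ok=True)
    # Commit files are named with a 20-digit, zero-padded version number
    # and hold one JSON action per line.
    body = "\n".join(json.dumps(action) for action in actions)
    (log_dir / f"{version:020d}.json").write_text(body)


def active_files(table_path, version=None):
    """Replay commits up to `version` (inclusive) and return the live data files."""
    log_dir = Path(table_path) / "_delta_log"
    live = set()
    # Zero-padded names mean that sorting by name is sorting by version.
    for commit in sorted(log_dir.glob("*.json")):
        if not re.fullmatch(r"\d{20}\.json", commit.name):
            continue  # skip anything that is not a plain commit file
        if version is not None and int(commit.stem) > version:
            break  # time travel: stop at the requested version
        for line in commit.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
    return live


# Build a three-commit toy table. Real add/remove actions carry more fields
# (size, partition values, and so on) that are omitted here.
demo_dir = tempfile.mkdtemp()
write_commit(demo_dir, 0, [{"add": {"path": "part-0000.parquet"}}])
write_commit(demo_dir, 1, [{"add": {"path": "part-0001.parquet"}}])
write_commit(demo_dir, 2, [
    {"remove": {"path": "part-0000.parquet"}},
    {"add": {"path": "part-0002.parquet"}},
])

print(sorted(active_files(demo_dir, version=1)))
# ['part-0000.parquet', 'part-0001.parquet']
print(sorted(active_files(demo_dir)))
# ['part-0001.parquet', 'part-0002.parquet']
```

Because every commit only appends a new log file, reading the table as of an earlier version amounts to stopping the replay early. The data file removed in version 2 is still part of the table when it is read as of version 1.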

Related concept: Open table formats

Why Delta Lake matters for EDB analytics

Delta Lake support enables the Analytics Accelerator to:

Query existing Delta Lake tables in place using Postgres SQL
Support interoperable data architectures across Postgres, Spark, Trino, Presto, and other engines that read Delta Lake
Eliminate unnecessary data duplication or movement between systems
Provide governed, versioned data for analytical queries within Lakehouse clusters

Related concept: EDB Postgres Lakehouse

How EDB leverages Delta Lake

EDB Postgres Lakehouse clusters provide access to Delta Lake data through the PGAA extension:

Define Delta Lake external tables using CREATE TABLE ... USING PGAA WITH (pgaa.format = 'delta', ...)
Efficiently query Parquet-backed Delta Lake tables via vectorized execution with Apache DataFusion
Benefit from Delta Lake features including schema evolution and time travel

Delta Lake support currently focuses on querying existing Delta Lake tables. PGD offload currently targets Iceberg, while Delta read support enables integration with existing Delta-based data lakes.

Common use cases

Use case	Delta Lake + Analytics Accelerator
Business intelligence reporting	Query Delta Lake tables using Postgres SQL and BI tools
Data science and machine learning	Access Delta tables for model training and feature engineering
Data lake governance	Use Delta Lake’s ACID guarantees with Lakehouse SQL access
Cross-platform interoperability	Query the same Delta Lake data from Spark, Trino, and Postgres Lakehouse