Delta Lake

Delta Lake is an open-source table format and transaction layer that enhances modern data lakes with ACID guarantees, schema enforcement, time travel, and scalable performance.

EDB provides first-class support for querying Delta Lake tables as part of the Analytics Accelerator, allowing Postgres SQL to operate on large-scale lakehouse data.

For details on how Delta Lake is used and managed within Hybrid Manager (HM), see Working with Delta Lake in Hybrid Manager.

What is Delta Lake

Delta Lake adds database-like reliability and consistency to object storage systems such as S3, GCS, and Azure Data Lake Storage.

Key characteristics:

  • ACID transactions with strong data consistency
  • Schema enforcement and schema evolution support
  • Time travel for querying historical versions of data
  • Batch and streaming support
  • Open format built on Apache Parquet with a transaction log (_delta_log)

Related concept: Open table formats

Why Delta Lake matters for EDB analytics

Delta Lake support enables the Analytics Accelerator to:

  • Query existing Delta Lake tables in place using Postgres SQL
  • Support interoperable data architectures across Postgres, Spark, Trino, Presto, and more
  • Eliminate unnecessary data duplication or movement between systems
  • Provide governed, versioned data for analytical queries within Lakehouse clusters

Related concept: EDB Postgres Lakehouse

How EDB leverages Delta Lake

EDB Postgres Lakehouse clusters provide access to Delta Lake data through the PGAA extension:

  • Define Delta Lake external tables using CREATE TABLE ... USING PGAA WITH (pgaa.format = 'delta', ...)
  • Efficiently query Parquet-backed Delta Lake tables via vectorized execution with Apache DataFusion
  • Benefit from Delta Lake features including schema evolution and time travel
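
As a minimal sketch of the first point above, the following statements define an external table over an existing Delta Lake directory and then query it. Only USING PGAA and pgaa.format = 'delta' come from this page. The table name, storage location name, bucket URL, and path are hypothetical, and the pgfs.create_storage_location call and the pgaa.storage_location and pgaa.path options are assumptions about how the elided options are filled in, so verify them against the PGAA reference for your release.

```sql
-- Assumption: register an object-storage location under a hypothetical name.
-- The function name and argument order are assumed, not taken from this page.
SELECT pgfs.create_storage_location(
    'my_delta_lake',                 -- hypothetical location name
    's3://example-bucket/lakehouse'  -- hypothetical bucket URL
);

-- Define an external table over an existing Delta Lake table.
-- The empty column list assumes the schema is read from the Delta metadata.
CREATE TABLE sales_delta ()
USING PGAA
WITH (
    pgaa.storage_location = 'my_delta_lake',  -- assumed option name
    pgaa.path = 'sales',                      -- assumed option name
    pgaa.format = 'delta'                     -- from the text above
);

-- Query the external table with ordinary Postgres SQL.
SELECT count(*) FROM sales_delta;
```

A private bucket would likely also need credentials supplied when the storage location is registered. That step is omitted here because its exact form is not covered on this page.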

Delta Lake support currently focuses on querying existing Delta Lake tables. PGD offload currently targets Iceberg, while Delta read support lets you integrate with existing Delta-based data lakes.

Common use cases

| Use case | Delta Lake + Analytics Accelerator |
|---|---|
| Business intelligence reporting | Query Delta Lake tables using Postgres SQL and BI tools |
| Data science and machine learning | Access Delta tables for model training and feature engineering |
| Data lake governance | Use Delta’s ACID guarantees with Lakehouse SQL access |
| Cross-platform interoperability | Query Delta Lake data alongside Spark, Trino, and Postgres Lakehouse |

Next steps

Explore more in the Analytics Accelerator learning guide.

