Delta Lake
Delta Lake is an open-source table format and transaction layer that enhances modern data lakes with ACID guarantees, schema enforcement, time travel, and scalable performance.
EDB provides first-class support for querying Delta Lake tables as part of the Analytics Accelerator, allowing Postgres SQL to operate on large-scale lakehouse data.
For details on how Delta Lake is used and managed within Hybrid Manager (HM), see Working with Delta Lake in Hybrid Manager.
What is Delta Lake?
Delta Lake adds database-like reliability and consistency to object storage systems such as S3, GCS, and Azure Data Lake Storage.
Key characteristics:
- ACID transactions with strong data consistency
- Schema enforcement and schema evolution support
- Time travel for querying historical versions of data
- Batch and streaming support
- Open format built on Apache Parquet with a transaction log (`_delta_log`); a typical table layout is sketched below
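For orientation, a Delta Lake table on object storage is simply a directory of Parquet data files alongside a `_delta_log/` subdirectory of JSON commit files. The bucket, path, and file names below are illustrative, not taken from any specific deployment.

```text
s3://analytics-bucket/delta/sales/        (illustrative table root)
├── _delta_log/
│   ├── 00000000000000000000.json        (commit 0: initial schema and data files)
│   └── 00000000000000000001.json        (commit 1: appends, schema changes)
├── part-00000-<uuid>-c000.snappy.parquet (data files in Parquet format)
└── part-00001-<uuid>-c000.snappy.parquet
```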
Related concept: Open table formats
Why Delta Lake matters for EDB analytics
Delta Lake support enables the Analytics Accelerator to:
- Query existing Delta Lake tables in place using Postgres SQL
- Support interoperable data architectures across Postgres, Spark, Trino, Presto, and more
- Eliminate unnecessary data duplication or movement between systems
- Provide governed, versioned data for analytical queries within Lakehouse clusters
Related concept: EDB Postgres Lakehouse
How EDB leverages Delta Lake
EDB Postgres Lakehouse clusters provide access to Delta Lake data through the PGAA extension:
- Define Delta Lake external tables using `CREATE TABLE ... USING PGAA WITH (pgaa.format = 'delta', ...)` (see the example below)
- Efficiently query Parquet-backed Delta Lake tables via vectorized execution with Apache DataFusion
- Benefit from Delta Lake features including schema evolution and time travel
The primary capability currently supported is querying existing Delta Lake tables: PGD offload currently targets Iceberg, while Delta Lake read support enables integration with existing Delta-based data lakes.
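The following is a minimal sketch of defining and querying a Delta Lake external table. The `pgaa.format = 'delta'` option comes from the syntax shown above; the table name, storage location name, and the `pgaa.storage_location` and `pgaa.path` option names are assumptions for illustration and may differ in your environment.

```sql
-- Minimal sketch, assuming a storage location named 'sales_lake' has already been
-- registered for the bucket holding the Delta Lake table. The table name, location
-- name, and the pgaa.storage_location / pgaa.path option names are illustrative
-- assumptions; only pgaa.format = 'delta' is taken from the text above.
CREATE TABLE public.sales_delta ()
USING PGAA
WITH (
    pgaa.storage_location = 'sales_lake',   -- registered object storage location (assumed name)
    pgaa.path             = 'delta/sales',  -- path to the Delta table inside the bucket (assumed)
    pgaa.format           = 'delta'         -- read the table as Delta Lake
);

-- Query the Delta Lake table with ordinary Postgres SQL.
SELECT region, sum(amount) AS total_sales
FROM public.sales_delta
GROUP BY region
ORDER BY total_sales DESC;
```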
Common use cases
| Use case | Delta Lake + Analytics Accelerator |
|---|---|
| Business intelligence reporting | Query Delta Lake tables using Postgres SQL and BI tools |
| Data science and machine learning | Access Delta tables for model training and feature engineering |
| Data lake governance | Utilize Delta’s ACID guarantees with Lakehouse SQL access |
| Cross-platform interoperability | Query Delta Lake data alongside Spark, Trino, and Postgres Lakehouse |
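As an illustration of the BI and interoperability use cases, an external Delta table such as the `sales_delta` example above can be joined directly with regular Postgres tables. The `dim_region` dimension table and its columns are hypothetical, introduced only for this example.

```sql
-- Hypothetical BI-style query: join the Delta-backed external table defined earlier
-- with a regular Postgres dimension table. The dim_region table and the sold_at,
-- region_code, and region_name columns are illustrative assumptions.
SELECT d.region_name,
       date_trunc('month', s.sold_at) AS month,
       sum(s.amount)                  AS total_sales
FROM public.sales_delta AS s
JOIN public.dim_region  AS d ON d.region_code = s.region
GROUP BY d.region_name, month
ORDER BY month, total_sales DESC;
```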
Role-based guidance
Learning paths
- Analytics Accelerator 101: Foundational concepts
- Analytics Accelerator 201: Practical application
- Analytics Accelerator 301: Advanced techniques and optimization
Next steps
Explore more in the Analytics Accelerator learning guide.