Analytics Terminology
This page defines key terms used in the Analytics Accelerator and Hybrid Manager (HM) analytics features.
Use it as a quick reference when exploring concepts, how-tos, and feature guides.
Apache Iceberg
An open table format for large analytic datasets stored in object storage. Provides schema evolution, time travel, and interoperability with many analytics engines (Spark, Trino, Flink, and Postgres).
Delta Lake
An open table format and storage layer that adds ACID transactions and reliability to data lakes. Built on Parquet files with a _delta_log transaction log.
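To make the role of the transaction log concrete, the following is a minimal Python sketch of log replay: each commit records files added and removed, and the table's contents at a given version are whatever survives replaying the commits up to that version. The commit structure and file names here are simplified, hypothetical stand-ins for the actions stored in _delta_log, not the Delta Lake library API.

```python
from typing import Dict, List, Optional, Set

# Simplified, hypothetical commits: each one adds and/or removes Parquet files.
# Real Delta Lake commits carry richer actions; this is only a sketch.
Commit = Dict[str, List[str]]


def live_files(log: List[Commit], version: Optional[int] = None) -> Set[str]:
    """Replay commits 0..version (inclusive) and return the files in the table."""
    last = len(log) - 1 if version is None else version
    files: Set[str] = set()
    for commit in log[: last + 1]:
        files |= set(commit.get("add", []))
        files -= set(commit.get("remove", []))
    return files


log: List[Commit] = [
    {"add": ["part-0.parquet"]},
    {"add": ["part-1.parquet"]},
    {"add": ["part-2.parquet"], "remove": ["part-0.parquet"]},  # e.g. a rewrite
]

print(sorted(live_files(log)))             # files at the latest version
print(sorted(live_files(log, version=1)))  # files at an earlier version
```

Because old data files are only marked as removed in the log, replaying to an earlier version recovers the table as it was at that point.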
Data Lakehouse
A modern architecture that combines elements of data lakes and data warehouses:
- Stores data in object storage
- Supports open table formats (Iceberg, Delta Lake)
- Provides fast SQL analytics using vectorized query engines
- Decouples storage from compute
Related concept: Data Lakehouse
EDB Postgres Lakehouse Cluster
A managed analytical compute cluster provisioned by Hybrid Manager to run fast SQL queries on data stored in object storage.
- Supports Iceberg and Delta Lake formats
- Uses a vectorized query engine (Apache DataFusion)
- Separates storage and compute for scalability and cost efficiency
EDB Postgres Distributed (PGD)
An advanced, distributed version of Postgres:
- Provides high availability and multi-master replication
- Enables advanced data tiering patterns (Tiered Tables)
- Works seamlessly with Lakehouse clusters for scalable analytics
Tiered Tables
A feature that uses PGD AutoPartition to manage large time-based datasets:
- Hot data stays in the PGD transactional cluster
- Cold data is offloaded to object storage as Apache Iceberg tables
- Queries can transparently access both hot and cold data
PGAA (Postgres Generic Analytics Adapter)
An extension that enables Postgres to:
- Query open table formats (Iceberg, Delta Lake)
- Define external tables that map to object storage
- Power Lakehouse cluster queries
- Enable PGD Tiered Table offloading and querying
PGFS (Postgres File System)
An extension that defines storage locations Postgres can use to access object storage.
- Used by PGAA and PGD for Iceberg and Delta Lake access
- Supports AWS S3, Google Cloud Storage, and S3-compatible services
- Must be configured on each Lakehouse or PGD cluster that needs to query object storage
AutoPartition
A capability of PGD BDR (Bi-Directional Replication):
- Automatically creates time-based partitions for a PGD table
- Automatically offloads older partitions to object storage when configured with an analytics_offload_period
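The offload decision can be pictured with a small Python sketch. It assumes, for illustration only, that the offload period acts as an age threshold: a partition becomes eligible once its whole time range is older than the period. The Partition type, the partitions_to_offload function, and the partition names are hypothetical and are not part of PGD.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass(frozen=True)
class Partition:
    name: str
    upper_bound: date  # exclusive end of the time range the partition covers


def partitions_to_offload(
    parts: List[Partition], today: date, offload_period: timedelta
) -> List[str]:
    """Return names of partitions whose data is entirely older than the period."""
    cutoff = today - offload_period
    return [p.name for p in parts if p.upper_bound <= cutoff]


parts = [
    Partition("events_2024_01", date(2024, 2, 1)),
    Partition("events_2024_02", date(2024, 3, 1)),
    Partition("events_2024_03", date(2024, 4, 1)),
]

# Assumed 30-day period: partitions ending on or before 2024-03-16 are cold.
cold = partitions_to_offload(
    parts, today=date(2024, 4, 15), offload_period=timedelta(days=30)
)
print(cold)
```

In this sketch the two oldest partitions qualify for offload, while the most recent one stays in the transactional cluster.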
BDR Analytics Table
An internal PGD concept:
- The analytics_table view tracks which PGD tables are marked for Tiered Table offload
- Tracks the state of each table and offload progress
Useful for monitoring and validating Tiered Table behavior.
Iceberg Catalog
A metadata service that tracks Iceberg table schemas, locations, and versions:
- Required for full interoperability across multiple engines (Spark, Trino, Postgres)
- Can be HM-managed (Lakekeeper) or external (AWS Glue, Nessie, Polaris, etc.)
- PGAA and PGD both support connecting to an Iceberg REST catalog
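As a rough illustration of why a shared catalog matters when several engines write to the same table, the sketch below models a catalog as a map from table name to the location of its current metadata file, with a commit that succeeds only if the writer started from the latest version. ToyCatalog, the table name, and the storage paths are hypothetical; this is not the Iceberg REST Catalog API and it is not thread-safe.

```python
from typing import Dict, Optional


class ToyCatalog:
    """In-memory stand-in for a catalog: table name -> current metadata location."""

    def __init__(self) -> None:
        self._current: Dict[str, str] = {}

    def load(self, table: str) -> Optional[str]:
        """Return the current metadata location, or None if the table is unknown."""
        return self._current.get(table)

    def commit(self, table: str, expected: Optional[str], new: str) -> bool:
        """Swap the metadata pointer only if nobody else committed first."""
        if self._current.get(table) != expected:
            return False  # stale writer: reload the table and retry
        self._current[table] = new
        return True


catalog = ToyCatalog()
created = catalog.commit("sales.orders", None, "s3://bucket/orders/metadata/v1.json")
base = catalog.load("sales.orders")

# Two engines start from the same base version; only the first commit wins.
first = catalog.commit("sales.orders", base, "s3://bucket/orders/metadata/v2.json")
second = catalog.commit("sales.orders", base, "s3://bucket/orders/metadata/v3.json")
print(created, first, second)
```

The losing writer sees its commit rejected and has to reload the table before retrying, which is what keeps every engine's view of the table consistent.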
Lakekeeper
An HM-managed Iceberg catalog service:
- Provides a central catalog for Iceberg tables
- Supports PGD Tiered Table offload with catalog integration
- Supports Lakehouse cluster queries through catalog-based table discovery
- Uses Iceberg REST Catalog API
Open Table Formats
Standardized file layouts and metadata formats used for analytics:
- Apache Iceberg
- Delta Lake
Enable multi-engine access to the same data (Spark, Trino, Postgres Lakehouse, etc.).
Vectorized Query Engine
An analytics engine that processes data in columnar batches (instead of row-by-row):
- Accelerates analytics queries
- Makes use of modern CPU features (SIMD)
- Powers Lakehouse cluster query performance
EDB Lakehouse clusters embed Apache DataFusion for this.
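The difference between row-by-row and columnar batch processing can be sketched in plain Python. This example only illustrates the data layout and batching idea; pure Python gains no SIMD speedup, and none of the names below come from DataFusion.

```python
from array import array
from typing import Dict, Iterator, List

# Row-oriented layout: one record per row.
rows: List[Dict[str, float]] = [{"price": float(i), "qty": 2.0} for i in range(10)]


def total_row_at_a_time(rows: List[Dict[str, float]]) -> float:
    """Visit each row in turn and pick out the fields it needs."""
    total = 0.0
    for row in rows:
        total += row["price"] * row["qty"]
    return total


# Column-oriented layout: one contiguous typed array per column.
price = array("d", (r["price"] for r in rows))
qty = array("d", (r["qty"] for r in rows))


def batches(n: int, size: int) -> Iterator[slice]:
    """Yield slices that cover range(n) in fixed-size batches."""
    for start in range(0, n, size):
        yield slice(start, min(start + size, n))


def total_columnar(price: array, qty: array, batch_size: int = 4) -> float:
    """Process whole batches of each column instead of individual rows."""
    total = 0.0
    for s in batches(len(price), batch_size):
        total += sum(p * q for p, q in zip(price[s], qty[s]))
    return total


print(total_row_at_a_time(rows), total_columnar(price, qty))
```

Both functions return the same total. In a real vectorized engine, the per-batch work runs as tight loops over contiguous column data, which is where the CPU-level gains come from.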
Data Tiering
The practice of storing hot, warm, and cold data on different storage tiers:
- Hot → fast transactional PGD nodes
- Cold → object storage (Iceberg tables)
- Tiered Tables automate this pattern in Postgres + HM
Related concept: Data Tiering
Separation of Storage and Compute
A pattern used by Lakehouse architectures:
- Store data in object storage (independently scaled)
- Scale analytical compute (Lakehouse clusters) independently of storage
Enables cost-efficient, elastic analytics architectures.
Related concept: Separation of Storage and Compute
Next steps
Now that you understand key terminology: