Analytics Terminology

This page defines key terms used in the Analytics Accelerator and Hybrid Manager (HM) analytics features.

Use it as a quick reference when exploring concepts, how-tos, and feature guides.

Apache Iceberg

An open table format for large analytic datasets stored in object storage. Provides schema evolution, time travel, and interoperability with many analytics engines (Spark, Trino, Flink, and Postgres).

Learn more

Delta Lake

An open table format and storage layer that adds ACID transactions and reliability to data lakes. Built on Parquet files with a _delta_log transaction log.

Learn more

Data Lakehouse

A modern architecture that combines elements of data lakes and data warehouses:

Stores data in object storage
Supports open table formats (Iceberg, Delta Lake)
Provides fast SQL analytics using vectorized query engines
Decouples storage from compute

Related concept: Data Lakehouse

EDB Postgres Lakehouse Cluster

A managed analytical compute cluster provisioned by Hybrid Manager to run fast SQL queries on data stored in object storage.

Supports Iceberg and Delta Lake formats
Uses a vectorized query engine (Apache DataFusion)
Separates storage and compute for scalability and cost efficiency

Learn more

EDB Postgres Distributed (PGD)

An advanced, distributed version of Postgres:

Provides high availability and multi-master replication
Enables advanced data tiering patterns (Tiered Tables)
Works seamlessly with Lakehouse clusters for scalable analytics

Learn more

Tiered Tables

A feature that uses PGD AutoPartition to manage large time-based datasets:

Hot data stays in the PGD transactional cluster
Cold data is offloaded to object storage as Apache Iceberg tables
Queries can transparently access both hot and cold data

Learn more

PGAA (Postgres Generic Analytics Adapter)

An extension that enables Postgres to:

Query open table formats (Iceberg, Delta Lake)
Define external tables that map to object storage
Power Lakehouse cluster queries
Enable PGD Tiered Table offloading and querying

Example in use

PGFS (Postgres File System)

An extension that defines storage locations Postgres can use to access object storage.

Used by PGAA and PGD for Iceberg and Delta Lake access
Supports AWS S3, Google Cloud Storage, and compatible services
Must be configured on each Lakehouse or PGD cluster that needs to query object storage

Example in use

AutoPartition

A capability of PGD BDR (Bi-Directional Replication):

Automatically creates time-based partitions for a PGD table
Automatically offloads older partitions to object storage when configured with an analytics_offload_period

Example in use

BDR Analytics Table

An internal PGD concept:

The analytics_table view tracks which PGD tables are marked for Tiered Table offload
Tracks the state of each table and offload progress

Useful for monitoring and validating Tiered Table behavior.

Example in use

Iceberg Catalog

A metadata service that tracks Iceberg table schemas, locations, and versions:

Required for full interoperability across multiple engines (Spark, Trino, Postgres)
Can be HM-managed (Lakekeeper) or external (AWS Glue, Nessie, Polaris, etc.)
PGAA and PGD both support connecting to an Iceberg REST catalog

Example in use

Lakekeeper

An HM-managed Iceberg catalog service:

Provides a central catalog for Iceberg tables
Supports PGD Tiered Table offload with catalog integration
Supports Lakehouse cluster queries through catalog-based table discovery
Uses Iceberg REST Catalog API

Example in use

Open Table Formats

Standardized file layouts and metadata formats used for analytics:

Apache Iceberg
Delta Lake

Enable multi-engine access to the same data (Spark, Trino, Postgres Lakehouse, etc.).

Learn more

Vectorized Query Engine

An analytics engine that processes data in columnar batches (instead of row-by-row):

Accelerates analytics queries
Makes use of modern CPU features (SIMD)
Powers Lakehouse cluster query performance

EDB Lakehouse clusters embed Apache DataFusion for this.

Data Tiering

The practice of storing hot, warm, and cold data on different storage tiers:

Hot → fast transactional PGD nodes
Cold → object storage (Iceberg tables)
Tiered Tables automate this pattern in Postgres + HM

Related concept: Data Tiering

Separation of Storage and Compute

A pattern used by Lakehouse architectures:

Store data in object storage (independently scaled)
Scale analytical compute (Lakehouse clusters) independently of storage

Enables cost-efficient, elastic analytics architectures.

Related concept: Separation of Storage and Compute

Next steps

Now that you understand key terminology:

← Prev

Analytics Accelerator generic concepts

↑ Up

Analytics Accelerator concepts and technologies

Analytics Accelerator learning paths

Could this page be better? Report a problem or suggest an addition!

Analytics Terminology

Apache Iceberg

Delta Lake

Data Lakehouse

EDB Postgres Lakehouse Cluster

EDB Postgres Distributed (PGD)

Tiered Tables

PGAA (Postgres Generic Analytics Adapter)

PGFS (Postgres File System)

AutoPartition

BDR Analytics Table

Iceberg Catalog

Lakekeeper

Open Table Formats

Vectorized Query Engine

Data Tiering

Separation of Storage and Compute

Next steps

← Prev

↑ Up

Next →