Analytics Terminology

This page defines key terms used in the Analytics Accelerator and Hybrid Manager (HM) analytics features.

Use it as a quick reference when exploring concepts, how-tos, and feature guides.


Apache Iceberg

An open table format for large analytic datasets stored in object storage. Provides schema evolution, time travel, and interoperability with many analytics engines (Spark, Trino, Flink, and Postgres).

Learn more


Delta Lake

An open table format and storage layer that adds ACID transactions and reliability to data lakes. Built on Parquet files with a _delta_log transaction log.
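To make the role of the transaction log concrete, here is a minimal sketch of how a reader could replay a log to find the data files that currently make up a table. It assumes a heavily simplified log that contains only `add` and `remove` actions; the file names and the `live_files` helper are hypothetical, and real Delta logs carry more action types and fields.

```python
import json

# Hypothetical, simplified commits keyed by zero-padded version number.
# Each commit is a list of JSON lines, one action per line.
commits = {
    f"{0:020d}.json": [
        '{"add": {"path": "part-0000.parquet"}}',
        '{"add": {"path": "part-0001.parquet"}}',
    ],
    f"{1:020d}.json": [
        '{"remove": {"path": "part-0000.parquet"}}',
        '{"add": {"path": "part-0002.parquet"}}',
    ],
}


def live_files(commits, up_to_version=None):
    """Replay commits in version order and return the set of active data files."""
    active = set()
    for name in sorted(commits):
        version = int(name.split(".")[0])
        if up_to_version is not None and version > up_to_version:
            break
        for line in commits[name]:
            action = json.loads(line)
            if "add" in action:
                active.add(action["add"]["path"])
            elif "remove" in action:
                active.discard(action["remove"]["path"])
    return active


print(sorted(live_files(commits)))                   # latest table state
print(sorted(live_files(commits, up_to_version=0)))  # state as of version 0
```

Replaying only up to an earlier version is what makes reading an older state of the table possible in this toy model.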

Learn more


Data Lakehouse

A modern architecture that combines elements of data lakes and data warehouses:

  • Stores data in object storage
  • Supports open table formats (Iceberg, Delta Lake)
  • Provides fast SQL analytics using vectorized query engines
  • Decouples storage from compute

Related concept: Data Lakehouse


EDB Postgres Lakehouse Cluster

A managed analytical compute cluster provisioned by Hybrid Manager to run fast SQL queries on data stored in object storage.

  • Supports Iceberg and Delta Lake formats
  • Uses a vectorized query engine (Apache DataFusion)
  • Separates storage and compute for scalability and cost efficiency

Learn more


EDB Postgres Distributed (PGD)

An advanced, distributed version of Postgres:

  • Provides high availability and multi-master replication
  • Enables advanced data tiering patterns (Tiered Tables)
  • Works seamlessly with Lakehouse clusters for scalable analytics

Learn more


Tiered Tables

A feature that uses PGD AutoPartition to manage large time-based datasets:

  • Hot data stays in the PGD transactional cluster
  • Cold data is offloaded to object storage as Apache Iceberg tables
  • Queries can transparently access both hot and cold data
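
The idea of transparent access can be sketched as a single scan that reads from both tiers. This is a toy in-memory model, not how PGD or PGAA implement it; `hot_rows`, `cold_rows`, and `scan` are hypothetical names chosen for illustration.

```python
from datetime import date

# Recent rows, standing in for partitions kept in the PGD cluster.
hot_rows = [
    {"day": date(2024, 3, 1), "amount": 40},
    {"day": date(2024, 3, 2), "amount": 60},
]
# Older rows, standing in for partitions offloaded to object storage.
cold_rows = [
    {"day": date(2024, 1, 10), "amount": 25},
    {"day": date(2024, 1, 11), "amount": 75},
]


def scan(since=None):
    """Yield matching rows from both tiers so callers never pick a tier."""
    for tier in (cold_rows, hot_rows):
        for row in tier:
            if since is None or row["day"] >= since:
                yield row


total_all = sum(r["amount"] for r in scan())
total_recent = sum(r["amount"] for r in scan(since=date(2024, 3, 1)))
print(total_all, total_recent)
```

The caller issues one query and gets one result; where each row physically lives is hidden behind the scan.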

Learn more


PGAA (Postgres Generic Analytics Adapter)

An extension that enables Postgres to:

  • Query open table formats (Iceberg, Delta Lake)
  • Define external tables that map to object storage
  • Power Lakehouse cluster queries
  • Enable PGD Tiered Table offloading and querying

Example in use


PGFS (Postgres File System)

An extension that defines storage locations Postgres can use to access object storage.

  • Used by PGAA and PGD for Iceberg and Delta Lake access
  • Supports AWS S3, Google Cloud Storage, and S3-compatible services
  • Must be configured on each Lakehouse or PGD cluster that needs to query object storage

Example in use


AutoPartition

A capability of PGD BDR (Bi-Directional Replication):

  • Automatically creates time-based partitions for a PGD table
  • Automatically offloads older partitions to object storage when configured with an analytics_offload_period
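
The two behaviors above can be sketched with plain date arithmetic. This is an illustration of the concept only, not the PGD API: `make_partitions` and `partitions_to_offload` are hypothetical helpers, and the 15-day partition size and 30-day offload period are assumed values.

```python
from datetime import date, timedelta


def make_partitions(start, end, step):
    """Return consecutive [lower, upper) date ranges covering start..end."""
    parts = []
    lower = start
    while lower < end:
        upper = lower + step
        parts.append((lower, upper))
        lower = upper
    return parts


def partitions_to_offload(partitions, today, offload_period):
    """Select partitions whose upper bound is at or before the offload cutoff."""
    cutoff = today - offload_period
    return [p for p in partitions if p[1] <= cutoff]


parts = make_partitions(date(2024, 1, 1), date(2024, 3, 1), timedelta(days=15))
cold = partitions_to_offload(parts, date(2024, 3, 1), timedelta(days=30))

for lower, upper in parts:
    tier = "offload" if (lower, upper) in cold else "keep"
    print(lower, upper, tier)
```

A partition becomes eligible only once its entire date range is older than the offload period, so no partition is split between tiers.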

Example in use


BDR Analytics Table

An internal PGD concept:

  • The analytics_table view tracks which PGD tables are marked for Tiered Table offload
  • Tracks the state of each table and offload progress

Useful for monitoring and validating Tiered Table behavior.

Example in use


Iceberg Catalog

A metadata service that tracks Iceberg table schemas, locations, and versions:

  • Required for full interoperability across multiple engines (Spark, Trino, Postgres)
  • Can be HM-managed (Lakekeeper) or external (AWS Glue, Nessie, Polaris, etc.)
  • PGAA and PGD both support connecting to an Iceberg REST catalog
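
A catalog's core job can be sketched as a map from table name to the location of that table's current metadata, updated with a compare-and-swap so that concurrent writers from different engines cannot overwrite each other. `ToyCatalog` and the storage paths below are hypothetical; this in-memory model is not the Iceberg REST Catalog API.

```python
class ToyCatalog:
    """In-memory stand-in for a catalog: table name -> current metadata location."""

    def __init__(self):
        self._pointers = {}

    def create_table(self, name, metadata_location):
        if name in self._pointers:
            raise ValueError(f"table {name!r} already exists")
        self._pointers[name] = metadata_location

    def load_table(self, name):
        return self._pointers[name]

    def commit(self, name, expected, new):
        """Swap the pointer only if no other writer committed first."""
        if self._pointers[name] != expected:
            return False
        self._pointers[name] = new
        return True


catalog = ToyCatalog()
catalog.create_table("sales.orders", "s3://bucket/orders/metadata/v1.json")

# Two engines read the same starting point, then both try to commit.
seen = catalog.load_table("sales.orders")
ok_a = catalog.commit("sales.orders", seen, "s3://bucket/orders/metadata/v2-a.json")
ok_b = catalog.commit("sales.orders", seen, "s3://bucket/orders/metadata/v2-b.json")
print(ok_a, ok_b, catalog.load_table("sales.orders"))
```

The second writer's commit is rejected because the pointer moved after it read the table, so it has to reload and retry. Every engine that asks the catalog sees the same current version.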

Example in use


Lakekeeper

An HM-managed Iceberg catalog service:

  • Provides a central catalog for Iceberg tables
  • Supports PGD Tiered Table offload with catalog integration
  • Supports Lakehouse cluster queries through catalog-based table discovery
  • Uses the Iceberg REST Catalog API

Example in use


Open Table Formats

Standardized file layouts and metadata formats used for analytics:

  • Apache Iceberg
  • Delta Lake

These formats enable multi-engine access to the same data (Spark, Trino, Postgres Lakehouse, etc.).

Learn more


Vectorized Query Engine

An analytics engine that processes data in columnar batches (instead of row-by-row):

  • Accelerates analytics queries
  • Makes use of modern CPU features (SIMD)
  • Powers Lakehouse cluster query performance

EDB Lakehouse clusters embed Apache DataFusion for this.
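
The difference between row-by-row and batch-at-a-time processing can be sketched in plain Python. This shows only the data layout and control flow: pure Python does not use SIMD, and none of this reflects Apache DataFusion's actual implementation. The table contents, batch size, and function names are assumptions for illustration.

```python
# A small table in row form: one dict per row.
rows = [{"price": float(i), "qty": i % 5} for i in range(10)]


def revenue_row_at_a_time(rows):
    """Visit one row at a time, touching every column of each row."""
    total = 0.0
    for row in rows:
        total += row["price"] * row["qty"]
    return total


# The same table in columnar form: one list per column.
columns = {
    "price": [r["price"] for r in rows],
    "qty": [r["qty"] for r in rows],
}


def batches(columns, size):
    """Split each column into aligned slices of at most `size` values."""
    n = len(next(iter(columns.values())))
    for start in range(0, n, size):
        yield {name: values[start:start + size] for name, values in columns.items()}


def revenue_batched(columns, batch_size=4):
    """Apply the same operation to a whole batch of column values at once."""
    total = 0.0
    for batch in batches(columns, batch_size):
        total += sum(p * q for p, q in zip(batch["price"], batch["qty"]))
    return total


print(revenue_row_at_a_time(rows), revenue_batched(columns))
```

Both functions return the same total. In a real engine the per-batch loop runs over contiguous column buffers, which is what lets the CPU apply one instruction to many values.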


Data Tiering

The practice of storing hot, warm, and cold data on different storage tiers:

  • Hot → fast transactional PGD nodes
  • Cold → object storage (Iceberg tables)
  • Tiered Tables automate this pattern in Postgres and HM

Related concept: Data Tiering


Separation of Storage and Compute

A pattern used by Lakehouse architectures:

  • Store data in object storage (independently scaled)
  • Scale analytical compute (Lakehouse clusters) independently of storage

Enables cost-efficient, elastic analytics architectures.

Related concept: Separation of Storage and Compute



