Analytics Concepts in Hybrid Manager

Hybrid Manager enables you to build modern data lakehouse and analytics solutions for EDB Postgres.

This page introduces key concepts as they are applied in Hybrid Manager — with links to detailed explanations in the Analytics Hub for background.


Architecture overview

Hybrid Manager integrates the following layers:

  • Lakehouse Clusters for scalable analytical compute
  • Object Storage for cost-efficient analytical data storage
  • PGD Clusters for transactional workloads and automated Tiered Tables offload
  • Catalog Services (HM-managed or external) for metadata and interoperability
  • PGAA and PGFS to bridge Postgres with the data lake

This architecture enables both Postgres-first and multi-engine data lakehouse designs.

For a general introduction, see What is a Postgres Lakehouse.


Lakehouse Clusters in Hybrid Manager

Lakehouse Clusters provide scalable analytical compute in Hybrid Manager:

  • Provisioned and managed through HM
  • Equipped with PGAA extensions for vectorized query execution
  • Designed to query data in object storage (Iceberg, Delta Lake)
  • Interoperable with external tools (Spark, Trino)

Lakehouse Clusters are central to analytics in Hybrid Manager.

Learn more: EDB Postgres Lakehouse: An Overview


Apache Iceberg in Hybrid Manager

Hybrid Manager integrates Iceberg as the primary table format for offloaded and analytical data:

  • PGD clusters offload data to Iceberg format for Tiered Tables
  • Lakehouse Clusters query Iceberg tables efficiently
  • Catalog services (Lakekeeper or external) manage Iceberg table metadata

Iceberg provides an open and reliable foundation for lakehouse architectures in HM.

Conceptual overview: Apache Iceberg with EDB Solutions


Delta Lake in Hybrid Manager

Delta Lake provides additional flexibility:

  • Lakehouse Clusters can query existing Delta Lake tables in object storage
  • Common in environments with Spark/Databricks pipelines
  • Delta Lake support in HM is currently read-only

Hybrid Manager enables Postgres-based analytics on Delta Lake data without ETL.

Conceptual overview: Understanding Delta Lake with EDB Solutions


Tiered Tables in Hybrid Manager

Tiered Tables provide automated lifecycle management of large time-series or historical datasets:

  • PGD clusters use BDR AutoPartition to manage partitions
  • Older partitions are offloaded automatically to object storage in Iceberg format
  • Lakehouse Clusters and PGD parent tables can query both hot and cold data

Tiered Tables are a key optimization pattern for combining transactional and analytical workloads in Hybrid Manager.

Conceptual overview: Understanding Tiered Tables with EDB Postgres


PGAA and PGFS in Hybrid Manager

PGAA (Postgres Analytical Appliance)

PGAA extensions provide:

  • Vectorized query execution
  • Integration with object storage via PGFS
  • Support for querying Iceberg and Delta Lake tables
  • Catalog integration

PGAA powers both Lakehouse Clusters and PGD offload pipelines.


PGFS (Postgres File System)

PGFS defines object storage locations:

  • Used by PGAA to read/write data in object storage
  • Supports S3-compatible storage, GCS, and others
  • Configured in both PGD clusters and Lakehouse Clusters

PGFS is a simple but critical building block for data lake integration.


How these concepts fit together in Hybrid Manager

LayerRole
PGD ClustersTransactional layer, source for Tiered Tables
Lakehouse ClustersAnalytical compute, queries across Iceberg/Delta and Postgres
Object StorageCost-efficient analytical data storage
Catalog ServicesCentralized metadata for Iceberg tables
PGAA + PGFSBridge between Postgres and object storage

Next topic

Solving Analytics Problems in Hybrid Manager


Could this page be better? Report a problem or suggest an addition!