Analytics Concepts in Hybrid Manager
Hybrid Manager enables you to build modern data lakehouse and analytics solutions for EDB Postgres.
This page introduces key analytics concepts as they apply in Hybrid Manager and links to detailed background explanations in the Analytics Hub.
Architecture overview
Hybrid Manager integrates the following layers:
- Lakehouse Clusters for scalable analytical compute
- Object Storage for cost-efficient analytical data storage
- PGD Clusters for transactional workloads and automated Tiered Tables offload
- Catalog Services (HM-managed or external) for metadata and interoperability
- PGAA and PGFS to bridge Postgres with the data lake
This architecture enables both Postgres-first and multi-engine data lakehouse designs.
For a general introduction, see What is a Postgres Lakehouse.
Lakehouse Clusters in Hybrid Manager
Lakehouse Clusters provide scalable analytical compute in Hybrid Manager:
- Provisioned and managed through HM
- Equipped with PGAA extensions for vectorized query execution
- Designed to query data in object storage (Iceberg, Delta Lake)
- Interoperable with external tools (Spark, Trino)
Lakehouse Clusters are central to analytics in Hybrid Manager.
Learn more: EDB Postgres Lakehouse: An Overview
Apache Iceberg in Hybrid Manager
Hybrid Manager integrates Iceberg as the primary table format for offloaded and analytical data:
- PGD clusters offload data to Iceberg format for Tiered Tables
- Lakehouse Clusters query Iceberg tables efficiently
- Catalog Services (the HM-managed Lakekeeper catalog or an external catalog) manage Iceberg table metadata
Iceberg provides an open and reliable foundation for lakehouse architectures in HM.
Conceptual overview: Apache Iceberg with EDB Solutions
Delta Lake in Hybrid Manager
Delta Lake provides additional flexibility:
- Lakehouse Clusters can query existing Delta Lake tables in object storage
- Common in environments with Spark/Databricks pipelines
- Delta Lake support in HM is currently read-only
Hybrid Manager enables Postgres-based analytics on Delta Lake data without ETL.
Conceptual overview: Understanding Delta Lake with EDB Solutions
Tiered Tables in Hybrid Manager
Tiered Tables provide automated lifecycle management of large time-series or historical datasets:
- PGD clusters use BDR AutoPartition to manage partitions
- Older partitions are offloaded automatically to object storage in Iceberg format
- Both hot and cold data can be queried from Lakehouse Clusters and through the PGD parent tables
Tiered Tables are a key optimization pattern for combining transactional and analytical workloads in Hybrid Manager.
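The lifecycle described above can be illustrated with a small sketch. It is not the AutoPartition API or anything Hybrid Manager exposes; `Partition`, `partitions_to_offload`, the partition names, and the 60-day threshold are all hypothetical and exist only to show the idea of selecting older partitions for offload while newer ones stay hot.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Partition:
    """Hypothetical model of one time-range partition (not an HM or PGD type)."""
    name: str
    upper_bound: date   # exclusive upper bound of the partition's time range
    tier: str = "hot"   # "hot" = local Postgres storage, "cold" = object storage


def partitions_to_offload(partitions, today, offload_after):
    """Return hot partitions whose data is entirely older than the cutoff."""
    cutoff = today - offload_after
    return [p for p in partitions if p.tier == "hot" and p.upper_bound <= cutoff]


# Illustrative data only: three monthly partitions of a made-up events table.
partitions = [
    Partition("events_2024_01", date(2024, 2, 1)),
    Partition("events_2024_02", date(2024, 3, 1)),
    Partition("events_2024_03", date(2024, 4, 1)),
]

stale = partitions_to_offload(
    partitions,
    today=date(2024, 4, 15),
    offload_after=timedelta(days=60),  # assumed threshold for this sketch
)
print([p.name for p in stale])
```

In Hybrid Manager this selection and the offload itself happen automatically; the sketch only makes the age-based rule concrete.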
Conceptual overview: Understanding Tiered Tables with EDB Postgres
PGAA and PGFS in Hybrid Manager
PGAA (Postgres Analytics Accelerator)
PGAA extensions provide:
- Vectorized query execution
- Integration with object storage via PGFS
- Support for querying Iceberg and Delta Lake tables
- Catalog integration
PGAA powers both Lakehouse Clusters and PGD offload pipelines.
PGFS (Postgres File System)
PGFS defines object storage locations:
- Used by PGAA to read/write data in object storage
- Supports S3-compatible storage, GCS, and others
- Configured in both PGD clusters and Lakehouse Clusters
PGFS is a simple but critical building block for data lake integration.
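To make the idea of a named object storage location concrete, the following sketch models one in plain Python. It does not use PGFS or reproduce its syntax; `StorageLocation`, `parse_location`, the accepted URL schemes, and the bucket name are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Assumption for this sketch: only these URL schemes are accepted.
SUPPORTED_SCHEMES = {"s3", "gs"}


@dataclass(frozen=True)
class StorageLocation:
    """Hypothetical model of a named object storage location."""
    name: str
    scheme: str
    bucket: str
    prefix: str


def parse_location(name, url):
    """Split an object storage URL into scheme, bucket, and key prefix."""
    parsed = urlparse(url)
    if parsed.scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported scheme: {parsed.scheme!r}")
    if not parsed.netloc:
        raise ValueError("missing bucket name")
    return StorageLocation(name, parsed.scheme, parsed.netloc, parsed.path.lstrip("/"))


# Illustrative values only; the bucket and path are made up.
loc = parse_location("sales_lake", "s3://example-bucket/warehouse/sales")
print(loc)
```

Giving the location a name separates where the data lives from how it is queried, which matches the role PGFS plays for PGAA in the description above.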
How these concepts fit together in Hybrid Manager
| Layer | Role |
|---|---|
| PGD Clusters | Transactional layer, source for Tiered Tables |
| Lakehouse Clusters | Analytical compute, queries across Iceberg/Delta and Postgres |
| Object Storage | Cost-efficient analytical data storage |
| Catalog Services | Centralized metadata for Iceberg tables |
| PGAA + PGFS | Bridge between Postgres and object storage |