pg_stat_statements v7.4

pg_stat_statements is a core PostgreSQL module that provides a means to track planning and execution statistics of all SQL statements executed by the server.

In WarehousePG (WHPG) distributed architecture, query execution is parallelized across the coordinator and multiple segments. The WHPG implementation of this module provides cluster-wide visibility, collecting performance metrics from every segment node. This allows administrators to analyze buffer usage, I/O operations, and execution patterns across the entire distributed cluster rather than just the coordinator.

By extending standard functionality into the parallel execution layer, pg_stat_statements enables:

  • Comprehensive distributed tracking across all query executors.
  • Unified visibility through specialized system views for both raw and aggregated data.
  • Centralized management for resetting performance metrics across the entire cluster simultaneously.

Loading the module

pg_stat_statements is shipped with WarehousePG 7.4 and later. You must load the extension as a shared library before you can use it within your databases.

  1. Load the extension as a shared library before you can use it within your databases. Check for existing shared libraries:

    gpconfig -s shared_preload_libraries
  2. Use the output of the previous command to enable `pg_stat_statements1, along any other shared libraries, and restart WHPG:

    gpconfig -c shared_preload_libraries -v '<other_libraries>, pg_stat_statements'
    gpstop -ar
  3. Create the extension in your database:

    CREATE EXTENSION pg_stat_statements;

Analyzing performance with distributed views

The statistics gathered by the module are made available via two specialized views:

whpg_stat_statements

This view displays raw statistics from the coordinator and all segments in a single table.

SELECT * FROM whpg_stat_statements;
ColumnTypeDescription
gp_segment_idintegerSegment identifier: -1 for the coordinator, 0...N-1 for segments.
useridoidOID of the user who executed the statement.
dbidoidOID of the database in which the statement was executed.
queryidbigintInternal hash code computed from the statement's parse tree.
querytextNormalized representative query text.
plansbigintNumber of times the statement was planned.
total_plan_timedouble precisionTotal time spent planning the statement (milliseconds).
callsbigintNumber of times the statement was executed.
total_exec_timedouble precisionTotal time spent executing the statement (milliseconds).
rowsbigintTotal number of rows retrieved or affected by the statement.
shared_blks_hitbigintTotal number of shared block cache hits.
shared_blks_readbigintTotal number of shared blocks read from disk.
shared_blks_dirtiedbigintTotal number of shared blocks dirtied.
shared_blks_writtenbigintTotal number of shared blocks written to disk.
local_blks_hitbigintTotal number of local block cache hits.
local_blks_readbigintTotal number of local blocks read from disk.
temp_blks_readbigintTotal number of temporary blocks read from disk.
temp_blks_writtenbigintTotal number of temporary blocks written to disk.
blk_read_timedouble precisionTotal time spent reading blocks (if track_io_timing is enabled).
blk_write_timedouble precisionTotal time spent writing blocks (if track_io_timing is enabled).
wal_recordsbigintTotal number of WAL records generated by the statement.
wal_bytesnumericTotal number of WAL bytes generated by the statement.

whpg_stat_statements_aggregated

This view provides a unified summary of total resource consumption per query across the entire cluster.

SELECT * FROM whpg_stat_statements_aggregated;
ColumnTypeDescription
useridoidOID of the user.
dbidoidOID of the database.
queryidbigintInternal hash code.
querytextNormalized representative query text.
callsbigintTotal number of executions.
total_timedouble precisionTotal time spent executing.
min_timedouble precisionMinimum time spent executing.
max_timedouble precisionMaximum time spent executing.
mean_timedouble precisionMean time spent executing.
stddev_timedouble precisionPopulation standard deviation of execution time.
rowsbigintTotal rows retrieved/affected.
shared_blks_hitbigintTotal shared block cache hits.
shared_blks_readbigintTotal shared blocks read from disk.
shared_blks_dirtiedbigintTotal shared blocks dirtied.
shared_blks_writtenbigintTotal shared blocks written to disk.
local_blks_hitbigintTotal local block cache hits.
local_blks_readbigintTotal local blocks read from disk.
local_blks_dirtiedbigintTotal local blocks dirtied.
local_blks_writtenbigintTotal local blocks written to disk.
temp_blks_readbigintTotal temporary blocks read from disk.
temp_blks_writtenbigintTotal temporary blocks written to disk.
blk_read_timedouble precisionTotal block read time.
blk_write_timedouble precisionTotal block write time.

To provide a consolidated view of cluster performance, the metrics in this view are processed as follows:

  • Buffer and I/O statistics (shared_blks_*, local_blks_*, temp_blks_*, blk_*_time) are summed across all nodes to show total cluster effort.
  • Timing statistics (total_time, min_time, max_time, mean_time) reflect the values recorded at the coordinator level, representing the actual duration experienced by the user.

Managing statistics globally

The pg_stat_statements module includes functions to manage and clear gathered performance data.

whpg_stat_statements_reset

Resets all collected statistics on the coordinator and all segments simultaneously.

-- Global reset for all users and databases
SELECT whpg_stat_statements_reset();

-- Targeted reset for specific user, database, or query
SELECT whpg_stat_statements_reset(userid, dbid, queryid);
ParameterTypeDescription
p_useridOidReset entries for a specific user (use 0 for all users).
p_dbidOidReset entries for a specific database (use 0 for all databases).
p_queryidbigintReset a specific query signature (use 0 for all queries).
Note

This function requires superuser privileges.

Examples

View per-segment buffer usage

Check if specific segments are performing significantly more physical reads than others, which often indicates poor data distribution.

SELECT
    gp_segment_id,
    query,
    shared_blks_read,
    shared_blks_hit,
    blk_read_time
FROM whpg_stat_statements
WHERE query LIKE '%my_table%'
ORDER BY gp_segment_id;

Get aggregated statistics

Identify queries consuming the most total I/O time summed across the entire cluster.

SELECT
    query,
    calls,
    rows,
    shared_blks_hit,
    blk_read_time + blk_write_time AS total_io_time_ms
FROM whpg_stat_statements_aggregated
ORDER BY total_io_time_ms DESC
LIMIT 10;

Reset statistics before benchmarking

To ensure accurate results for a new workload, reset all cluster statistics first.

-- 1. Clear cluster-wide history
SELECT whpg_stat_statements_reset();

-- 2. Run workload
-- [Queries]

-- 3. Analyze aggregated cluster costs
SELECT * FROM whpg_stat_statements_aggregated;