Troubleshooting v1.1.2

This page provides basic information on how to troubleshoot EDB Postgres Distributed for Kubernetes in your Kubernetes cluster deployment.

Hint

As a Kubernetes administrator, you should have the kubectl Cheat Sheet page bookmarked!

Before you start

Kubernetes environment

Providing clear information about the underlying Kubernetes system can make a real difference in any troubleshooting activity.

Make sure you know:

  • the Kubernetes distribution and version you are using
  • the specifications of the nodes where PostgreSQL is running
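
You can gather both pieces of information with standard kubectl commands: kubectl version reports the client and server versions, while kubectl get nodes -o wide and kubectl describe node show node capacity, kernel, and container runtime details.

kubectl version
kubectl get nodes -o wide
kubectl describe node <NODE>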

Useful utilities

In addition to the mandatory kubectl utility, we recommend having the following plugins and utilities available on your system for troubleshooting:

  • cnp plugin for kubectl, which you can use to interact with each individual node (PG4K cluster)
  • jq, a lightweight and flexible command-line JSON processor
  • grep, which searches one or more input files for lines matching a specified pattern. It's already available in most *nix distros. On Windows, you can use findstr as an alternative to grep, or use WSL to install your preferred *nix distro and run the tools mentioned above.
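
As a quick sanity check, you can verify that these tools are available before you start. This is only a sketch: the exact cnp plugin subcommands depend on the plugin version you installed.

kubectl cnp version
jq --version
grep --version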

Logs

All resources created and managed by EDB Postgres Distributed for Kubernetes log to standard output in accordance with Kubernetes conventions, using JSON format.

While logs are typically processed at the infrastructure level and include those from EDB Postgres Distributed for Kubernetes and EDB Postgres for Kubernetes, accessing logs directly from the command line interface is critical during troubleshooting. You have three primary options for doing so:

  • Use the kubectl logs command to retrieve logs from a specific resource, and apply jq for better readability.
  • Use the kubectl cnp logs command for EDB Postgres for Kubernetes-specific logging; this is useful for collecting logs node by node.
  • Leverage specialized open-source tools like stern, which can aggregate logs from multiple resources (e.g., all pods in a PGDGroup by selecting the k8s.pgd.enterprisedb.io/group label), filter log entries, customize output formats, and more.
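
For example, a stern invocation along these lines tails logs from every pod of a PGDGroup. This is a sketch that assumes stern is installed and that your group is named pgd-sample in the pgd namespace; flags can vary slightly between stern versions.

stern -n pgd -l k8s.pgd.enterprisedb.io/group=pgd-sample --tail 50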
Note

The following sections provide examples of how to retrieve logs for various resources when troubleshooting EDB Postgres Distributed for Kubernetes.

Operator information

There are two operators for managing resources within a PGDGroup:

  • EDB Postgres Distributed for Kubernetes (PGD4K operator): manages the PGDGroup and the resources it directly creates.
  • EDB Postgres for Kubernetes (PG4K operator): manages PG4K clusters, which are used as nodes within a PGDGroup.

By default, the PGD4K operator is installed in the pgd-operator-system namespace as a Deployment. (Refer to the "Details about the deployment" section for more information.)

To list the operator pods, run:

kubectl get pods -n pgd-operator-system
Note

Under normal circumstances, you should have one pod where the operator is running, identified by a name starting with pgd-operator-controller-manager-. In case you have set up your operator for high availability, you should have more entries. Those pods are managed by a deployment named pgd-operator-controller-manager.
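
To check how many replicas of the operator are running, you can inspect that deployment directly:

kubectl get deployment -n pgd-operator-system pgd-operator-controller-manager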

Collect the relevant information about the operator that is running in pod <POD> with:

kubectl describe pod -n pgd-operator-system <POD>

Then get the logs from the same pod by running:

kubectl logs -n pgd-operator-system <POD>

Gather more information about the PGD4K operator

Get logs from all pods in the EDB Postgres Distributed for Kubernetes operator deployment (in case you have a multi-operator deployment) by running:

kubectl logs -n pgd-operator-system \
  deployment/pgd-operator-controller-manager --all-containers=true
Tip

You can add the -f flag to the above command to follow logs in real time.

Save logs to a JSON file by running:

kubectl logs -n pgd-operator-system \
  deployment/pgd-operator-controller-manager --all-containers=true | \
  jq -r . > pgd_logs.json

Gather more information about the PG4K operator

Because the PGD4K operator leverages the PG4K operator to manage each node, you also need to collect the PG4K operator logs. See Operator information in the EDB Postgres for Kubernetes documentation for details.
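
For reference, assuming the PG4K operator was installed with its defaults (the postgresql-operator-system namespace and a deployment named postgresql-operator-controller-manager), the equivalent commands look like this; adjust the namespace and deployment name if your installation differs:

kubectl get pods -n postgresql-operator-system
kubectl logs -n postgresql-operator-system \
  deployment/postgresql-operator-controller-manager --all-containers=true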

PGDGroup information

You can check the status of the pgd-sample PGDGroup in the <NAMESPACE> namespace with:

kubectl get pgdgroup -n <NAMESPACE> pgd-sample

Output:

NAME         DATA INSTANCES   WITNESS INSTANCES   PHASE                AGE
pgd-sample   2                1                   PGDGroup - Healthy   3h1m

The above example describes a healthy PGDGroup cluster consisting of 2 data nodes and 1 witness node.
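
If you only need the phase, for example in a script or a monitoring check, a jsonpath query is handy. This assumes the phase is exposed at .status.phase, which is what the PHASE column above reflects:

kubectl get pgdgroup -n <NAMESPACE> pgd-sample -o jsonpath='{.status.phase}'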

A PGDGroup is composed of multiple nodes, where each node is a single-instance PG4K cluster. To view all nodes, you can retrieve the cluster information and filter by the label k8s.pgd.enterprisedb.io/group. Each node is named following the format: <group name>-<number>.

kubectl -n pgd get cluster -l k8s.pgd.enterprisedb.io/group=pgd-sample

Output:

NAME           AGE    INSTANCES   READY   STATUS                     PRIMARY
pgd-sample-1   3h2m   1           1       Cluster in healthy state   pgd-sample-1-1
pgd-sample-2   179m   1           1       Cluster in healthy state   pgd-sample-2-1
pgd-sample-3   176m   1           1       Cluster in healthy state   pgd-sample-3-1

PGD Node pod information

Each PGD node is a single-instance PG4K cluster running on Kubernetes. To retrieve the list of instances belonging to a specific PGDGroup, run:

kubectl get pod -l k8s.pgd.enterprisedb.io/group=pgd-sample -A

Output:

NAMESPACE   NAME                 READY   STATUS    RESTARTS   AGE    ROLE
pgd         pgd-sample-1-1       1/1     Running   0          57m    primary
pgd         pgd-sample-2-1       1/1     Running   0          61m    primary
pgd         pgd-sample-3-1       1/1     Running   0          65m    primary
pgd         pgd-sample-proxy-0   1/1     Running   0          3h
pgd         pgd-sample-proxy-1   1/1     Running   0          179m

You can check if/how a pod is failing by running:

kubectl get pod -n <NAMESPACE> -o yaml <GROUP>-<N>-1

You can get all the logs for a given PGD node pod with:

kubectl logs -n <NAMESPACE> <GROUP>-<N>-1

If you want to limit the search to the PostgreSQL process only, you can run:

kubectl logs -n <NAMESPACE> <GROUP>-<N>-1 | \
  jq 'select(.logger=="postgres") | .record.message'

The following example also adds the timestamp:

kubectl logs -n <NAMESPACE> <GROUP>-<N>-1 | \
  jq -r 'select(.logger=="postgres") | [.ts, .record.message] | @csv'

If the timestamp is displayed in Unix Epoch time, you can convert it to a user-friendly format:

kubectl logs -n <NAMESPACE> <GROUP>-<N>-1 | \
  jq -r 'select(.logger=="postgres") | [(.ts|strflocaltime("%Y-%m-%dT%H:%M:%S %Z")), .record.message] | @csv'

Gather and filter extra information about PostgreSQL pods

Check logs from a specific pod that has crashed:

kubectl logs -n <NAMESPACE> --previous <GROUP>-<N>-1

Get FATAL errors from a specific PostgreSQL pod:

kubectl logs -n <NAMESPACE> <GROUP>-<N>-1 | \
  jq -r '.record | select(.error_severity == "FATAL")'

Output:

{
  "log_time": "2021-11-08 14:07:44.520 UTC",
  "user_name": "streaming_replica",
  "process_id": "68",
  "connection_from": "10.244.0.10:60616",
  "session_id": "61892f30.44",
  "session_line_num": "1",
  "command_tag": "startup",
  "session_start_time": "2021-11-08 14:07:44 UTC",
  "virtual_transaction_id": "3/75",
  "transaction_id": "0",
  "error_severity": "FATAL",
  "sql_state_code": "28000",
  "message": "role \"streaming_replica\" does not exist",
  "backend_type": "walsender"
}

Filter PostgreSQL DB error messages in logs for a specific pod:

kubectl logs -n <NAMESPACE> <GROUP>-<N>-1 | jq -r '.err | select(. != null)'

Output:

dial unix /controller/run/.s.PGSQL.5432: connect: no such file or directory

Get messages matching the word err from a specific pod:

kubectl logs -n <NAMESPACE> <GROUP>-<N>-1 | jq -r '.msg' | grep "err"

Output:

2021-11-08 14:07:39.610 UTC [15] LOG:  ending log output to stderr

Get all logs from the PostgreSQL process of a specific pod:

kubectl logs -n <NAMESPACE> <GROUP>-<N>-1 | \
  jq -r '. | select(.logger == "postgres") | select(.msg != "record") | .msg'

Output:

2021-11-08 14:07:52.591 UTC [16] LOG:  redirecting log output to logging collector process
2021-11-08 14:07:52.591 UTC [16] HINT:  Future log output will appear in directory "/controller/log".
2021-11-08 14:07:52.591 UTC [16] LOG:  ending log output to stderr
2021-11-08 14:07:52.591 UTC [16] HINT:  Future log output will go to log destination "csvlog".

Get pod logs filtered to specific fields, joined with a | separator, by running:

kubectl logs -n <NAMESPACE> <GROUP>-<N>-1 | \
  jq -r '[.level, .ts, .logger, .msg] | join(" | ")'

Output:

info | 1636380469.5728037 | wal-archive | Backup not configured, skip WAL archiving
info | 1636383566.0664876 | postgres | record

ScheduledBackup and Backup information

You can list the scheduled backups of a PGDGroup (pgdgroup-backup in this example) by filtering on its group label:

kubectl get scheduledbackup -l k8s.pgd.enterprisedb.io/group=pgdgroup-backup -A

Output:

NAME                           AGE     CLUSTER             LAST BACKUP
pgdgroup-backup-1-pgd-barman   9m27s   pgdgroup-backup-1   9m27s
pgdgroup-backup-1-pgd-vol      9m26s   pgdgroup-backup-1   9m26s

Each scheduled backup is named in the format <cluster>-pgd-<backup method>, where <cluster> is the name of the node chosen as the backup target. If backup is properly configured, WAL archiving occurs on all nodes, but the backup itself is taken only on the selected node.

You can also list the backups that have been created with:

kubectl get backup -l k8s.pgd.enterprisedb.io/group=pgdgroup-backup -A

Output:

NAMESPACE   NAME                                          AGE   CLUSTER             METHOD              PHASE       ERROR
pgd         pgdgroup-backup-1-pgd-barman-20250731094444   18m   pgdgroup-backup-1   barmanObjectStore   completed
pgd         pgdgroup-backup-1-pgd-vol-20250731094445      18m   pgdgroup-backup-1   volumeSnapshot      completed
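
If a backup reports an error in the PHASE or ERROR columns, describing the Backup resource usually reveals the details:

kubectl describe backup -n <NAMESPACE> <BACKUP_NAME>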

More troubleshooting knowledge

You can refer to the Before you start section in the CloudNativePG documentation for more general Kubernetes troubleshooting knowledge.

Some known issues

These known issues and limitations are in the current release of EDB Postgres Distributed for Kubernetes.

Postgres major version upgrades

This release of EDB Postgres Distributed for Kubernetes (v1.1.2) supports Postgres major version upgrades with the following restrictions:

  • PGD4K Operator v1.1.2 or higher
  • PG4K Operator v1.26.0 or higher
  • PGD Operand 5.8 or greater
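
To verify what you're currently running, you can inspect the operator deployment images and the operand image of each node. This is a sketch that assumes the default installation namespaces and deployment names; if a node uses an image catalog, spec.imageName may be empty.

kubectl get deployment -n pgd-operator-system pgd-operator-controller-manager \
  -o jsonpath='{.spec.template.spec.containers[*].image}'
kubectl get deployment -n postgresql-operator-system postgresql-operator-controller-manager \
  -o jsonpath='{.spec.template.spec.containers[*].image}'
kubectl get cluster -n <NAMESPACE> -l k8s.pgd.enterprisedb.io/group=<GROUP> \
  -o custom-columns=NAME:.metadata.name,IMAGE:.spec.imageName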

Physical join

Since release v1.1.1 with operand PGD 5.7, the operator supports physically joining a node to other ready nodes.

If a physical join job fails:

kubectl get job -n <NAMESPACE>
NAME                                     STATUS   COMPLETIONS   DURATION   AGE
pgdgroup-backup-barman-2-physical-join   Failed   0/1           10m        10m

you can delete the job to trigger the physical join again:

kubectl delete job pgdgroup-backup-barman-2-physical-join -n <NAMESPACE>

A key pre-condition for a physical join is the establishment of a global Raft consensus. If a physical join job is pending, you can use the PGD function bdr.monitor_group_raft to verify whether this pre-condition has been satisfied.
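
For example, assuming psql is available in the instance container (named postgres in PG4K pods) and that your application database is <APP_DB>, a query along these lines shows the Raft status. Depending on your setup, you may prefer the kubectl cnp psql subcommand or your own connection credentials instead:

kubectl exec -n <NAMESPACE> <GROUP>-<N>-1 -c postgres -- \
  psql -U postgres -d <APP_DB> -c "SELECT * FROM bdr.monitor_group_raft;"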

Data migration

This version of EDB Postgres Distributed for Kubernetes doesn't support declarative import of data from other Postgres databases. To migrate schemas and data, you can use traditional Postgres migration tools such as EDB*Loader or Migration Toolkit/Replication Server. You can also use pg_dump ... on the source database and pipe the command's output to your target database with psql -c.

Connectivity with PgBouncer

EDB Postgres Distributed for Kubernetes doesn't support using PgBouncer to pool client connection requests. This limitation applies to both the open-source and EDB versions of PgBouncer.

Backup operations

To configure an EDB Postgres Distributed for Kubernetes environment, you must apply a PGDGroup YAML object to each Kubernetes cluster. Applying this object creates all necessary services for implementing a distributed architecture.

If you added a spec.backup section to this PGDGroup object with the goal of setting up a backup configuration, the backup will fail unless you also set the spec.backup.schedulers value.

Error output example:

The PGDGroup "region-a" is invalid: spec.backup.schedulers: Invalid value: "": Empty spec string

Workaround

To work around this issue, add a spec.backup.schedulers section with a schedule that meets your requirements, for example:

spec:
  instances: 3
  pgd:
    parentGroup:
      create: true
      name: world
  backup:
    configuration:
      barmanObjectStore:
        ...
    schedulers:
      - method: barmanObjectStore
        immediate: true
        schedule: "0 */5 * * * *"

Known issues and limitations in EDB Postgres Distributed

All issues and limitations known for the EDB Postgres Distributed version that you include in your deployment also affect your EDB Postgres Distributed for Kubernetes instance.

For example, if the EDB Postgres Distributed version you're using is 5.x, your EDB Postgres Distributed for Kubernetes instance will be affected by these 5.x known issues and 5.x limitations.