Troubleshooting PGAIHM
This guide provides instructions for troubleshooting the Hybrid Manager (PGAIHM).
Logfiles
The following log files can be used when troubleshooting the PGAIHM:
Log file name | Category | Location | Explanation |
---|---|---|---|
Error codes
Error Code | Category | Error Message | Explanation | Potential Solution |
---|---|---|---|---|
na | Installation | [upm-istio] - failed to install components: [upm-istio] | An error from boot-strap: the LoadBalancer service is pending. | Fix the loadbalancer IAM role first, then delete the LB pod to trigger recreation. |
na | Installation | upm-beaco-ff-base install failed | kubectl edit sc gp2 | |
na | Installation | upm-thanos installation failed | ||
na | Installation | Failed to pull image | ||
na | Installation | Loki pods crash | The loki pods crashed and the installation failed. | |
na | Portal Login | No healthy upstream | “No healthy upstream” is returned after the login. | |
401 | Portal Login | 401 Access Denied | 401 error returned when logging into the portal after the installation | |
500 | Portal Login | HTTP Error 500 | The back ingress gateway can't access incoming connections. | |
na | Database Provisioning | Postgres Cluster Provisioning Stuck at X% Complete | ||
na | Database Provisioning | CPU and Memory range is much smaller than the nodegroup | ||
na | Database Update | Disk scale up | The new value is shown on cluster yaml but not applied on pvc/pv. | Edit the storageclass |
Errors in detail
[upm-istio] - failed to install components: [upm-istio]
Detailed Error:
5:03:42AM: ---- waiting on 1 changes [4/5 done] ---- 5:03:42AM: ongoing: reconcile service/istio-ingressgateway (v1) namespace: istio-system 5:03:42AM: ^ Load balancer ingress is empty {"level":"error","msg":"failed to install component: failed to execute kapp command, error: Timed out waiting after 15m0s for resources: [service/istio-ingressgateway (v1) namespace: istio-system], message: "} {"level":"error","msg":"Failed to install components: [upm-istio]"} {"level":"error","msg":"Installation failed, error: failed to install components: [upm-istio]"} ```bash k get svc -n istio-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE istio-ingressgateway LoadBalancer 172.20.184.80 <pending> 80:30270/TCP,443:31858/TCP,9443:32692/TCP 3h3m istiod ClusterIP 172.20.146.223 <none> 15010/TCP,15012/TCP,443/TCP,15014/TCP 3h3m
k describe svc istio-ingressgateway -n istio-system
Events: Type Reason Age From Message ---- ------ ---- ---- ------- Warning FailedBuildModel 53m service Failed build model due to operation error Elastic Load Balancing v2: DescribeLoadBalancers, get identity: get credentials: failed to refresh cached credentials, failed to retrieve credentials, operation error STS: AssumeRoleWithWebIdentity, https response error StatusCode: 403, RequestID: 07cc71c0-3113-4b4e-be7f-3de3b475c49b, api error AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity
Resolution:
Fix the loadbalancer IAM role then delete the LB pod
k delete pod aws-load-balancer-controller-57ccd8bc77-f8lrk -n kube-system k delete pod aws-load-balancer-controller-57ccd8bc77-j58c2 -n kube-system k get svc -n istio-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE istio-ingressgateway LoadBalancer 172.20.184.80 k8s-istiosys-istioing-38d88046aa-37454e78c9648556.elb.us-east-1.amazonaws.com 80:30270/TCP,443:31858/TCP,9443:32692/TCP 3h8m istiod ClusterIP 172.20.146.223 <none> 15010/TCP,15012/TCP,443/TCP,15014/TCP 3h8m
upm-beaco-ff-base install failed
Detailed Error:
❯ NAME READY STATUS RESTARTS AGE app-db-1-initdb-k4nbl 0/1 Pending 0 35m
kc get pvc -n upm-beaco-ff-base
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE app-db-1 Pending <unset> 35m
kc describe pvc app-db-1 -n upm-beaco-ff-base
Name: app-db-1 Namespace: upm-beaco-ff-base StorageClass: Status: Pending Volume: ……. Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal FailedBinding 64s (x142 over 36m) persistentvolume-controller no persistent volumes available for this claim and no storage class is se
Resolution::
Run kubectl edit sc gp2
and add the following annotation: storageclass.kubernetes.io/is-default-class: "true"
apiVersion: storage.k8s.io/v1 kind: StorageClass metadata: annotations: storageclass.kubernetes.io/is-default-class: "true"
Upm-thanos installation failed
Detailed Error:
Resolution:
- Check for any pods in crash loop.
- Check the logs to look for any permission-related errors.
kubectl get pod -n monitoring
Check if the secret is correct or not:
kubectl get secret -n monitoring
The secret should match what you defined in edb-object-storage
:
kubectl get secret -n monitoring thanos-objstore-secret -o yaml
apiVersion: v1 data: objstore.yml: dHlwZTogUzMKY29uZmlnOgogIGJ1Y2tldDogYXBwbGlhbmNlLXNoYW50ZXN0LWVrcy1jbHVzdGVyLWVkYi1wb3N0Z3JlcwogIHJlZ2lvbjogYXAtc291dGgtMQogIGluc2VjdXJlOiBmYWxzZQogIGVuZHBvaW50OiBzMy5hcC1zb3V0aC0xLmFtYXpvbmF3cy5jb20KcHJlZml4OiAiZWRiLW1ldHJpY3MiCg== ~ ❯ echo dHlwZTogUzMKY29uZmlnOgogIGJ1Y2tldDogYXBwbGlhbmNlLXNoYW50ZXN0LWVrcy1jbHVzdGVyLWVkYi1wb3N0Z3JlcwogIHJlZ2lvbjogYXAtc291dGgtMQogIGluc2VjdXJlOiBmYWxzZQogIGVuZHBvaW50OiBzMy5hcC1zb3V0aC0xLmFtYXpvbmF3cy5jb20KcHJlZml4OiAiZWRiLW1ldHJpY3MiCg== | base64 -d
To correct the secret:
kubectl delete secret thanos-objstore-secret -n monitoring kubectl delete pod storage-location-operator-controller-manager-xxxxxx -n storage-location-operator
Re-run helm upgrade and ensure the secret thanos-objstore-secret
is correct:
kubectl delete pod <pod in crashloop> -n monitoring
Failed to pull image
Detailed Error:
NAME READY STATUS RESTARTS AGE edbpgai-bootstrap-job-v1.0.6-appl-d12cvm-q2pbg 0/1 Init:AssetPullBackOff 0 2m49s Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal Scheduled 53s default-scheduler Successfully assigned edbpgai-bootstrap/edbpgai-bootstrap-job-v1.0.6-appl-d12cvm-q2pbg to i-001f23a316e13e905 Warning Failed 22s kubelet Failed to pull image "docker.enterprisedb.com/staging_pgai-platform/edbpgai-bootstrap/bootstrap-eks:v1.0.6-appl": rpc error: code = DeadlineExceeded desc = failed to pull and unpack image "docker.enterprisedb.com/staging_pgai-platform/edbpgai-bootstrap/bootstrap-eks:v1.0.6-appl": failed to resolve reference "docker.enterprisedb.com/staging_pgai-platform/edbpgai-bootstrap/bootstrap-eks:v1.0.6-appl": failed to do request: Head "https://docker.enterprisedb.com/v2/staging_pgai-platform/edbpgai-bootstrap/bootstrap-eks/manifests/v1.0.6-appl": dial tcp 18.67.76.32:443: i/o timeout Warning Failed 22s kubelet Error: ErrAssetPull Normal BackOff 22s kubelet Back-off pulling image "docker.enterprisedb.com/staging_pgai-platform/edbpgai-bootstrap/bootstrap-eks:v1.0.6-appl" Warning Failed 22s kubelet Error: AssetPullBackOff Normal Pulling 7s (x2 over 52s) kubelet Pulling image "docker.enterprisedb.com/staging_pgai-platform/edbpgai-bootstrap/bootstrap-eks:v1.0.6-appl"
Resolution:
In your AWS console, edit your related subnet and enable Auto-assign public IPv4 address.
Delete nodes to let EKS pick up the change:
k get node
NAME STATUS ROLES AGE VERSION i-001f23a316e13e905 Ready <none> 19h v1.31.4-eks-0f56d01 i-012fd44b2f575a514 Ready <none> 20h v1.31.4-eks-0f56d01 i-076d6a447c6551a21 Ready <none> 20h v1.31.4-eks-0f56d01
k delete node i-001f23a316e13e905 i-012fd44b2f575a514 i-076d6a447c6551a21
node "i-001f23a316e13e905" deleted node "i-012fd44b2f575a514" deleted node "i-076d6a447c6551a21" deleted
kgp -n edbpgai-bootstrap
NAME READY STATUS RESTARTS AGE edbpgai-bootstrap-job-v1.0.6-appl-5ee7l7-xzfqt 1/1 Running 0 118s
Loki pods crash
Detailed Error:
k get pods -n logging
NAME READY STATUS RESTARTS AGE loki-backend-0 1/2 CrashLoopBackOff 16 (2m1s ago) 59m loki-read-5b8555fb64-9srmq 0/1 CrashLoopBackOff 16 (2m40s ago) 59m loki-read-5b8555fb64-fw6ls 0/1 CrashLoopBackOff 16 (2m7s ago) 59m loki-read-5b8555fb64-nw2vq 0/1 CrashLoopBackOff 16 (2m17s ago) 59m loki-write-0 0/1 CrashLoopBackOff 16 (2m37s ago) 59m loki-write-1 0/1 CrashLoopBackOff 16 (2m5s ago) 59m loki-write-2 0/1 CrashLoopBackOff 16 (2m16s ago) 59m
k logs loki-write-0 -n logging
level=info ts=2025-02-26T09:32:35.173785968Z caller=main.go:126 msg="Starting Loki" version="(version=release-3.1.x-89fe788, branch=release-3.1.x, revision=89fe788d)" level=info ts=2025-02-26T09:32:35.173821129Z caller=main.go:127 msg="Loading configuration file" filename=/etc/loki/config/config.yaml level=info ts=2025-02-26T09:32:35.174421237Z caller=server.go:352 msg="server listening on addresses" http=[::]:3100 grpc=[::]:9095 level=info ts=2025-02-26T09:32:35.175391511Z caller=memberlist_client.go:435 msg="Using memberlist cluster label and node name" cluster_label= node=loki-write-0-a0cfbc5d level=info ts=2025-02-26T09:32:35.17599149Z caller=shipper.go:160 index-store=tsdb-2020-05-15 msg="starting index shipper in WO mode" level=info ts=2025-02-26T09:32:35.176072171Z caller=table_manager.go:136 index-store=tsdb-2020-05-15 msg="uploading tables" level=info ts=2025-02-26T09:32:35.176269663Z caller=head_manager.go:308 index-store=tsdb-2020-05-15 component=tsdb-head-manager msg="loaded wals by period" groups=0 level=info ts=2025-02-26T09:32:35.176305024Z caller=manager.go:86 index-store=tsdb-2020-05-15 component=tsdb-manager msg="loaded leftover local indices" err=null successful=true buckets=0 indices=0 failures=0 level=info ts=2025-02-26T09:32:35.176325994Z caller=head_manager.go:308 index-store=tsdb-2020-05-15 component=tsdb-head-manager msg="loaded wals by period" groups=1 level=info ts=2025-02-26T09:32:35.180393332Z caller=module_service.go:82 msg=starting module=server level=info ts=2025-02-26T09:32:35.180477643Z caller=module_service.go:82 msg=starting module=memberlist-kv level=error ts=2025-02-26T09:32:35.180526964Z caller=loki.go:524 msg="module failed" module=memberlist-kv error="starting module memberlist-kv: invalid service state: Failed, expected: Running, failure: service memberlist_kv failed: failed to create memberlist: Failed to get final advertise address: no private IP address found, and explicit IP not provided" level=error ts=2025-02-26T09:32:35.180549544Z caller=loki.go:524 msg="module failed" module=ring error="failed to start ring, because it depends on module memberlist-kv, which has failed: invalid service state: Failed, expected: Running, failure: starting module memberlist-kv: invalid service state: Failed, expected: Running, failure: service memberlist_kv failed: failed to create memberlist: Failed to get final advertise address: no private IP address found, and explicit IP not provided" level=error ts=2025-02-26T09:32:35.180582375Z caller=loki.go:524 msg="module failed" module=store error="failed to start store, because it depends on module memberlist-kv, which has failed: invalid service state: Failed, expected: Running, failure: starting module memberlist-kv: invalid service state: Failed, expected: Running, failure: service memberlist_kv failed: failed to create memberlist: Failed to get final advertise address: no private IP address found, and explicit IP not provided" level=error ts=2025-02-26T09:32:35.180588655Z caller=loki.go:524 msg="module failed" module=ingester error="failed to start ingester, because it depends on module analytics, which has failed: context canceled" level=error ts=2025-02-26T09:32:35.180593435Z caller=loki.go:524 msg="module failed" module=distributor error="failed to start distributor, because it depends on module analytics, which has failed: context canceled" level=error ts=2025-02-26T09:32:35.180597415Z caller=loki.go:524 msg="module failed" module=analytics error="failed to start analytics, because it depends on module memberlist-kv, which has failed: invalid service state: Failed, expected: Running, failure: starting module memberlist-kv: invalid service state: Failed, expected: Running, failure: service memberlist_kv failed: failed to create memberlist: Failed to get final advertise address: no private IP address found, and explicit IP not provided" level=info ts=2025-02-26T09:32:35.180687877Z caller=modules.go:1832 msg="server stopped" level=info ts=2025-02-26T09:32:35.180702097Z caller=module_service.go:120 msg="module stopped" module=server level=info ts=2025-02-26T09:32:35.180708417Z caller=loki.go:508 msg="Loki stopped" running_time=85.855531ms failed services github.com/grafana/loki/v3/pkg/loki.(*Loki).Run /src/loki/pkg/loki/loki.go:566 main.main /src/loki/cmd/loki/main.go:129 runtime.main /usr/local/go/src/runtime/proc.go:271 runtime.goexit /usr/local/go/src/runtime/asm_amd64.s:1695 level=error ts=2025-02-26T09:32:35.180772078Z caller=log.go:216 msg="error running loki" err="failed services\ngithub.com/grafana/loki/v3/pkg/loki.(*Loki).Run\n\t/src/loki/pkg/loki/loki.go:566\nmain.main\n\t/src/loki/cmd/loki/main.go:129\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:271\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1695"
Resolution:
There is a known issue from Loki https://github.com/grafana/loki/issues/8634 due to the CIDR not 10.*
- Check the IPv4 CIDR in your VPC. For example, if it's set to 11.1.0.0/20, it should be 10.1.0.0/20.
No healthy upstream
Detailed Error:
The pod at upm-dex
is in a crash loop. Check the pod logs to view the error details. In this case, pgai.portal.authentication.staticPasswords
is not configured correctly, so update the correct configuration and re-run the deployment.
upm-dex upm-dex-66b96b5857-pdcg2 1/2 CrashLoopBackOff 206 (3m34s ago) 17h kubectl logs upm-dex-66b96b5857-pdcg2 -n upm-dex error parse config file /tmp/dex.config.yaml-565347154: error unmarshaling JSON: malformed bcrypt hash: crypto/bcrypt: hashedSecret too short to be a bcrypted password
Resolution:
Added the 4 parameter on the upgrade command:
helm upgrade \ -n edbpgai-bootstrap \ --install \ --set parameters.global.portal_domain_name="${PORTAL_DOMAIN_NAME}" \ --set parameters.transporter-rw-service.domain_name="${TRANSPORTER_RW_SERVICE_DOMAIN_NAME}" \ --set parameters.transporter-dp-agent.rw_service_url="https://${TRANSPORTER_RW_SERVICE_DOMAIN_NAME}/transporter" \ --set parameters.upm-beacon.server_host="${BEACON_SERVICE_DOMAIN_NAME}" \ --set parameters.upm-beaco-ff-base.cookie_aeskey="${AES_256_KEY}" \ --set system="eks" \ --set remoteContainerRegistryURL="${REGISTRY_PACKAGE_URL}" \ --set internalContainerRegistryURL="${REGISTRY_PACKAGE_URL}" \ --set bootstrapAsset="${REGISTRY_PACKAGE_URL}/edbpgai-bootstrap/bootstrap-eks:${EDBPGAI_BOOTSTRAP_IMAGE_VERSION}" \ --set pgai.portal.authentication.staticPasswords[0].email="owner@mycompany.com" \ --set pgai.portal.authentication.staticPasswords[0].hash='$2y$10$STTzuGJa3PydqHvlzi2z6OgDU1G/JLTqiuGblH.RemIutWxkztN5m' \ --set pgai.portal.authentication.staticPasswords[0].username="owner@mycompany.com" \ --set pgai.portal.authentication.staticPasswords[0].userID="c5998173-a605-449a-a9a5-4a9c33e26df7" \ --version "${EDBPGAI_BOOTSTRAP_HELM_CHART_VERSION}" \ edbpgai-bootstrap enterprisedb-edbpgai/edbpgai-bootstrap
401 Access Denied
Detailed Error:
Resolution::
Check the config.yaml
file. The staticPasswords
section is not set properly at the case. Re-run helm upgrade
with the proper values.
k get secret upm-dex -n upm-dex -o yaml
apiVersion: v1 data: config.yaml: aXNzdWVyOiBodHRwczovL3BvcnRhbC5iYXN1cHBvcnQub3JnL2F1dGgKc3RvcmFnZToKICB0eXBlOiBwb3N0Z3JlcwogIGNvbmZpZzoKICAgIGhvc3Q6IGFwcC1kYi1ydy51cG0tYmVhY28tZmYtYmFzZS5zdmMuY2x1c3Rlci5sb2NhbAogICAgcG9ydDogNTQzMgogICAgZGF0YWJhc2U6IHVwbQogICAgdXNlcjogdXBtCiAgICBwYXNzd29yZDogdXBtCiAgICBzc2w6CiAgICAgIG1vZGU6IHJlcXVpcmUKd2ViOgogIGh0dHA6IDAuMC4wLjA6NTU1Ngpmcm9udGVuZDoKICBpc3N1ZXI6IEVEQgogIGxvZ29VUk… echo "xxx" | base64 -d staticPasswords: - email: owner@mycompany.com hash: XXX userID: c5998173-a605-449a-a9a5-4a9c33e26df7 username: owner@mycompany.com
As an alternative, use this one line command:
k get secret -n upm-dex upm-dex -ojsonpath='{.data.config\.yaml}' | base64 -d
HTTP Error 500
Detailed Error:
The back ingress gateway can't access incoming connections. If you're accessing the webpage from a public network, it must be through port 443 since https is the only allowed protocol.
Resolution:
Check the status of the ingressgateway
. Our ingressgateway
is an istio-ingressgateway
.
k get all -n istio-system
NAME READY STATUS RESTARTS AGE pod/istio-ingressgateway-7956f6d57c-jd8jm 1/1 Running 0 14m pod/istio-ingressgateway-7956f6d57c-jmmv8 1/1 Running 0 14m pod/istio-ingressgateway-7956f6d57c-r65tk 1/1 Running 0 14m pod/istiod-7f7bbcdcbb-fzggv 1/1 Running 0 3h26m pod/istiod-7f7bbcdcbb-mxjpz 1/1 Running 0 3h26m NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE service/istio-ingressgateway LoadBalancer 172.20.92.132 k8s-istiosys-istioing-987da680df-8452f790b4602f2c.elb.us-east-1.amazonaws.com 80:30950/TCP,443:32604/TCP,9443:31744/TCP 3h26m service/istiod ClusterIP 172.20.112.158 <none> 15010/TCP,15012/TCP,443/TCP,15014/TCP 3h26m NAME READY UP-TO-DATE AVAILABLE AGE deployment.apps/istio-ingressgateway 3/3 3 3 3h26m deployment.apps/istiod 2/2 2 2 3h26m NAME DESIRED CURRENT READY AGE replicaset.apps/istio-ingressgateway-68fcd46fff 0 0 0 84m replicaset.apps/istio-ingressgateway-7956f6d57c 3 3 3 14m replicaset.apps/istio-ingressgateway-866d9bd74b 0 0 0 3h26m replicaset.apps/istiod-7f7bbcdcbb 2 2 2 3h26m
Check the gateway configuration for configuration errors. The gateway rule is already set to listen on 443 ports:
k get gateway/upm-portal -n istio-system -oyaml
apiVersion: networking.istio.io/v1 kind: Gateway …. spec: selector: istio: ingressgateway servers: - hosts: - portal.basupport.org port: name: http number: 80 protocol: HTTP tls: httpsRedirect: true - hosts: - portal.basupport.org port: name: https-443 number: 443 protocol: HTTPS …
Then check the deployment logs:
2025-02-20T03:42:32.203112Z warning envoy main external/envoy/source/server/server.cc:835 Usage of the deprecated runtime key overload.global_downstream_max_connections, consider switching to `envoy.resource_monitors.downstream_connections` instead.This runtime key will be removed in future. thread=15 2025-02-20T03:42:32.203389Z warning envoy main external/envoy/source/server/server.cc:928 There is no configured limit to the number of allowed active downstream connections. Configure a limit in `envoy.resource_monitors.downstream_connections` resource monitor. thread=15 2025-02-20T03:42:32.211729Z info xdsproxy connected to delta upstream XDS server: istiod.istio-system.svc:15012 id=1 2025-02-20T03:42:32.268369Z info ads ADS: new connection for node:istio-ingressgateway-68fcd46fff-g4nth.istio-system-1 2025-02-20T03:42:32.268437Z info cache returned workload certificate from cache ttl=23h59m59.731565472s 2025-02-20T03:42:32.268890Z info ads SDS: PUSH request for node:istio-ingressgateway-68fcd46fff-g4nth.istio-system resources:1 size:4.0kB resource:default 2025-02-20T03:42:32.279500Z info ads ADS: new connection for node:istio-ingressgateway-68fcd46fff-g4nth.istio-system-2 2025-02-20T03:42:32.279654Z info cache returned workload certificate from cache ttl=23h59m59.72034871s 2025-02-20T03:42:32.279711Z info ads SDS: PUSH request for node:istio-ingressgateway-68fcd46fff-g4nth.istio-system resources:1 size:4.0kB resource:default 2025-02-20T03:42:32.280287Z info ads ADS: new connection for node:istio-ingressgateway-68fcd46fff-g4nth.istio-system-3 2025-02-20T03:42:32.280372Z info cache returned workload trust anchor from cache ttl=23h59m59.719629366s 2025-02-20T03:42:32.280503Z info ads SDS: PUSH request for node:istio-ingressgateway-68fcd46fff-g4nth.istio-system resources:1 size:1.1kB resource:ROOTCA 2025-02-20T03:42:32.320657Z info wasm fetching image staging_pgai-platform/upm-beaco-filters/upm-oidc from registry docker.enterprisedb.com with tag v1.1.44 2025-02-20T03:42:32.323034Z info wasm fetching image staging_pgai-platform/upm-beaco-filters/upm-error-transformer from registry docker.enterprisedb.com with tag v1.1.44 2025-02-20T03:42:32.323073Z info wasm fetching image staging_pgai-platform/upm-beaco-filters/upm-authz-checker from registry docker.enterprisedb.com with tag v1.1.44 2025-02-20T03:42:35.074684Z warning envoy wasm external/envoy/source/extensions/common/wasm/context.cc:1198 wasm log: error parsing plugin configuration: Error("aes_key: Invalid Length, got 3 bytes, expected 32", line: 1, column: 12) thread=15 2025-02-20T03:42:35.074712Z error envoy wasm external/envoy/source/extensions/common/wasm/wasm.cc:110 Wasm VM failed Failed to configure base Wasm plugin thread=15 2025-02-20T03:42:35.075923Z critical envoy wasm external/envoy/source/extensions/common/wasm/wasm.cc:474 Plugin configured to fail closed failed to load thread=15 2025-02-20T03:42:35.076559Z warning envoy config external/envoy/source/extensions/config_subscription/grpc/delta_subscription_state.cc:269 delta config for type.googleapis.com/envoy.config.core.v3.TypedExtensionConfig rejected: Unable to create Wasm HTTP filter istio-system.upm-oidc thread=15 2025-02-20T03:42:35.076572Z warning envoy config external/envoy/source/extensions/config_subscription/grpc/grpc_subscription_impl.cc:138 gRPC config for type.googleapis.com/envoy.config.core.v3.TypedExtensionConfig rejected: Unable to create Wasm HTTP filter istio-system.upm-oidc thread=15 2025-02-20T03:42:35.076577Z warning envoy config external/envoy/source/extensions/config_subscription/grpc/grpc_subscription_impl.cc:138 gRPC config for type.googleapis.com/envoy.config.core.v3.TypedExtensionConfig rejected: Unable to create Wasm HTTP filter istio-system.upm-oidc thread=15 2025-02-20T03:42:35.076584Z warning envoy config external/envoy/source/extensions/config_subscription/grpc/grpc_subscription_impl.cc:138 gRPC config for type.googleapis.com/envoy.config.core.v3.TypedExtensionConfig rejected: Unable to create Wasm HTTP filter istio-system.upm-oidc thread=15 2025-02-20T03:42:35.189419Z info Readiness succeeded in 3.054667763s 2025-02-20T03:42:35.189690Z info Envoy proxy is ready
The error here was because the wasm plugin
was not bootstrapped successfully, so the ingressgateway
isn’t really ready:
k get pod -listio=ingressgateway -n istio-system
Enter MFA code for arn:aws:iam::121858151946:mfa/zane.zhou@enterprisedb.com: NAME READY STATUS RESTARTS AGE istio-ingressgateway-7956f6d57c-jd8jm 1/1 Running 0 72m istio-ingressgateway-7956f6d57c-jmmv8 1/1 Running 0 72m istio-ingressgateway-7956f6d57c-r65tk 1/1 Running 0 72m
but in the pod we can see the readiness check actually failed:
k describe pod/istio-ingressgateway-7956f6d57c-jd8jm -n istio-system
Warning Unhealthy 20m (x2 over 20m) kubelet Readiness probe failed: Get "http://10.0.30.134:15021/healthz/ready": dial tcp 10.0.30.134:15021: connect: connection refused
This issue is caused by a misconfiguration of the wasm plugin
, the aes_key
is not correctly set. Fix the configuration and redeploy the ingressgateway
:
k edit WasmPlugin/upm-oidc
Then set the following aes_key
to the default value (or a custom-generated value in a real production environment):
pluginConfig: aes_key: rzkutHl8NJNztPMEJYykZouHslNiA7xmIXH+58ISUVo=
Then restart the deployment:
k rollout restart deployment/istio-ingressgateway -n istio-system
Postgres Cluster Provisioning Stuck at X% Complete
Detailed Error:
A single-node cluster creation stuck at 87% completion.
Resolution:
The CPU and Memory range is much smaller than the nodegroup
Detailed Error:
The nodegroup is m5.4xlarge which is 16vCPU and 64GB RAM with no node. At cluster creation, the range shown at UI are CPU (0-3.92 Cores) and Memory (0-14.3 Gi)
Resolution:
This is due to 0 size nodepool. If the nodepool has node, it’s fine. But when the problem occurs, then I create the node, it doesn’t work, UI still show the same value after the node created.
Pending: https://enterprisedb.atlassian.net/browse/UPM-45883
Disk scale up
Detailed Error:
Scale disk on UI, the new value is shown on cluster yaml but not applied on pvc/pv.
kubectl get cluster p-2hgee2782y -o yaml
storage: resizeInUseVolumes: true size: 5Gi
k get pv
pvc-4a35b3ba-507e-4d4c-9017-360326e57b9d 2Gi RWO Delete Bound p-2hgee2782y/p-2hgee2782y-1
Resolution:
CNP operator logs contains the error:
kubectl logs postgresql-operator-controller-manager-7cc67597df-4ggdf -n postgresql-operator-system
{ "level": "error", "ts": "2025-01-17T10:02:35.739693483Z", "msg": "Reconciler error", "controller": "cluster", "controllerGroup": "postgresql.k8s.enterprisedb.io", "controllerKind": "Cluster", "Cluster": { "name": "p-2hgee2782y", "namespace": "p-2hgee2782y" }, "namespace": "p-2hgee2782y", "name": "p-2hgee2782y", "reconcileID": "2de2ade7-2bf7-45f4-a2af-3f135a4f7ad9", "error": "persistentvolumeclaims \"p-2hgee2782y-1\" is forbidden: only dynamically provisioned pvc can be resized and the storageclass that provisions the pvc must support resize", "stacktrace": "sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\tcloud-native-postgres/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:316\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\tcloud-native-postgres/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:263\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\tcloud-native-postgres/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:224" }
Check and edit storageclass
to allowVolumeExpansion: true
kubecl edit storageclass gp2
k get storageclass gp2 -o yaml
allowVolumeExpansion: true
Delete the CNP operator pod to apply the new change:
kubectl delete pod postgresql-operator-controller-manager-7cc67597df-tgl7b -n postgresql-operator-system
kubectl get pv
pvc-4a35b3ba-507e-4d4c-9017-360326e57b9d 5Gi RWO Delete Bound p-2hgee2782y/p-2hgee2782y-1
- On this page
- Logfiles
- Error codes
- Errors in detail
Could this page be better? Report a problem or suggest an addition!