Observability for Model Serving
Observability helps you ensure that your deployed AI models are running efficiently and reliably within AI Factory.
Model Serving in AI Factory runs models on KServe on Kubernetes, which provides built-in options to monitor:
- Model serving status and availability
- Resource usage (CPU, Memory, GPU)
- Inference performance and throughput
Key monitoring capabilities
KServe InferenceService status
You can inspect model serving status directly via Kubernetes:
kubectl get inferenceservice -n <namespace>
Common status fields include:
- Ready / NotReady
- URL endpoint
- Current replicas
- Allocated resources (GPU, CPU, Memory)
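For a quick check of the most important of these fields, a jsonpath query works well. The following is a sketch that assumes an InferenceService named <name>; the field paths follow the KServe v1beta1 status schema and print the serving endpoint and the Ready condition on separate lines:
kubectl get inferenceservice <name> -n <namespace> -o jsonpath='{.status.url}{"\n"}{.status.conditions[?(@.type=="Ready")].status}{"\n"}'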
For detailed inspection:
kubectl describe inferenceservice <name> -n <namespace>
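To find the pods backing an InferenceService (for example, before pulling logs or per-pod resource metrics), you can usually select them by the label KServe applies to serving pods. The exact label key can vary between KServe versions:
kubectl get pods -n <namespace> -l serving.kserve.io/inferenceservice=<name>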
GPU utilization monitoring
If your models are deployed on GPU nodes, monitor GPU usage to optimize resource allocation.
Example:
kubectl top node
For deeper GPU-specific metrics (if supported):
nvidia-smi
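If the NVIDIA tooling is available inside the serving container, you can also run nvidia-smi against a specific model pod. This is a sketch assuming a GPU-backed pod named <model-pod>:
kubectl exec -n <namespace> <model-pod> -- nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total --format=csv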
Prometheus and Grafana integration
If Prometheus is configured in your cluster, AI Factory model serving exposes metrics through KServe.
Set the following annotations to enable Prometheus scraping:
serving.kserve.io/enable-prometheus-scraping: "true"
prometheus.kserve.io/port: "8000"
prometheus.kserve.io/path: "/v1/metrics"
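These annotations are typically set in the InferenceService manifest so that KServe propagates them to the serving pods. The example below is a minimal, illustrative sketch rather than a configuration from this guide: the model name, model format, and storageUri are placeholders.

```shell
kubectl apply -n <namespace> -f - <<EOF
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: my-model                         # placeholder name
  annotations:
    serving.kserve.io/enable-prometheus-scraping: "true"
    prometheus.kserve.io/port: "8000"
    prometheus.kserve.io/path: "/v1/metrics"
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn                    # placeholder model format
      storageUri: s3://models/my-model   # placeholder model location
EOF
```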
You can build Grafana dashboards to monitor:
- Inference requests per second
- Latency and error rates
- GPU utilization trends
- Pod restarts and health
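Before building dashboards, it can help to confirm that the metrics endpoint is reachable. A quick sketch, assuming the port and path match the scraping annotations above: forward the pod's metrics port, then query it from a second terminal.
kubectl port-forward -n <namespace> <model-pod> 8000:8000
curl http://localhost:8000/v1/metrics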
Logs and debugging
You can access detailed logs from model serving pods: