How-to: Set up GPU resources (Innovation Release)
This documentation covers the current Innovation Release of
EDB Postgres AI. See also:
- Hybrid Manager dual release strategy
- Documentation for the current Long-term support release
Prerequisite: Access to the Hybrid Manager UI with AI Factory enabled. See AI Factory in Hybrid Manager.
Use this guide to prepare GPU resources in your Kubernetes cluster (Hybrid Manager or compatible) to support Model Serving with KServe.
Goal
Prepare your cluster to run GPU-based Model Serving workloads using KServe.
Estimated time
20–40 minutes (provisioning depends on your cloud provider).
What you accomplish
- Provision GPU node groups/pools in your cluster.
- Label and taint GPU nodes correctly.
- Deploy the NVIDIA device plugin DaemonSet.
- Store your NVIDIA API key as a Kubernetes secret.
- Enable your cluster to run NIM model containers in KServe.
Prerequisites
- Access to a Kubernetes cluster with appropriate permissions.
- Administrative access to provision node groups (AWS EKS / GCP GKE / RHOS).
- NVIDIA API key for accessing NIM models.
- Familiarity with kubectl.
Provision GPU nodes
Provision GPU node groups (EKS) or node pools (GKE/RHOS):
- Use instances with L40S or A100 GPUs (for example, g6e.12xlarge on AWS or a2-highgpu-4g on GCP).
- Recommended: at least one node with four GPUs for large models.
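As an illustration, a GPU node group on EKS could be provisioned with eksctl. This is a sketch, not the only supported path: the cluster name, region, and node group name below are placeholders you must replace for your environment.

```shell
# Hypothetical example: create a GPU node group on AWS EKS with eksctl.
# Cluster name, region, and node group name are placeholders.
eksctl create nodegroup \
  --cluster my-cluster \
  --region us-east-1 \
  --name gpu-nodes \
  --node-type g6e.12xlarge \
  --nodes 1 \
  --nodes-min 1 \
  --nodes-max 2
```

On GKE, the equivalent is `gcloud container node-pools create` with an accelerator-capable machine type such as a2-highgpu-4g.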
Label and taint GPU nodes
```shell
kubectl label node <gpu-node-name> nvidia.com/gpu=true
kubectl taint nodes <gpu-node-name> nvidia.com/gpu=true:NoSchedule
```
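The label and taint are what GPU workloads later match against. A minimal sketch of the corresponding scheduling stanza, as it might appear in a pod or InferenceService predictor spec (field placement depends on your spec), is:

```yaml
# Illustrative scheduling fragment -- shows how a serving pod targets the
# labeled/tainted GPU nodes prepared above.
nodeSelector:
  nvidia.com/gpu: "true"     # matches the label applied above
tolerations:
- key: nvidia.com/gpu
  operator: Equal
  value: "true"
  effect: NoSchedule         # tolerates the taint applied above
resources:
  limits:
    nvidia.com/gpu: 1        # GPUs are requested via resource limits
```

The taint keeps non-GPU workloads off these (typically expensive) nodes; only pods that explicitly tolerate it can be scheduled there.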
Deploy the NVIDIA device plugin
```shell
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.1/nvidia-device-plugin.yml
kubectl get ds -n kube-system nvidia-device-plugin-daemonset
```
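Once the DaemonSet pods are running, the device plugin advertises GPUs as an allocatable resource on each node. One way to verify (node name is a placeholder):

```shell
# Print the number of allocatable GPUs the device plugin reports for a node.
# Dots in the resource key must be escaped inside the jsonpath expression.
kubectl get node <gpu-node-name> \
  -o jsonpath='{.status.allocatable.nvidia\.com/gpu}'
```

A nonzero value (for example, 4 on a four-GPU node) confirms the plugin is working; an empty result usually means the DaemonSet pod on that node is not yet ready.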
Store NVIDIA API key as Kubernetes secret
```shell
kubectl create secret generic nvidia-nim-secrets \
  --from-literal=NGC_API_KEY=<your_NVIDIA_API_KEY>
```
This secret is used by the ClusterServingRuntime for NIM models.
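For reference, a runtime typically consumes such a secret as container environment variables. The fragment below is only a sketch of how the secret name and key line up; the actual ClusterServingRuntime for NIM is managed by the platform.

```yaml
# Illustrative container-spec fragment -- not the managed runtime definition.
env:
- name: NGC_API_KEY
  valueFrom:
    secretKeyRef:
      name: nvidia-nim-secrets   # secret created above
      key: NGC_API_KEY           # key set via --from-literal
```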