Set up GPU resources in Hybrid Manager
Use this guide to prepare GPU resources in your Hybrid Manager (HCP) Kubernetes cluster to support Model Serving with KServe.
You provision GPU-enabled nodes, configure them for KServe, and store required secrets for deploying NVIDIA NIM models.
Goal
Prepare your HCP cluster to run GPU-based Model Serving workloads using KServe.
Estimated time
20–40 minutes (provisioning might take longer depending on your cloud provider).
What you accomplish
- Provision GPU node groups or node pools in your HCP cluster.
- Label and taint GPU nodes correctly.
- Deploy the NVIDIA device plugin DaemonSet.
- Store your NVIDIA API key as a Kubernetes secret.
- Enable your cluster to run NIM model containers in KServe.
What this unlocks
After you complete this procedure, you can deploy supported GPU-accelerated models through Model Serving:
- For AIDB Knowledge Bases.
- For GenAI Builder assistants.
- For custom model-based applications.
Prerequisites
- Access to an HCP Kubernetes cluster with appropriate permissions.
- Administrative access to provision node groups (AWS EKS / GCP GKE / RHOS).
- NVIDIA API key for accessing NIM models.
- Familiarity with basic `kubectl` usage.
Provision GPU nodes
Provision GPU node groups (EKS) or node pools (GKE / RHOS) in your HCP cluster:
- Use instances with L40S or A100 GPUs (for example, `g6e.12xlarge` on AWS or `a2-highgpu-4g` on GCP).
- Recommended: Provision at least one node with four GPUs to support large models such as Llama 70B.
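On AWS, for example, the node group can be described in an eksctl config file. The sketch below is illustrative only: the cluster name, region, and sizing are placeholders you must adjust for your environment, and it applies the GPU label and taint at provisioning time (the next section shows how to apply them with `kubectl` instead).

```yaml
# Hypothetical eksctl config for a GPU node group (names, region,
# and sizing are placeholders -- adjust for your environment).
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-hcp-cluster   # placeholder: your HCP cluster name
  region: us-east-1      # placeholder: your region
managedNodeGroups:
  - name: gpu-nodes
    instanceType: g6e.12xlarge   # 4x L40S GPUs
    desiredCapacity: 1
    # Label and taint applied at node-group creation time.
    labels:
      nvidia.com/gpu: "true"
    taints:
      - key: nvidia.com/gpu
        value: "true"
        effect: NoSchedule
```

You would then create the node group with `eksctl create nodegroup -f <config-file>.yaml`.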
Label and taint GPU nodes
Apply the following Kubernetes label and taint to GPU nodes.

Label:

```shell
kubectl label node <gpu-node-name> nvidia.com/gpu=true
```

Taint:

```shell
kubectl taint nodes <gpu-node-name> nvidia.com/gpu=true:NoSchedule
```
This ensures that KServe model pods are scheduled correctly and that Postgres clusters do not land on GPU nodes.
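KServe attaches the matching scheduling constraints to model pods for you. For reference, this sketch shows how a pod would select and tolerate the nodes labeled and tainted above (the pod name and container image are illustrative, not part of any HM-supplied manifest):

```yaml
# Sketch: a pod that targets the GPU nodes prepared above.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-scheduling-example   # hypothetical name, for illustration only
spec:
  nodeSelector:
    nvidia.com/gpu: "true"       # matches the label applied above
  tolerations:
    - key: nvidia.com/gpu
      operator: Equal
      value: "true"
      effect: NoSchedule         # tolerates the taint applied above
  containers:
    - name: cuda
      image: nvidia/cuda:12.3.1-base-ubuntu22.04   # illustrative image
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1      # requires the NVIDIA device plugin
```

Without the toleration, the pod stays unschedulable on tainted GPU nodes; without the taint, non-GPU workloads such as Postgres clusters could land there.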
Deploy the NVIDIA device plugin
Deploy the NVIDIA device plugin DaemonSet:

```shell
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.1/nvidia-device-plugin.yml
```
Verify that the plugin is running:

```shell
kubectl get ds -n kube-system nvidia-device-plugin-daemonset
```
The plugin advertises each node's GPUs as the `nvidia.com/gpu` resource, which Kubernetes and KServe use to schedule GPU workloads.
Store the NVIDIA API key as a Kubernetes secret
Generate an NVIDIA API key from the NGC Catalog portal.
Create a Kubernetes secret:

```shell
kubectl create secret generic nvidia-nim-secrets --from-literal=NGC_API_KEY=<your_NVIDIA_API_KEY>
```
This secret is required when deploying `ClusterServingRuntime` resources for NIM models.
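For orientation, the sketch below shows one way a NIM runtime definition can consume this secret: injecting the key as the `NGC_API_KEY` environment variable via a `secretKeyRef`. The runtime name and image are illustrative; use the runtime definitions supplied with your Hybrid Manager release.

```yaml
# Sketch: referencing the nvidia-nim-secrets secret from a runtime
# (metadata.name and image are hypothetical, for illustration only).
apiVersion: serving.kserve.io/v1alpha1
kind: ClusterServingRuntime
metadata:
  name: nvidia-nim-example
spec:
  containers:
    - name: kserve-container
      image: nvcr.io/nim/meta/llama-3.1-8b-instruct:latest   # illustrative
      env:
        - name: NGC_API_KEY
          valueFrom:
            secretKeyRef:
              name: nvidia-nim-secrets   # the secret created above
              key: NGC_API_KEY
```

Note that pulling NIM container images from `nvcr.io` may additionally require a registry (docker-registry type) pull secret, which is separate from the generic secret created above.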