Set up GPU resources in Hybrid Manager

Use this guide to prepare GPU resources in your Hybrid Manager (HCP) Kubernetes cluster to support Model Serving with KServe.

You provision GPU-enabled nodes, configure them for KServe, and store required secrets for deploying NVIDIA NIM models.

Goal

Prepare your HCP cluster to run GPU-based Model Serving workloads using KServe.

Estimated time

20–40 minutes (provisioning might take longer depending on your cloud provider).

What you accomplish

  • Provision GPU node groups or node pools in your HCP cluster.
  • Label and taint GPU nodes correctly.
  • Deploy the NVIDIA device plugin DaemonSet.
  • Store your NVIDIA API key as a Kubernetes secret.
  • Enable your cluster to run NIM model containers in KServe.

What this unlocks

After you complete this procedure, you can deploy supported GPU-accelerated models through Model Serving:

  • For AIDB Knowledge Bases.
  • For GenAI Builder assistants.
  • For custom model-based applications.

Prerequisites

  • Access to an HCP Kubernetes cluster with appropriate permissions.
  • Administrative access to provision node groups (AWS EKS / GCP GKE / RHOS).
  • NVIDIA API key for accessing NIM models.
  • Familiarity with basic kubectl usage.

Provision GPU nodes

Provision GPU node groups (EKS) or node pools (GKE / RHOS) in your HCP cluster; an example eksctl command follows this list:

  • Use instances with L40S or A100 GPUs (for example, g6e.12xlarge on AWS or a2-highgpu-4g on GCP).
  • Recommended: Provision at least one node with four GPUs to support large models such as Llama 70B.
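
For example, on AWS EKS you might create the node group with eksctl. This is a hypothetical sketch: the cluster name and node group name are placeholders, and the instance type and node counts should match your environment.

eksctl create nodegroup \
  --cluster <hcp-cluster-name> \
  --name gpu-nodes \
  --node-type g6e.12xlarge \
  --nodes 1 --nodes-min 1 --nodes-max 2

The g6e.12xlarge instance type provides four NVIDIA L40S GPUs, which satisfies the recommendation above.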

Label and taint GPU nodes

Apply the following Kubernetes label and taint to GPU nodes.

Label:

kubectl label node <gpu-node-name> nvidia.com/gpu=true

Taint:

kubectl taint nodes <gpu-node-name> nvidia.com/gpu=true:NoSchedule

The label lets KServe schedule model pods onto GPU nodes, and the NoSchedule taint keeps other workloads, such as Postgres clusters, from landing on them.
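
For reference, a pod that should land on these nodes needs a matching nodeSelector and toleration. This is an illustrative spec fragment only; the runtime manifests used by Model Serving typically set these fields for you.

spec:
  nodeSelector:
    nvidia.com/gpu: "true"     # matches the label applied above
  tolerations:
  - key: nvidia.com/gpu        # tolerates the NoSchedule taint
    operator: Equal
    value: "true"
    effect: NoSchedule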

Deploy the NVIDIA device plugin

Deploy the NVIDIA device plugin DaemonSet.

kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.1/nvidia-device-plugin.yml

Verify that the plugin is running.

kubectl get ds -n kube-system nvidia-device-plugin-daemonset

The plugin exposes GPU resources to Kubernetes and KServe.
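
Once the DaemonSet pods are running, each GPU node advertises its GPUs as an allocatable resource. One way to check (the node name is a placeholder):

kubectl get node <gpu-node-name> -o jsonpath='{.status.allocatable.nvidia\.com/gpu}'

This prints the number of GPUs the node exposes, for example 4.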

Store the NVIDIA API key as a Kubernetes secret

Generate an NVIDIA API key from the NGC Catalog portal.

Create a Kubernetes secret.

kubectl create secret generic nvidia-nim-secrets --from-literal=NGC_API_KEY=<your_NVIDIA_API_KEY>
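
To confirm the secret exists without printing the key value:

kubectl describe secret nvidia-nim-secrets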

This secret is required when deploying ClusterServingRuntime resources for NIM models.
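
As an illustration of how the secret is consumed, a ClusterServingRuntime can inject it as an environment variable. This is a minimal sketch, not the exact manifest Model Serving ships; the runtime name, model format, and image are placeholders.

apiVersion: serving.kserve.io/v1alpha1
kind: ClusterServingRuntime
metadata:
  name: nvidia-nim-runtime          # placeholder name
spec:
  supportedModelFormats:
  - name: nim                       # placeholder format name
  containers:
  - name: kserve-container
    image: <nim-container-image>    # placeholder NIM image
    env:
    - name: NGC_API_KEY             # read from the secret created above
      valueFrom:
        secretKeyRef:
          name: nvidia-nim-secrets
          key: NGC_API_KEY
    resources:
      limits:
        nvidia.com/gpu: 1           # request one GPU per pod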

Next steps

With GPU resources configured, deploy supported GPU-accelerated models through Model Serving for AIDB Knowledge Bases, GenAI Builder assistants, and custom model-based applications.