Verify InferenceServices and GPU Usage in Hybrid Manager

Use this guide to confirm that your InferenceServices are deployed correctly and that they are using GPU resources as expected within your Hybrid Manager (HCP) Kubernetes cluster.

Goal

Ensure your deployed InferenceServices are running and correctly utilizing GPU resources.

Estimated time

15–20 minutes.

What you accomplish

  • Verify the status of deployed InferenceServices.
  • Confirm GPU resource allocation and utilization.
  • Troubleshoot common deployment and GPU-related issues.

Prerequisites

  • Completed setup and deployment of GPU resources and NIM containers in Hybrid Manager.
  • Access to HCP Kubernetes cluster with appropriate permissions.
  • Familiarity with basic kubectl usage.

Verify InferenceServices Status

Check the status of your deployed InferenceServices to confirm they are operational.

kubectl get inferenceservice -n <namespace>

Confirm that the READY column shows True, which indicates the service is up and serving requests.
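
A ready service reports True in the READY column. Example output, abbreviated (names, URLs, and revisions are illustrative and will differ in your cluster):

NAME            URL                                             READY   LATESTREADYREVISION             AGE
llama3-8b-nim   http://llama3-8b-nim.models.svc.cluster.local   True    llama3-8b-nim-predictor-00001   14m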

Confirm GPU Resource Usage

Check GPU resource allocation and usage on nodes.

kubectl describe nodes | grep nvidia.com/gpu

Review the output to verify GPU availability and allocation.
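
On a node with a single GPU, expect nvidia.com/gpu to appear three times: under Capacity, under Allocatable, and under Allocated resources (Requests and Limits). Example output (counts depend on your nodes and workloads):

  nvidia.com/gpu:  1
  nvidia.com/gpu:  1
  nvidia.com/gpu   1           1

If the allocated count is 0 while your InferenceService pods are running, the pods were scheduled without a GPU request and are likely running on CPU.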

Use nvidia-smi from within your GPU-enabled pods to check real-time GPU utilization.

kubectl exec -n <namespace> -it <pod-name> -- nvidia-smi
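
This requires the nvidia-smi binary to be present in the container image. The output reports the driver and CUDA versions, per-GPU memory usage, and GPU utilization. Abbreviated example (versions, GPU model, and usage figures are illustrative):

+------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15        Driver Version: 550.54.15    CUDA: 12.4    |
| 0  NVIDIA A100-SXM4-40GB    ...    38630MiB / 40960MiB    87%  Default |
+------------------------------------------------------------------------+

Nonzero GPU utilization and substantial memory usage while the model is serving traffic indicate the GPU is actually being exercised; persistent 0% utilization under load suggests the workload is not using the GPU.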

Troubleshoot Common Issues

If the InferenceService is not ready or GPU resources are not properly allocated:

  • Confirm the NVIDIA device plugin DaemonSet is running:
    kubectl get ds -n kube-system nvidia-device-plugin-daemonset
  • Check for resource constraints or scheduling issues:
    kubectl describe pods -n <namespace>

Address any errors or issues reported by these commands.
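
A pod stuck in Pending because no GPU is available typically shows a FailedScheduling warning in its Events section. Example (node counts and exact wording vary by Kubernetes version):

Warning  FailedScheduling  0/3 nodes are available: 3 Insufficient nvidia.com/gpu.

If you see this, free GPU capacity by scaling down other GPU workloads or add GPU-enabled nodes to the cluster.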

Next steps

  • Optimize GPU resource utilization and scaling.
  • Monitor model performance and health.
