How to Deploy AI Models from Model Library

This guide explains how to deploy AI models from the AI Factory Model Library into Model Serving (powered by KServe) in your Hybrid Manager (HCP) environment.

Once deployed, these models power key AI Factory features:

Knowledge Bases (via AIDB pipelines)
Gen AI Builder Assistants and pipelines
Other AI Factory and application integrations

Who should use this guide?

AI platform admins deploying validated model images
Data engineers configuring AI models for Knowledge Bases
AI application developers configuring models for Assistants

What this enables

Once deployed:

Your AI models are available in Model Serving.
You can link them to Knowledge Bases or Gen AI Builder pipelines.
You can monitor and manage deployed models via the HCP Model Serving UI or Kubernetes.

Estimated time to complete

10–20 minutes per model, depending on model size and cluster resources.

Prerequisites

Before you begin:

An active HCP environment with GPU worker nodes configured.
A build.nvidia.com API key stored as a Kubernetes secret named nvidia-nim-secrets.
The model image you want to deploy, visible in the Model Library.
KServe configured and ready in your cluster.

→ For a full setup guide, see: Setup GPU Resources for Model Serving
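
If you want to confirm the GPU prerequisite from a script, the following is a minimal sketch and not part of HCP. It assumes GPUs are advertised under the NVIDIA device plugin resource name nvidia.com/gpu and that the input has the shape of kubectl get nodes -o json. The function name gpu_capacity_by_node and the sample node list are hypothetical.

```python
import json

GPU_RESOURCE = "nvidia.com/gpu"  # assumption: NVIDIA device plugin resource name


def gpu_capacity_by_node(node_list: dict) -> dict:
    """Map node name -> allocatable GPU count, skipping nodes without GPUs."""
    capacity = {}
    for node in node_list.get("items", []):
        allocatable = node.get("status", {}).get("allocatable", {})
        gpus = int(allocatable.get(GPU_RESOURCE, "0"))
        if gpus > 0:
            capacity[node["metadata"]["name"]] = gpus
    return capacity


# Illustrative sample shaped like `kubectl get nodes -o json` (not real cluster data).
SAMPLE_NODES = json.loads("""
{"items": [
  {"metadata": {"name": "gpu-node-a"}, "status": {"allocatable": {"nvidia.com/gpu": "4"}}},
  {"metadata": {"name": "cpu-node-b"}, "status": {"allocatable": {"cpu": "8"}}}
]}
""")

print(gpu_capacity_by_node(SAMPLE_NODES))
```

Against a live cluster, you would load the output of kubectl get nodes -o json with json.load instead of using the sample.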

Steps to deploy an AI model

1. Browse and select model in Model Library

Go to AI Factory > Model Library.
Browse available model images.
Review versions and tags.
Select the version you want to deploy.

2. Configure and deploy model

Click Deploy or Deploy to Model Serving.
Configure deployment parameters:
- Number of replicas
- Resource requests/limits (GPU, CPU, memory)
- Model runtime settings (if needed)
Confirm and deploy.

Deploying the model creates two KServe resources:

A ClusterServingRuntime for the model (if one is not already defined).
An InferenceService for the specific deployment.
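
This guide does not show the manifests that HCP generates. For orientation only, the sketch below builds a minimal KServe v1beta1 InferenceService as a Python dictionary. The model name, runtime name, model format, and GPU count are placeholders, the helper build_inference_service is hypothetical, and HCP may set additional fields.

```python
import json


def build_inference_service(name, runtime, model_format, gpus, replicas=1, namespace="default"):
    """Return a minimal KServe InferenceService manifest as a dict (illustrative only)."""
    gpu_quantity = {"nvidia.com/gpu": str(gpus)}  # assumption: NVIDIA GPU resource name
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "predictor": {
                "minReplicas": replicas,
                "model": {
                    "modelFormat": {"name": model_format},
                    "runtime": runtime,  # name of a ClusterServingRuntime
                    "resources": {
                        "requests": dict(gpu_quantity),
                        "limits": dict(gpu_quantity),
                    },
                },
            }
        },
    }


# Placeholder values; substitute the names used in your environment.
manifest = build_inference_service(
    name="example-embedding-model",
    runtime="example-runtime",
    model_format="example-format",
    gpus=1,
)
print(json.dumps(manifest, indent=2))
```

The replicas and resource values correspond to the deployment parameters you set in the UI. In HCP the Deploy action creates these resources for you, so a manifest like this is mainly useful for reading what kubectl returns.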

3. Verify deployed model

You can verify your deployed models using:

Model Serving UI in HCP

Go to AI Factory > Model Serving (or Hybrid Manager > AI Factory > Model Serving).
Confirm that the model appears with status Ready.

Or use kubectl:

kubectl get InferenceService -n default

Output

NAME                                    STATUS   URL                                                         GPUs
meta-nim-llama-3-3-nemotron-super-49b   Ready    http://meta-nim-llama-3-3-nemotron-super-49b-predictor...   4
nim-snowflake-arctic-embed-l            Ready    http://nim-snowflake-arctic-embed-l-predictor...            1
...
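
Readiness can also be checked from a script. The sketch below assumes input shaped like kubectl get inferenceservice -n default -o json and relies on the Ready condition that KServe records under status.conditions. The helper name ready_states and the sample data are illustrative.

```python
def ready_states(isvc_list: dict) -> dict:
    """Map InferenceService name -> True if its Ready condition has status "True"."""
    states = {}
    for isvc in isvc_list.get("items", []):
        conditions = isvc.get("status", {}).get("conditions", [])
        states[isvc["metadata"]["name"]] = any(
            c.get("type") == "Ready" and c.get("status") == "True" for c in conditions
        )
    return states


# Illustrative sample (not real cluster output); the second name is made up.
SAMPLE = {
    "items": [
        {"metadata": {"name": "nim-snowflake-arctic-embed-l"},
         "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
        {"metadata": {"name": "example-pending-model"},
         "status": {"conditions": [{"type": "Ready", "status": "False"}]}},
    ]
}

for name, ready in ready_states(SAMPLE).items():
    print(f"{name}: {'Ready' if ready else 'Not ready'}")
```

A service with no conditions yet, for example one that was just created, is reported as not ready.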

4. Connect model to AI Factory workloads

Once the model is Ready, you can select it in:
- Knowledge Base pipelines (for embedding or reranking)
- Gen AI Builder pipelines
- Assistant configurations

→ The UI will show models available for each use case based on their type (Embedding, Completion, Reranking, etc.).

Supported model types

Model Type	Example Image
Text Completion	llama-3.3-nemotron-super-49b
Text Embedding	arctic-embed-l
Image Embedding	nvclip
OCR	paddleocr
Text Reranker	llama-3.2-nv-rerankqa-1b-v2

Tips & Best Practices

GPU placement: Make sure the model fits within your GPU capacity. Large models such as llama-3.3-nemotron-super-49b require multiple GPUs on a single node.
Quota management: Limit the number of large models deployed simultaneously to avoid exhausting GPU capacity on your nodes.
Version testing: Test new model versions in isolated deployments before promoting to production pipelines or Assistants.
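
The GPU placement tip is a per-node check, not a cluster-wide total: a model that needs several GPUs on one node does not fit when the free GPUs are spread across nodes. A minimal sketch of that check, using hypothetical node names and GPU counts:

```python
def nodes_that_fit(required_gpus: int, free_gpus_by_node: dict) -> list:
    """Return the nodes with enough free GPUs to host the model on a single node."""
    return sorted(
        node for node, free in free_gpus_by_node.items() if free >= required_gpus
    )


# Hypothetical cluster: 6 free GPUs in total, but at most 4 on any one node.
free = {"gpu-node-a": 4, "gpu-node-b": 2}

print(nodes_that_fit(4, free))  # a 4-GPU model fits only on gpu-node-a
print(nodes_that_fit(6, free))  # the total is 6, but no single node has 6 free
```

An empty result means the model would stay unscheduled until GPUs are freed on one node or a larger node is added.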

Troubleshooting

Model stuck in Pending

Check the taints and labels on your GPU nodes.
Verify that the InferenceService tolerations and nodeSelector match those taints and labels.
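
To see which taint is blocking scheduling, you can compare a node's taints with the pod's tolerations. The sketch below applies the standard Kubernetes matching rules (an empty key or effect on a toleration matches any, and the Exists operator ignores the value). It does not validate the tolerations themselves, and the taint key and value in the sample are hypothetical.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Apply the Kubernetes matching rules for one toleration against one taint."""
    if toleration.get("effect") and toleration["effect"] != taint.get("effect"):
        return False
    if toleration.get("key") and toleration["key"] != taint.get("key"):
        return False
    if toleration.get("operator", "Equal") == "Exists":
        return True
    return toleration.get("value", "") == taint.get("value", "")


def untolerated_taints(taints: list, tolerations: list) -> list:
    """Return hard taints (NoSchedule/NoExecute) that no toleration covers."""
    hard = [t for t in taints if t.get("effect") in ("NoSchedule", "NoExecute")]
    return [t for t in hard if not any(tolerates(tol, t) for tol in tolerations)]


# Hypothetical taint on a GPU node and a matching toleration.
node_taints = [{"key": "example.com/gpu", "value": "true", "effect": "NoSchedule"}]
pod_tolerations = [{"key": "example.com/gpu", "operator": "Exists", "effect": "NoSchedule"}]

print(untolerated_taints(node_taints, pod_tolerations))  # nothing blocks scheduling
print(untolerated_taints(node_taints, []))               # the taint blocks scheduling
```

The taints come from the node's spec.taints and the tolerations from the pod's spec.tolerations. Any taint the function returns is one the predictor pod cannot tolerate on that node.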

Model not appearing in Model Library

Confirm that the image is correctly tagged and synced through the Image and Model Library.
Verify the repository rules if you use a private registry.

Kubernetes errors on deploy

Run kubectl describe InferenceService <model> and review the status conditions and events for error details.
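
The same information is available in machine-readable form from kubectl get inferenceservice <model> -o json. As a minimal sketch, the hypothetical helper below lists every status condition that is not "True", with its reason and message. The reason and message in the sample are invented for illustration.

```python
def failing_conditions(isvc: dict) -> list:
    """Return (type, reason, message) for every condition whose status is not "True"."""
    return [
        (c.get("type", ""), c.get("reason", ""), c.get("message", ""))
        for c in isvc.get("status", {}).get("conditions", [])
        if c.get("status") != "True"
    ]


# Illustrative object; the reason and message are made up, not real KServe output.
SAMPLE_ISVC = {
    "metadata": {"name": "example-model"},
    "status": {
        "conditions": [
            {"type": "IngressReady", "status": "True"},
            {"type": "PredictorReady", "status": "False",
             "reason": "ExampleReason", "message": "example message"},
        ]
    },
}

for ctype, reason, message in failing_conditions(SAMPLE_ISVC):
    print(f"{ctype}: {reason}: {message}")
```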

Summary

You can deploy AI models from the AI Factory Model Library.
Deployed models run via KServe Model Serving.
Deployed models power Knowledge Bases and Gen AI Builder Assistants.
The deployment flow ensures consistent governance and visibility.
