How to Deploy AI Models from Model Library

This guide explains how to deploy AI models from the AI Factory Model Library into Model Serving (powered by KServe) in your Hybrid Control Plane (HCP) environment.

Once deployed, these models power key AI Factory features:

  • Knowledge Bases (via AIDB pipelines)
  • Gen AI Builder Assistants and pipelines
  • Other AI Factory and application integrations

Who should use this guide?

  • AI platform admins deploying validated model images
  • Data engineers configuring AI models for Knowledge Bases
  • AI application developers configuring models for Assistants

What this enables

Once deployed:

  • Your AI models are available in Model Serving.
  • You can link them to Knowledge Bases or Gen AI Builder pipelines.
  • You can monitor and manage deployed models via the HCP Model Serving UI or Kubernetes.

Estimated time to complete

10–20 minutes per model, depending on model size and cluster resources.

Prerequisites

Before you begin:

  • An active HCP environment with GPU worker nodes configured.
  • An NVIDIA NGC API key stored in a Kubernetes secret named nvidia-nim-secrets.
  • The model image you want to deploy must be visible in the Model Library.
  • KServe must be configured and ready in your cluster.

→ For a full setup guide, see: Setup GPU Resources for Model Serving
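If you create the NGC secret yourself, the following minimal Python sketch shows the shape of the manifest. It assumes the secret stores the key under NGC_API_KEY and lives in the default namespace. Both are assumptions, so check which key name and namespace your serving runtime actually expects.

```python
import base64
import json


def ngc_secret_manifest(api_key: str, namespace: str = "default") -> dict:
    """Build a Kubernetes Secret named nvidia-nim-secrets that holds an NGC API key.

    Assumption: the key is stored under NGC_API_KEY. Confirm the key name
    your serving runtime reads before applying this manifest.
    """
    # Secret.data values must be base64-encoded strings.
    encoded = base64.b64encode(api_key.encode("utf-8")).decode("ascii")
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": "nvidia-nim-secrets", "namespace": namespace},
        "type": "Opaque",
        "data": {"NGC_API_KEY": encoded},
    }


# Print JSON that can be piped to `kubectl apply -f -` (the key is a placeholder).
print(json.dumps(ngc_secret_manifest("nvapi-REPLACE-ME"), indent=2))
```

kubectl apply accepts JSON as well as YAML, so the printed output can be applied directly once the placeholder is replaced.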

Steps to deploy an AI model

1. Browse and select a model in the Model Library

  • Go to AI Factory > Model Library.
  • Browse available model images.
  • Review versions and tags.
  • Select the version you want to deploy.

2. Configure and deploy the model

  • Click Deploy or Deploy to Model Serving.

  • Configure deployment parameters:

    • Number of replicas
    • Resource requests/limits (GPU, CPU, memory)
    • Model runtime settings (if needed)
  • Confirm and deploy.

This triggers the creation of:

  • A ClusterServingRuntime for the model (if not already defined).
  • An InferenceService for the specific deployment.
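To make the deployment parameters concrete, here is a rough sketch of an InferenceService written as a Python dict against the public KServe v1beta1 schema. It is not the manifest HCP generates. The runtime and model format names are hypothetical placeholders, and the CPU and memory defaults are arbitrary.

```python
import json


def inference_service(name: str, runtime: str, model_format: str, replicas: int = 1,
                      gpus: int = 1, cpu: str = "8", memory: str = "32Gi",
                      namespace: str = "default") -> dict:
    """Sketch of a KServe v1beta1 InferenceService for one model deployment.

    runtime and model_format are placeholders: use the name of the
    ClusterServingRuntime created for the model and a format it supports.
    """
    # Identical requests and limits. For extended resources such as
    # nvidia.com/gpu, Kubernetes requires the request to equal the limit.
    resources = {"cpu": cpu, "memory": memory, "nvidia.com/gpu": str(gpus)}
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "predictor": {
                # Equal min and max pin the deployment to a fixed replica count.
                "minReplicas": replicas,
                "maxReplicas": replicas,
                "model": {
                    "modelFormat": {"name": model_format},
                    "runtime": runtime,
                    "resources": {"requests": resources, "limits": dict(resources)},
                },
            }
        },
    }


# Hypothetical runtime and format names.
print(json.dumps(inference_service(
    "nim-snowflake-arctic-embed-l",
    runtime="example-nim-runtime",
    model_format="example-nim-format",
    gpus=1,
), indent=2))
```

Each field maps to a deployment parameter from step 2: replicas to minReplicas and maxReplicas, resource requests and limits to the resources block, and the runtime setting to the ClusterServingRuntime reference.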

3. Verify the deployed model

You can verify your deployed models using:

Model Serving UI in HCP

  • Go to AI Factory > Model Serving (or Hybrid Manager > AI Factory > Model Serving).
  • Confirm that the model appears with status Ready.

Or use kubectl:

kubectl get InferenceService -n default

Example output:

NAME                                    STATUS   URL                                                         GPUs
meta-nim-llama-3-3-nemotron-super-49b   Ready    http://meta-nim-llama-3-3-nemotron-super-49b-predictor...   4
nim-snowflake-arctic-embed-l            Ready    http://nim-snowflake-arctic-embed-l-predictor...            1
...
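You can also script the Ready check. The following sketch assumes kubectl is on your PATH and reads the Ready condition that KServe reports under status.conditions. It is an illustration, not a supported HCP tool.

```python
import json
import subprocess


def fetch_inference_services(namespace: str = "default") -> dict:
    """Return the parsed output of `kubectl get inferenceservices -o json`."""
    result = subprocess.run(
        ["kubectl", "get", "inferenceservices", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def readiness(isvc_list: dict) -> dict:
    """Map each InferenceService name to whether its Ready condition is "True"."""
    status_by_name = {}
    for item in isvc_list.get("items", []):
        conditions = item.get("status", {}).get("conditions", [])
        status_by_name[item["metadata"]["name"]] = any(
            c.get("type") == "Ready" and c.get("status") == "True" for c in conditions
        )
    return status_by_name
```

For example, `[n for n, ok in readiness(fetch_inference_services()).items() if not ok]` lists the models that are not yet Ready. A service with no conditions yet counts as not Ready.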

4. Connect the model to AI Factory workloads

Once the model is Ready:

  • You can select it in:

    • Knowledge Base pipelines (for embedding or reranking)
    • Gen AI Builder pipelines
    • Assistant configurations

→ The UI shows the models available for each use case based on their type (Embedding, Completion, Reranking, etc.).
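Conceptually, this filtering works like the following sketch. The data class, the use-case keys, and the mapping from use case to model type are hypothetical and are inferred from the workloads listed above. The actual UI logic may differ.

```python
from dataclasses import dataclass


@dataclass
class DeployedModel:
    name: str
    model_type: str  # one of the types in the Supported model types table
    ready: bool


# Hypothetical mapping from a use case to the model types it can consume.
USE_CASE_TYPES = {
    "knowledge_base_embedding": {"Text Embedding", "Image Embedding"},
    "knowledge_base_reranking": {"Text Reranker"},
    "assistant_completion": {"Text Completion"},
}


def selectable_models(models: list[DeployedModel], use_case: str) -> list[str]:
    """Names of Ready models whose type suits the given use case."""
    allowed = USE_CASE_TYPES[use_case]
    return [m.name for m in models if m.ready and m.model_type in allowed]
```

Only models that are both Ready and of a matching type are offered. This is why a model still starting up does not appear in the selection.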

Supported model types

Model type         Example image
Text Completion    llama-3.3-nemotron-super-49b
Text Embedding     arctic-embed-l
Image Embedding    nvclip
OCR                paddleocr
Text Reranker      llama-3.2-nv-rerankqa-1b-v2

Tips & Best Practices

  • GPU placement: Make sure each model's GPU requirements fit your GPU capacity. Large models such as llama-3.3-nemotron-super-49b need multiple GPUs on a single node.
  • Quota management: Limit the number of large models deployed at the same time to avoid overloading GPU nodes.
  • Version testing: Test new model versions in isolated deployments before promoting to production pipelines or Assistants.
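The GPU placement tip comes down to simple arithmetic. A pod cannot span nodes, so a model that needs 4 GPUs needs 4 free GPUs on one node. The sketch below checks that condition. It then runs a greedy first-fit pass, largest models first and one replica each, to see whether a set of models fits at the same time. This is a rough capacity estimate under those assumptions, not how the Kubernetes scheduler actually places pods.

```python
def fits_on_one_node(gpus_needed: int, free_gpus_per_node: list[int]) -> bool:
    """True if at least one node alone has enough free GPUs for the model."""
    return any(free >= gpus_needed for free in free_gpus_per_node)


def plan_placement(gpu_requests: dict[str, int], free_gpus_per_node: list[int]) -> dict[str, bool]:
    """Greedy first-fit (largest first): can each single-replica model get one node?"""
    free = list(free_gpus_per_node)  # copy so the caller's list is not modified
    placed = {}
    for name, need in sorted(gpu_requests.items(), key=lambda kv: -kv[1]):
        for i, avail in enumerate(free):
            if avail >= need:
                free[i] -= need  # reserve GPUs on this node
                placed[name] = True
                break
        else:
            placed[name] = False  # no single node had enough free GPUs
    return placed
```

Four free GPUs spread across four nodes are not enough for a 4-GPU model, even though the cluster total matches.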

Troubleshooting

Model stuck in Pending

  • Check the taints and labels on your GPU nodes.
  • Verify that the InferenceService tolerations and nodeSelector match those taints and labels.
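To reason about why a pod stays Pending, the sketch below gives a simplified version of the Kubernetes rules for matching tolerations against taints and nodeSelector against node labels. It checks only NoSchedule and NoExecute taints, because PreferNoSchedule is a soft preference. It ignores affinity and resource requests. Any taint or label names you test with are your own examples, not values HCP sets.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Simplified Kubernetes match of one toleration against one node taint."""
    effect = toleration.get("effect", "")
    if effect and effect != taint.get("effect"):
        return False  # an empty toleration effect matches every effect
    operator = toleration.get("operator", "Equal")
    key = toleration.get("key", "")
    if operator == "Exists":
        # Exists ignores the value. An empty key with Exists tolerates every taint.
        return key == "" or key == taint.get("key")
    return key == taint.get("key") and toleration.get("value", "") == taint.get("value", "")


def can_schedule(pod_tolerations: list[dict], pod_node_selector: dict,
                 node_taints: list[dict], node_labels: dict) -> bool:
    """True if the node labels satisfy nodeSelector and every hard taint is tolerated."""
    if any(node_labels.get(k) != v for k, v in pod_node_selector.items()):
        return False
    hard = [t for t in node_taints if t.get("effect") in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, t) for tol in pod_tolerations) for t in hard)
```

If can_schedule returns False for every GPU node, the pod will stay Pending until either the InferenceService or the node configuration changes.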

Model not appearing in Model Library

  • Confirm that the image is correctly tagged and synced through the Image and Model Library.
  • If you use a private registry, verify its repository rules.

Kubernetes errors on deploy

  • Run kubectl describe InferenceService <model> and review its status conditions and events for error details.
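If the describe output is long, a small helper can pull out just the conditions that are not True, along with their reason and message. This sketch expects the parsed JSON of a single InferenceService, for example from kubectl get InferenceService <model> -o json. It relies only on the standard type, status, reason, and message fields of a condition.

```python
def failing_conditions(isvc: dict) -> list[str]:
    """Describe each status condition of one InferenceService that is not "True"."""
    problems = []
    for cond in isvc.get("status", {}).get("conditions", []):
        if cond.get("status") != "True":
            # Join whichever of reason and message are present.
            detail = " - ".join(p for p in (cond.get("reason"), cond.get("message")) if p)
            problems.append(f"{cond.get('type')}: {detail or 'no reason reported'}")
    return problems
```

An empty result means every reported condition is True. Any remaining problem is then more likely to appear in the events section of the describe output.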

Summary

  • You can deploy AI models from the AI Factory Model Library.
  • Deployed models run via KServe Model Serving.
  • Deployed models power Knowledge Bases and Gen AI Builder Assistants.
  • The deployment flow ensures consistent governance and visibility.
