How to Deploy AI Models from Model Library
This guide explains how to deploy AI models from the AI Factory Model Library into Model Serving (powered by KServe) in your Hybrid Control Plane (HCP) environment.
Once deployed, these models power key AI Factory features:
- Knowledge Bases (via AIDB pipelines)
- Gen AI Builder Assistants and pipelines
- Other AI Factory and application integrations
Who should use this guide?
- AI platform admins deploying validated model images
- Data engineers configuring AI models for Knowledge Bases
- AI application developers configuring models for Assistants
What this enables
Once deployed:
- Your AI models are available in Model Serving.
- You can link them to Knowledge Bases or Gen AI Builder pipelines.
- You can monitor and manage deployed models via the HCP Model Serving UI or Kubernetes.
Estimated time to complete
10–20 minutes per model, depending on model size and cluster resources.
Prerequisites
Before you begin:
- An active HCP environment with GPU worker nodes configured.
- NVIDIA NGC API key stored as `nvidia-nim-secrets` in Kubernetes (a creation sketch follows this list).
- The model image you want to deploy must be visible in the Model Library.
- KServe must be configured and ready in your cluster.
→ For a full setup guide, see: Setup GPU Resources for Model Serving
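If the NGC API key secret does not exist yet, you can create it with kubectl. This is a minimal sketch: the secret name matches the prerequisite above, but the key name (`NGC_API_KEY`) and the namespace are assumptions to adapt to your environment.

```shell
# Minimal sketch: store an NVIDIA NGC API key as a Kubernetes secret.
# The key name NGC_API_KEY and the namespace are assumptions; match them
# to what your HCP Model Serving setup expects.
kubectl create secret generic nvidia-nim-secrets \
  --namespace default \
  --from-literal=NGC_API_KEY=<your-ngc-api-key>
```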
Steps to deploy an AI model
1. Browse and select model in Model Library
- Go to AI Factory > Model Library.
- Browse available model images.
- Review versions and tags.
- Select the version you want to deploy.
2. Configure and deploy model
Click Deploy or Deploy to Model Serving.
Configure deployment parameters:
- Number of replicas
- Resource requests/limits (GPU, CPU, memory)
- Model runtime settings (if needed)
Confirm and deploy.
This triggers creation of:
- A `ClusterServingRuntime` for the model (if not already defined).
- An `InferenceService` for the specific deployment (see the sketch below).
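For reference, the generated `InferenceService` looks roughly like the sketch below. The names, runtime reference, and resource figures are illustrative assumptions; the Deploy action writes the actual manifest for you.

```yaml
# Illustrative sketch of a generated InferenceService. Names and resource
# values are assumptions; the Deploy action produces the real manifest.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: nim-snowflake-arctic-embed-l
  namespace: default
spec:
  predictor:
    minReplicas: 1
    model:
      # References the ClusterServingRuntime created for this model image
      runtime: nim-snowflake-arctic-embed-l
      modelFormat:
        name: nim-snowflake-arctic-embed-l
      resources:
        requests:
          cpu: "4"
          memory: 16Gi
          nvidia.com/gpu: "1"
        limits:
          nvidia.com/gpu: "1"
```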
3. Verify deployed model
You can verify your deployed models using:
Model Serving UI in HCP
- Go to AI Factory > Model Serving (or Hybrid Manager > AI Factory > Model Serving).
- Confirm the model appears with status Ready.
Or use kubectl:

```shell
kubectl get InferenceService -n default
```

Example output:

```
NAME                                    STATUS   URL                                                        GPUs
meta-nim-llama-3-3-nemotron-super-49b   Ready    http://meta-nim-llama-3-3-nemotron-super-49b-predictor...  4
nim-snowflake-arctic-embed-l            Ready    http://nim-snowflake-arctic-embed-l-predictor...           1
...
```
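As an optional smoke test, you can call the model endpoint directly from inside the cluster. The URL below is assembled from the predictor service name shown by kubectl, and the `/v1/embeddings` path and request body assume the OpenAI-compatible API that NIM embedding images expose; adjust both for your model and environment.

```shell
# Hypothetical smoke test against a deployed embedding model. Replace the
# URL with the one reported for your InferenceService; completion models
# use /v1/chat/completions instead of /v1/embeddings.
curl -s http://nim-snowflake-arctic-embed-l-predictor.default.svc.cluster.local/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "snowflake-arctic-embed-l", "input": ["hello world"]}'
```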
4. Connect model to AI Factory workloads
Once the model is Ready, you can select it in:
- Knowledge Base pipelines (for embedding or reranking)
- Gen AI Builder pipelines
- Assistant configurations
→ The UI will show models available for each use case based on their type (Embedding, Completion, Reranking, etc.).
Supported model types
| Model Type      | Example Image                |
|-----------------|------------------------------|
| Text Completion | llama-3.3-nemotron-super-49b |
| Text Embedding  | arctic-embed-l               |
| Image Embedding | nvclip                       |
| OCR             | paddleocr                    |
| Text Reranker   | llama-3.2-nv-rerankqa-1b-v2  |
Tips & Best Practices
- GPU placement: Ensure your model matches your GPU capacity. Large models like llama-3.3-49b require multiple GPUs on a single node.
- Quota management: Limit number of large models deployed simultaneously to avoid overloading GPU nodes.
- Version testing: Test new model versions in isolated deployments before promoting to production pipelines or Assistants.
Troubleshooting
Model stuck in Pending
- Check GPU node taints/labels.
- Verify the InferenceService tolerations and nodeSelectors match (see the commands below).
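The commands below are one way to trace a Pending predictor; pod and node names in angle brackets are placeholders.

```shell
# Find the predictor pod and read its scheduling events (taint/affinity
# mismatches show up under Events).
kubectl get pods -n default -l serving.kserve.io/inferenceservice=<model>
kubectl describe pod <predictor-pod> -n default
# Compare against the taints on your GPU nodes
kubectl describe node <gpu-node> | grep -A 3 Taints
```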
Model not appearing in Model Library
- Confirm image is correctly tagged and synced via Image and Model Library.
- Verify repository rules if using private registry.
Kubernetes errors on deploy
- Check `kubectl describe InferenceService <model>` for detailed status conditions and events (see the log commands below).
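If the events are inconclusive, the predictor container logs usually surface the underlying error; `kserve-container` is KServe's default container name.

```shell
# Tail the predictor container logs for the failing model
# (<model> is a placeholder for your InferenceService name).
kubectl logs -n default -l serving.kserve.io/inferenceservice=<model> \
  -c kserve-container --tail=100
```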
Summary
- You can deploy AI models from the AI Factory Model Library.
- Deployed models run via KServe Model Serving.
- Deployed models power Knowledge Bases and Gen AI Builder Assistants.
- The deployment flow ensures consistent governance and visibility.