How Model Serving Deployment Works
AI Factory makes it easy to deploy your AI models as scalable, production-ready inference services. The platform uses KServe as the model serving engine, operating within the Hybrid Manager (HCP) Kubernetes infrastructure.
This page explains the general flow of model deployment and links to key how-to guides for hands-on instructions.
Deployment flow overview
- You deploy models by creating KServe InferenceServices in your HCP project.
- AI Factory provides GPU-enabled Kubernetes infrastructure to run these services.
- You can deploy supported NVIDIA NIM containers or other compatible models.
- The Model Library helps you discover and manage model images.
- Applications access model endpoints over HTTP or gRPC APIs.
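As a concrete illustration of the last point, the sketch below builds an inference request the way a client application might. The endpoint URL and model name are placeholder assumptions; NIM LLM containers typically expose an OpenAI-compatible HTTP API, but check your model's documentation for the exact path and payload shape.

```python
import json

# Hypothetical endpoint -- the real URL comes from your InferenceService status.
ENDPOINT = "http://llama-3-8b.my-project.svc.cluster.local/v1/chat/completions"

# Payload follows the OpenAI-compatible convention that NIM LLM
# containers commonly use (an assumption, not an AI Factory guarantee).
payload = {
    "model": "meta/llama-3-8b-instruct",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 64,
}

body = json.dumps(payload)

# To actually call the endpoint (requires network access to the cluster):
# import urllib.request
# req = urllib.request.Request(
#     ENDPOINT,
#     data=body.encode(),
#     headers={"Content-Type": "application/json"},
# )
# resp = urllib.request.urlopen(req)
```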
Deployment components
KServe InferenceService
Each model is deployed via a Kubernetes-native InferenceService object:
- Manages lifecycle of the model server pods.
- Handles scaling, health checks, and routing.
- Exposes a network endpoint for model consumption.
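A minimal InferenceService might look like the sketch below. The name, namespace, model format, runtime reference, and GPU count are illustrative assumptions, not a definitive configuration; see the Create InferenceService guide for working values.

```yaml
# Minimal sketch of a KServe InferenceService (values are placeholders).
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: llama-3-8b          # assumed service name
  namespace: my-project     # your HCP project namespace
spec:
  predictor:
    model:
      modelFormat:
        name: nvidia-nim-llm     # assumed model format name
      runtime: nvidia-nim-runtime  # references a ClusterServingRuntime
      resources:
        limits:
          nvidia.com/gpu: "1"    # request one GPU for the model server
```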
ClusterServingRuntime
Advanced users can also configure ClusterServingRuntime resources to customize runtime environments for their models.
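For orientation, a ClusterServingRuntime resembles the hedged sketch below. The runtime name, supported format, container image, and port are placeholders; consult the Configure ServingRuntime guide for supported images and fields.

```yaml
# Illustrative ClusterServingRuntime sketch (all values are assumptions).
apiVersion: serving.kserve.io/v1alpha1
kind: ClusterServingRuntime
metadata:
  name: nvidia-nim-runtime
spec:
  supportedModelFormats:
    - name: nvidia-nim-llm   # format that InferenceServices can match on
      autoSelect: true
  containers:
    - name: kserve-container
      image: nvcr.io/nim/meta/llama-3-8b-instruct:latest  # placeholder image
      ports:
        - containerPort: 8000
          protocol: TCP
```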
Where to start
If you're ready to deploy models, follow these guides:
- Deploy NIM Containers
- Create InferenceService
- Configure ServingRuntime
- Monitor InferenceService
- Update GPU Resources
Hybrid Manager integration
Model Serving runs on Hybrid Manager (HCP) Kubernetes clusters. For more on Hybrid Manager and GPU setup, see the Hybrid Manager documentation.
Best practices
- Use the Model Library to select supported models.
- Verify that your cluster has sufficient GPU resources.
- Monitor deployed models to ensure performance and availability.
- Use ClusterServingRuntime where advanced customization is needed.
Next steps
- Explore our Model Serving How-To Guides
- Review supported models in the Supported Models Index
- Learn about Observability for Model Serving
By following this deployment flow, you can run AI models in production with full observability and scale — directly integrated with the broader AI Factory and Hybrid Manager ecosystem.