Model Serving Quickstart
This page helps you quickly start using Model Serving in AI Factory and find the supporting documentation.
Model Serving in AI Factory enables you to deploy AI models (such as NVIDIA NIM containers) as scalable, production-grade inference services. It is powered by KServe, a Kubernetes-native model serving engine.
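For orientation, KServe deployments are declared as InferenceService resources. Here is a minimal sketch; the name, model format, and storageUri are placeholders, not AI Factory defaults:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: my-model                 # placeholder name
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn            # any format supported by an installed ServingRuntime
      storageUri: gs://example-bucket/models/my-model  # placeholder model location
```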
Where to start
1. Learn the concepts
Before deploying models, it's useful to understand how Model Serving works and how it fits into the AI Factory ecosystem. Start with the Model Serving concepts and terminology pages.
2. Understand how AI Factory integrates Model Serving
Model Serving interacts with:
- Model Library: Browse and manage model images for deployment (coming soon)
- Knowledge Bases (AIDB): Vector stores that may use embedding models served by Model Serving
- Gen AI Builder: Applications may call Model Serving endpoints for inference (see the request sketch below)
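For example, an application can POST to a deployed model's predict endpoint. Here is a sketch of a KServe v1 protocol request with a placeholder host, model name, and input row; the body is JSON, shown as equivalent YAML:

```yaml
# POST https://<service-host>/v1/models/<model-name>:predict
# The response carries the model output under a "predictions" key.
instances:
  - [6.8, 2.8, 4.8, 1.4]         # one input row; shape depends on the model
```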
3. Follow the How-To Guides
If you're ready to deploy or manage models, go straight to the how-to guides.
Getting started checklist
Use this checklist to guide your progress depending on your experience level.
For new users (101 level)
- Read the Model Serving Concepts
- Review key Model Serving Terminology
- Understand what KServe is and how it powers Model Serving
- Understand how Model Library relates to Model Serving (coming soon)
Follow Learning Path 101 for Model Serving
For existing users familiar with Kubernetes (101 level)
- Verify your Kubernetes access in your HCP project
- Review the Concepts and Terminology
- Prepare your cluster prerequisites:
- GPU node pools (if needed)
- NVIDIA device plugin (if needed)
- Access to your container registry for model images
- Configure basic KServe resources, such as an InferenceService (see the sketch below)
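As a concrete starting point, a GPU-backed InferenceService might look like the following sketch. The model format, image pull secret, and storage URI are placeholders for your own registry and model:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: gpu-example              # placeholder name
spec:
  predictor:
    imagePullSecrets:
      - name: my-registry-secret # placeholder credential for your container registry
    model:
      modelFormat:
        name: triton             # placeholder; match an installed ServingRuntime
      storageUri: s3://example-bucket/models/gpu-example  # placeholder
      resources:
        requests:
          nvidia.com/gpu: "1"    # needs a GPU node pool and the NVIDIA device plugin
        limits:
          nvidia.com/gpu: "1"
```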
Follow Learning Path 101 for Model Serving
For advanced users (201 level)
- Tune resource usage for deployed InferenceService resources
- Monitor deployed models
- Understand traffic routing, canary rollouts, and scaling (see the sketch after this list):
- Model serving scaling patterns
- Future: Advanced how-tos
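The sketch below shows how canary traffic splitting and autoscaling bounds are expressed on an InferenceService in serverless mode; all values are illustrative:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: my-model                 # placeholder name
spec:
  predictor:
    canaryTrafficPercent: 10     # route 10% of traffic to the newest revision
    minReplicas: 1               # autoscaling lower bound
    maxReplicas: 5               # autoscaling upper bound
    scaleMetric: concurrency     # scale on in-flight requests per replica
    scaleTarget: 10              # target concurrency per replica
    model:
      modelFormat:
        name: sklearn            # placeholder
      storageUri: gs://example-bucket/models/my-model  # placeholder
```

With canaryTrafficPercent set, KServe keeps the previously rolled-out revision serving the remaining traffic, so a bad rollout can be backed out by removing the field.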
Follow Learning Path 201 for Model Serving
For expert users (301 level)
- Manage your own custom model images
- Build and configure custom ServingRuntime definitions (see the sketch after this list)
- Use Transformers and Explainers in KServe (coming soon)
- Build CI/CD pipelines for deploying models in KServe
- Instrument InferenceServices for advanced observability
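For reference, a custom ServingRuntime is declared along these lines; the runtime image, arguments, and model format name are placeholders for your own runtime:

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: my-custom-runtime              # placeholder name
spec:
  supportedModelFormats:
    - name: my-format                  # placeholder model format
      version: "1"
      autoSelect: true                 # let KServe pick this runtime automatically
  containers:
    - name: kserve-container           # conventional container name in KServe runtimes
      image: registry.example.com/my-runtime:latest  # placeholder image
      args:
        - --model_dir=/mnt/models      # KServe mounts the model here by convention
        - --http_port=8080
```

Setting autoSelect: true lets an InferenceService that declares the matching modelFormat pick this runtime without naming it explicitly.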
Follow Learning Path 301 for Model Serving
Next steps
Use this quickstart as your launch point into Model Serving in AI Factory.