Model Serving Quickstart
This page helps you quickly start using Model Serving in AI Factory and find the supporting documentation.
Model Serving in AI Factory enables you to deploy AI models (such as NVIDIA NIM containers) as scalable, production-grade inference services. It is powered by KServe, a Kubernetes-native model serving engine.
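For orientation, KServe deployments are declared as InferenceService resources. Here is a minimal sketch; the name, model format, and storageUri are placeholders, not AI Factory defaults:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: my-model                 # placeholder name
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn            # any format supported by an installed ServingRuntime
      storageUri: gs://example-bucket/models/my-model  # placeholder model location
```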
Where to start
1. Learn the concepts
Before deploying models, it's useful to understand how Model Serving works and how it fits into the AI Factory ecosystem. Start with the Model Serving concepts and terminology pages.
2. Understand how AI Factory integrates Model Serving
Model Serving interacts with:
- Model Library: Browse and manage model images for deployment (coming soon)
- Knowledge Bases (AIDB): Vector stores that may use embedding models served by Model Serving
- Gen AI Builder: Applications may call Model Serving endpoints for inference (see the request sketch below)
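For example, an application can POST to a deployed model's predict endpoint. Here is a sketch of a KServe v1 protocol request with a placeholder host, model name, and input row; the body is JSON, shown as equivalent YAML:

```yaml
# POST https://<service-host>/v1/models/<model-name>:predict
# The response carries the model output under a "predictions" key.
instances:
  - [6.8, 2.8, 4.8, 1.4]         # one input row; shape depends on the model
```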
3. Follow the How-To Guides
If you're ready to deploy or manage models, go straight to the how-to guides.
Getting started checklist
Use this checklist to guide your progress depending on your experience level.
For new users (101 level)
- Read the Model Serving Concepts
- Review key Model Serving Terminology
- Understand what KServe is and how it powers Model Serving
- Understand how Model Library relates to Model Serving (coming soon)
Follow Learning Path 101 for Model Serving
For existing users familiar with Kubernetes (101 level)
- Verify your Kubernetes access in your HCP project
- Review the Concepts and Terminology
- Prepare your cluster prerequisites:
- GPU node pools (if needed)
- NVIDIA device plugin (if needed)
- Access to your container registry for model images
- Configure basic KServe resources, such as an InferenceService (see the sketch below)
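As a concrete starting point, a GPU-backed InferenceService might look like the following sketch. The model format, image pull secret, and storage URI are placeholders for your own registry and model:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: gpu-example              # placeholder name
spec:
  predictor:
    imagePullSecrets:
      - name: my-registry-secret # placeholder credential for your container registry
    model:
      modelFormat:
        name: triton             # placeholder; match an installed ServingRuntime
      storageUri: s3://example-bucket/models/gpu-example  # placeholder
      resources:
        requests:
          nvidia.com/gpu: "1"    # needs a GPU node pool and the NVIDIA device plugin
        limits:
          nvidia.com/gpu: "1"
```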
Follow Learning Path 101 for Model Serving
For advanced users (201 level)
- Tune resource usage for deployed InferenceService resources
- Monitor deployed models
- Understand traffic routing, canary rollouts, and scaling (see the sketch after this list):
- Model serving scaling patterns
- Future: Advanced how-tos
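The sketch below shows how canary traffic splitting and autoscaling bounds are expressed on an InferenceService in serverless mode; all values are illustrative:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: my-model                 # placeholder name
spec:
  predictor:
    canaryTrafficPercent: 10     # route 10% of traffic to the newest revision
    minReplicas: 1               # autoscaling lower bound
    maxReplicas: 5               # autoscaling upper bound
    scaleMetric: concurrency     # scale on in-flight requests per replica
    scaleTarget: 10              # target concurrency per replica
    model:
      modelFormat:
        name: sklearn            # placeholder
      storageUri: gs://example-bucket/models/my-model  # placeholder
```

With canaryTrafficPercent set, KServe keeps the previously rolled-out revision serving the remaining traffic, so a bad rollout can be backed out by removing the field.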
Follow Learning Path 201 for Model Serving
For expert users (301 level)
- Manage your own custom model images
- Build and configure custom ServingRuntime definitions (see the sketch after this list)
- Use Transformers and Explainers in KServe (coming soon)
- Build CI/CD pipelines for deploying models in KServe
- Instrument InferenceServices for advanced observability
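For reference, a custom ServingRuntime is declared along these lines; the runtime image, arguments, and model format name are placeholders for your own runtime:

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: my-custom-runtime              # placeholder name
spec:
  supportedModelFormats:
    - name: my-format                  # placeholder model format
      version: "1"
      autoSelect: true                 # let KServe pick this runtime automatically
  containers:
    - name: kserve-container           # conventional container name in KServe runtimes
      image: registry.example.com/my-runtime:latest  # placeholder image
      args:
        - --model_dir=/mnt/models      # KServe mounts the model here by convention
        - --http_port=8080
```

Setting autoSelect: true lets an InferenceService that declares the matching modelFormat pick this runtime without naming it explicitly.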
Follow Learning Path 301 for Model Serving
Next steps
Use this quickstart as your launch point into Model Serving in AI Factory.