Model Serving Quickstart

This page helps you quickly understand how to start using Model Serving in AI Factory and where to find supporting documentation.

Model Serving in AI Factory enables you to deploy AI models (such as NVIDIA NIM containers) as scalable, production-grade inference services. It is powered by KServe, a Kubernetes-native model serving engine.
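
For orientation, the sketch below shows what a minimal deployment looks like through KServe's Python SDK (the kserve package). The model name, namespace, and storage URI are hypothetical placeholders; in practice you would point at your own model image or storage location.

    # Sketch: creating a KServe InferenceService with the kserve Python SDK.
    # The name, namespace, and storage URI are hypothetical placeholders.
    from kubernetes import client
    from kserve import (
        KServeClient,
        constants,
        V1beta1InferenceService,
        V1beta1InferenceServiceSpec,
        V1beta1PredictorSpec,
        V1beta1SKLearnSpec,
    )

    isvc = V1beta1InferenceService(
        api_version=constants.KSERVE_GROUP + "/v1beta1",
        kind=constants.KSERVE_KIND,
        metadata=client.V1ObjectMeta(name="sklearn-iris", namespace="default"),
        spec=V1beta1InferenceServiceSpec(
            predictor=V1beta1PredictorSpec(
                sklearn=V1beta1SKLearnSpec(
                    storage_uri="gs://kfserving-examples/models/sklearn/1.0/model"
                )
            )
        ),
    )

    KServeClient().create(isvc)  # submits the InferenceService to the cluster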

Where to start

1. Learn the concepts

Before deploying models, review the Concepts and Terminology to understand how Model Serving works and how it fits into the AI Factory ecosystem.

2. Understand how AI Factory integrates Model Serving

Model Serving interacts with:

  • Model Library: Browse and manage model images for deployment (coming soon)
  • Knowledge Bases (AIDB): Vector stores that may use embedding models served by Model Serving
  • Gen AI Builder: Applications may call into Model Serving endpoints for inferencing (see the request sketch after this list)
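
As an illustration, applications typically reach a deployed model over KServe's v1 REST protocol. The sketch below uses a hypothetical host and model name; note that NVIDIA NIM containers generally expose an OpenAI-compatible API instead, so the exact path depends on the serving runtime.

    # Sketch: invoking a deployed model via KServe's v1 REST protocol.
    # The URL, model name, and input shape are hypothetical placeholders.
    import requests

    url = "http://sklearn-iris.default.example.com/v1/models/sklearn-iris:predict"
    payload = {"instances": [[6.8, 2.8, 4.8, 1.4]]}

    resp = requests.post(url, json=payload, timeout=30)
    resp.raise_for_status()
    print(resp.json())  # e.g. {"predictions": [1]}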

3. Follow the How-To Guides

If you're ready to deploy or manage models, work through the How-To Guides.


Getting started checklist

Use this checklist to guide your progress depending on your experience level.

For new users (101 level)

Follow Learning Path 101 for Model Serving


For existing users familiar with Kubernetes (101 level)

  • Verify your Kubernetes access in your HCP project
  • Review the Concepts and Terminology
  • Prepare your cluster prerequisites (a GPU readiness sketch follows this list):
      • GPU node pools (if needed)
      • NVIDIA device plugin (if needed)
      • Access to your container registry for model images
  • Configure basic KServe resources
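
As a quick way to check the GPU prerequisites above, the sketch below lists allocatable GPUs per node using the official kubernetes Python client. It assumes the NVIDIA device plugin advertises the nvidia.com/gpu resource on your GPU nodes.

    # Sketch: verifying GPU capacity before deploying models.
    # Assumes the NVIDIA device plugin exposes "nvidia.com/gpu" on GPU nodes.
    from kubernetes import client, config

    config.load_kube_config()  # uses your current kubectl context

    for node in client.CoreV1Api().list_node().items:
        gpus = node.status.allocatable.get("nvidia.com/gpu", "0")
        print(f"{node.metadata.name}: {gpus} allocatable GPU(s)")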

Follow Learning Path 101 for Model Serving


For advanced users (201 level)

  • Tune deployed InferenceService resource usage
  • Monitor deployed models
  • Understand traffic routing, canary rollouts, and scaling (see the tuning sketch after this list)
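
To make these items concrete, here is a hedged sketch of tuning a predictor with the kserve Python SDK: resource requests and limits, autoscaling bounds, and a canary traffic split. Field names follow the v1beta1 API; the values are illustrative, not recommendations.

    # Sketch: a predictor spec with resource limits, autoscaling bounds,
    # and a canary traffic split. All values are illustrative.
    from kubernetes import client
    from kserve import V1beta1PredictorSpec, V1beta1SKLearnSpec

    predictor = V1beta1PredictorSpec(
        min_replicas=1,             # autoscaling floor
        max_replicas=4,             # autoscaling ceiling
        canary_traffic_percent=20,  # send 20% of traffic to the new revision
        sklearn=V1beta1SKLearnSpec(
            storage_uri="gs://kfserving-examples/models/sklearn/1.0/model",
            resources=client.V1ResourceRequirements(
                requests={"cpu": "500m", "memory": "1Gi"},
                limits={"cpu": "1", "memory": "2Gi"},
            ),
        ),
    )

Patching an existing InferenceService with a spec like this (for example via KServeClient().patch) shifts a portion of traffic onto the new revision, which is the basis for canary rollouts.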

Follow Learning Path 201 for Model Serving


For expert users (301 level)

  • Manage your own custom model images (a custom predictor sketch follows this list)
  • Build and configure custom ServingRuntime definitions
  • Use Transformers and Explainers in KServe (coming soon)
  • Build CI/CD pipelines for deploying models in KServe
  • Instrument InferenceServices for advanced observability
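
For the custom-image item above, KServe's Python runtime lets you wrap arbitrary inference code in a serving container, which you can then reference from a custom ServingRuntime or InferenceService. A minimal sketch, with the model logic stubbed out and all names hypothetical:

    # Sketch: a custom KServe predictor built on the kserve Python runtime.
    # The model name and predict logic are hypothetical placeholders.
    from kserve import Model, ModelServer

    class EchoModel(Model):
        def __init__(self, name: str):
            super().__init__(name)
            self.ready = True  # a real model would load weights in load()

        def predict(self, payload: dict, headers: dict = None) -> dict:
            # Echo the inputs back; replace with real inference logic.
            return {"predictions": payload["instances"]}

    if __name__ == "__main__":
        ModelServer().start([EchoModel("echo-model")])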

Follow Learning Path 301 for Model Serving


Next steps


Use this quickstart as your launch point into Model Serving within AI Factory.

