Model Serving FAQ

Here are answers to common questions about using Model Serving with AI Factory.

What models are currently supported?

AI Factory Model Serving (version 1.2) supports deploying models packaged as NVIDIA NIM containers using KServe.

Support for additional model types and formats is planned for future releases.

What Kubernetes resources are used for serving models?

Models are served using KServe InferenceServices, which manage the lifecycle of model-serving pods.

Each deployment also relies on supporting resources, such as KServe ServingRuntimes and GPU-enabled Kubernetes nodes.
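
For illustration, here is a minimal sketch of creating an InferenceService with the Kubernetes Python client. The namespace, service name, runtime name, and model format below are hypothetical placeholders; in the streamlined AI Factory flow these objects are created for you.

```python
# Minimal sketch (assumes cluster access and the `kubernetes` Python package).
# All names below are placeholders, not values defined by AI Factory.
from kubernetes import client, config

config.load_kube_config()  # use load_incluster_config() when running in-cluster

inference_service = {
    "apiVersion": "serving.kserve.io/v1beta1",
    "kind": "InferenceService",
    "metadata": {"name": "example-nim-model", "namespace": "models"},
    "spec": {
        "predictor": {
            "model": {
                "runtime": "nim-runtime",               # hypothetical ServingRuntime
                "modelFormat": {"name": "nvidia-nim"},  # must match the runtime
                "resources": {"limits": {"nvidia.com/gpu": "1"}},
            }
        }
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="serving.kserve.io",
    version="v1beta1",
    namespace="models",
    plural="inferenceservices",
    body=inference_service,
)
```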

What hardware is required?

NIM-based model serving requires GPU-enabled Kubernetes nodes.

Refer to the GPU setup guide for supported configurations and node recommendations by cloud provider.
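
To confirm that GPU-enabled nodes are available before deploying, a generic Kubernetes check such as the sketch below can help. It assumes the NVIDIA device plugin is installed and advertises GPUs under the standard nvidia.com/gpu resource name; it is not an AI Factory-specific tool.

```python
# List nodes that advertise NVIDIA GPUs to the Kubernetes scheduler.
from kubernetes import client, config

config.load_kube_config()

for node in client.CoreV1Api().list_node().items:
    gpus = node.status.capacity.get("nvidia.com/gpu", "0")
    if gpus != "0":
        print(f"{node.metadata.name}: {gpus} GPU(s)")
```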

Can I deploy custom models?

At this time, AI Factory Model Serving is optimized for NIM-based containers. Built-in support for deploying arbitrary models via KServe is planned; today, doing so requires manual configuration outside of the current streamlined AI Factory flow.
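
For reference, manually deploying a non-NIM model with plain KServe usually means writing the InferenceService yourself and pointing it at your model artifacts. The sketch below uses KServe's built-in sklearn model format with a placeholder storageUri; this is standard KServe usage outside the AI Factory streamlined flow, and the bucket path is hypothetical.

```python
# Sketch of a hand-written InferenceService for a custom (non-NIM) model.
# The storage location is a placeholder.
custom_model = {
    "apiVersion": "serving.kserve.io/v1beta1",
    "kind": "InferenceService",
    "metadata": {"name": "my-sklearn-model", "namespace": "models"},
    "spec": {
        "predictor": {
            "model": {
                "modelFormat": {"name": "sklearn"},
                "storageUri": "s3://my-bucket/models/sklearn/iris",
            }
        }
    },
}
# Apply it with the Kubernetes API or kubectl, as in the earlier sketch.
```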

How is scaling handled?

KServe InferenceServices use Kubernetes-native scaling, with auto-scaling supported for compatible runtimes.

Scaling behavior can be configured in the InferenceService specification (replicas, resources).
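
As a rough example of where these settings live, the predictor block below sets replica bounds, an autoscaling target, and a GPU request. The field names are standard KServe v1beta1 spec fields; the values and runtime name are illustrative only, and whether autoscaling is active depends on the autoscaler configured in your cluster.

```python
# Illustrative scaling and resource settings for an InferenceService predictor.
predictor_spec = {
    "minReplicas": 1,               # never scale below one replica
    "maxReplicas": 3,               # cap autoscaling at three replicas
    "scaleMetric": "concurrency",   # autoscale on concurrent requests
    "scaleTarget": 10,              # target concurrent requests per replica
    "model": {
        "runtime": "nim-runtime",   # hypothetical ServingRuntime name
        "modelFormat": {"name": "nvidia-nim"},
        "resources": {
            "requests": {"nvidia.com/gpu": "1"},
            "limits": {"nvidia.com/gpu": "1"},
        },
    },
}
```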

How do I monitor running models?

You can monitor deployed models using:

- The status and readiness conditions reported on the KServe InferenceService
- Logs from the model-serving pods
- Kubernetes metrics and any cluster observability tooling you have configured (for example, Prometheus and Grafana)

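To check status programmatically, the sketch below reads an InferenceService's reported URL and readiness conditions through the Kubernetes Python client. The namespace and service name are placeholders.

```python
# Read the status of a deployed InferenceService (names are placeholders).
from kubernetes import client, config

config.load_kube_config()

isvc = client.CustomObjectsApi().get_namespaced_custom_object(
    group="serving.kserve.io",
    version="v1beta1",
    namespace="models",
    plural="inferenceservices",
    name="example-nim-model",
)

status = isvc.get("status", {})
print("URL:", status.get("url"))
for condition in status.get("conditions", []):
    print(condition.get("type"), condition.get("status"))
```
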
Where can I learn more about the model serving architecture?

See the AI Factory documentation on the model serving architecture.

Have more questions? Please contact your AI Factory administrator or consult AI Factory Support Resources.
