Model Serving FAQ
Here are answers to common questions about using Model Serving with AI Factory.
What models are currently supported?
AI Factory Model Serving (version 1.2) supports deploying models packaged as NVIDIA NIM containers using KServe.
Support for additional model types and formats is planned for future releases.
What Kubernetes resources are used for serving models?
Models are served using KServe InferenceServices, which manage the lifecycle of model-serving pods.
Supporting resources are also used, including KServe ServingRuntimes and the GPU-enabled Kubernetes nodes that back the serving pods.
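For illustration, a minimal InferenceService for a NIM-packaged model is sketched below using the official Kubernetes Python client. The model, runtime, and namespace names are placeholders, and the exact fields expected by your installation (for example, the ServingRuntime name) may differ.

```python
# Hypothetical example: apply a KServe InferenceService for a NIM-packaged model.
# The namespace, model, and runtime names are illustrative only.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside the cluster

inference_service = {
    "apiVersion": "serving.kserve.io/v1beta1",
    "kind": "InferenceService",
    "metadata": {"name": "llama3-8b-instruct", "namespace": "default"},
    "spec": {
        "predictor": {
            "model": {
                # modelFormat/runtime depend on the ServingRuntime installed
                # for your NIM container; these values are placeholders.
                "modelFormat": {"name": "nvidia-nim-llama3-8b-instruct"},
                "runtime": "nvidia-nim-llama3-8b-instruct",
                "resources": {"limits": {"nvidia.com/gpu": "1"}},
            }
        }
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="serving.kserve.io",
    version="v1beta1",
    namespace="default",
    plural="inferenceservices",
    body=inference_service,
)
```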
What hardware is required?
NIM-based model serving requires GPU-enabled Kubernetes nodes.
Refer to the GPU setup guide for supported configurations and node recommendations by cloud provider.
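As a quick check before deploying, you can confirm that cluster nodes advertise NVIDIA GPUs. This sketch assumes the standard `nvidia.com/gpu` resource name exposed by the NVIDIA device plugin.

```python
# Illustrative check: list nodes that advertise allocatable NVIDIA GPUs,
# which NIM-based InferenceServices need in order to schedule.
from kubernetes import client, config

config.load_kube_config()

for node in client.CoreV1Api().list_node().items:
    gpus = node.status.allocatable.get("nvidia.com/gpu", "0")
    if gpus != "0":
        print(f"{node.metadata.name}: {gpus} allocatable GPU(s)")
```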
Can I deploy custom models?
At this time, AI Factory Model Serving is optimized for NIM-based containers. You can deploy arbitrary models through KServe, but doing so requires manual configuration outside the streamlined AI Factory flow; built-in support is planned for future releases.
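If you do configure an arbitrary model manually through KServe, the predictor can reference a custom serving container instead of a NIM runtime. The sketch below is illustrative only: the image, port, and names are placeholders, not AI Factory defaults.

```python
# Hypothetical manual KServe deployment of a non-NIM model server.
# The image, port, and names are placeholders.
custom_isvc = {
    "apiVersion": "serving.kserve.io/v1beta1",
    "kind": "InferenceService",
    "metadata": {"name": "my-custom-model", "namespace": "default"},
    "spec": {
        "predictor": {
            "containers": [
                {
                    "name": "kserve-container",
                    "image": "registry.example.com/my-model-server:latest",
                    "ports": [{"containerPort": 8080, "protocol": "TCP"}],
                    "resources": {"limits": {"cpu": "2", "memory": "4Gi"}},
                }
            ]
        }
    },
}
# Apply it with the same CustomObjectsApi call shown in the earlier sketch.
```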
How is scaling handled?
KServe InferenceServices use Kubernetes-native scaling, with auto-scaling supported for compatible runtimes.
Scaling behavior can be configured in the InferenceService specification (replicas, resources).
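As an illustration, replica bounds and a concurrency target can be patched onto an existing InferenceService. The field names (minReplicas, maxReplicas, scaleTarget, scaleMetric) are standard KServe v1beta1 predictor fields; the service name and values below are placeholders.

```python
# Illustrative patch: set autoscaling bounds on an existing InferenceService.
# The service name and values are placeholders.
from kubernetes import client, config

config.load_kube_config()

scaling_patch = {
    "spec": {
        "predictor": {
            "minReplicas": 1,
            "maxReplicas": 4,
            "scaleTarget": 10,          # target value for the scale metric
            "scaleMetric": "concurrency",
        }
    }
}

client.CustomObjectsApi().patch_namespaced_custom_object(
    group="serving.kserve.io",
    version="v1beta1",
    namespace="default",
    plural="inferenceservices",
    name="llama3-8b-instruct",
    body=scaling_patch,
)
```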
How do I monitor running models?
You can monitor deployed models using:
- KServe InferenceService status (see the status-check sketch after this list)
- Kubernetes monitoring tools (Prometheus, Grafana, etc.)
- Model Serving Monitoring Guide
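For example, the InferenceService status mentioned in the first bullet can be read programmatically. The service name and namespace below are placeholders.

```python
# Illustrative status check for a deployed InferenceService.
from kubernetes import client, config

config.load_kube_config()

isvc = client.CustomObjectsApi().get_namespaced_custom_object(
    group="serving.kserve.io",
    version="v1beta1",
    namespace="default",
    plural="inferenceservices",
    name="llama3-8b-instruct",
)

status = isvc.get("status", {})
print("URL:", status.get("url"))
for condition in status.get("conditions", []):
    print(f"{condition['type']}: {condition['status']}")
```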
Where can I learn more about the model serving architecture?
See the Model Serving How-To Guides and the Model Serving Monitoring Guide.
Have more questions? Please contact your AI Factory administrator or consult AI Factory Support Resources.