Model Serving in Hybrid Manager

Model Serving in Hybrid Manager provides a scalable, Kubernetes-native way to serve AI models as production-grade inference services.

It is implemented using KServe and runs on GPU-enabled nodes in your Hybrid Manager project’s Kubernetes cluster. Model Serving enables Gen AI applications, Knowledge Bases, and custom pipelines to use high-performance models under your control.

How Model Serving fits in the Hybrid Manager architecture

Model Serving is a core capability of Hybrid Manager’s AI Factory workload:

  • Models are deployed as KServe InferenceServices within the project’s Kubernetes cluster.
  • Model Serving is powered by GPU-enabled infrastructure that you provision and manage.
  • Model images come from the Asset Library (formerly Model Library), backed by Hybrid Manager’s image governance.
  • Model endpoints (HTTP/gRPC) are available to:
      • Gen AI Builder Assistants.
      • AIDB Knowledge Bases.
      • External applications and APIs.
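
As a concrete illustration of the first bullet, a model deployment is expressed as a KServe `InferenceService` manifest. The sketch below is hypothetical: the model name, namespace, runtime, and `storageUri` are placeholder values, not Hybrid Manager defaults — consult your Asset Library entry for the actual image reference and serving runtime.

```yaml
# Hypothetical InferenceService sketch — names, namespace, runtime,
# and storageUri are illustrative placeholders.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: llama3-8b-example
  namespace: my-hm-project
spec:
  predictor:
    model:
      modelFormat:
        name: vLLM                 # format served by the chosen runtime
      runtime: example-vllm-runtime  # a ClusterServingRuntime in the cluster
      storageUri: oci://registry.example.com/models/llama3-8b
      resources:
        limits:
          nvidia.com/gpu: "1"      # schedules the pod onto a GPU node
```

The `nvidia.com/gpu` resource limit is what ties the deployment to the GPU node groups you provision; without a matching, correctly labeled GPU node the pod stays unschedulable.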

Model Serving in Hybrid Manager ensures that all inference workloads are governed, auditable, and run securely within your own infrastructure, enabling Sovereign AI patterns.

How it works in Hybrid Manager

  • KServe is installed and managed by Hybrid Manager within your project’s Kubernetes cluster.
  • You must provision GPU node groups or node pools to support high-performance model serving.
  • GPU nodes must be correctly labeled and configured to support KServe workloads.
  • Models are deployed from the Asset Library via ClusterServingRuntime and InferenceService definitions.
  • Your applications and AI Factory workloads can invoke model endpoints via REST or gRPC.
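
To illustrate the last step above, KServe endpoints typically accept the Open Inference Protocol (v2) over REST. The sketch below builds a v2 request body and shows how an application might POST it; the endpoint URL and model name are hypothetical, and the actual input schema depends on the serving runtime you deploy.

```python
import json
import urllib.request


def build_v2_request(texts, input_name="input_text"):
    """Build an Open Inference Protocol (v2) request body for a text model.

    The input name and BYTES datatype are illustrative; check your
    runtime's expected tensor names and types.
    """
    return {
        "inputs": [
            {
                "name": input_name,
                "shape": [len(texts)],
                "datatype": "BYTES",
                "data": texts,
            }
        ]
    }


def infer(endpoint_url, model_name, payload):
    """POST the payload to a KServe v2 REST endpoint (network call)."""
    req = urllib.request.Request(
        f"{endpoint_url}/v2/models/{model_name}/infer",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


payload = build_v2_request(["What is KServe?"])
# infer("http://llama3-8b-example.my-hm-project.example.internal",
#       "llama3-8b-example", payload)  # hypothetical in-cluster URL
```

gRPC clients follow the same v2 protocol; REST is shown here only because it needs no generated stubs.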

Key Hybrid Manager considerations

  • GPU infrastructure is required for most advanced models, including LLMs, embedding models, and vision models.
  • Hybrid Manager enables full observability of model serving, including Prometheus metrics and Kubernetes-native monitoring.
  • Model serving endpoints are secured and managed within your Hybrid Manager project scope.
  • Governance for model images and deployment comes from Hybrid Manager’s integrated Asset Library and image controls.

Typical use cases

  • Power Gen AI Builder Assistants with LLM or embedding models.
  • Enable AIDB Knowledge Bases with GPU-accelerated embedding pipelines.
  • Serve image models (OCR, vision) as part of multi-modal retrieval systems.
  • Expose enterprise-grade model APIs to downstream applications.
