Model Serving in Hybrid Manager

Model Serving in Hybrid Manager provides a scalable, Kubernetes-native way to serve AI models as production-grade inference services.

It is implemented using KServe and runs on GPU-enabled nodes in your Hybrid Manager project’s Kubernetes cluster. Model Serving enables Gen AI applications, Knowledge Bases, and custom pipelines to use high-performance models under your control.

How Model Serving fits in the Hybrid Manager architecture

Model Serving is a core capability of Hybrid Manager’s AI Factory workload:

  • Models are deployed as KServe InferenceServices within the project’s Kubernetes cluster.
  • Model Serving is powered by GPU-enabled infrastructure that you provision and manage.
  • Model images come from the Asset Library (formerly Model Library), backed by Hybrid Manager’s image governance.
  • Model endpoints (HTTP/gRPC) are available to:
      • Gen AI Builder Assistants.
      • AIDB Knowledge Bases.
      • External applications and APIs.
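
As a concrete illustration of the first bullet, a model deployment is expressed as a KServe `InferenceService` manifest. The sketch below is hypothetical: the model name, namespace, runtime, and `storageUri` are placeholder values, not Hybrid Manager defaults — consult your Asset Library entry for the actual image reference and serving runtime.

```yaml
# Hypothetical InferenceService sketch — names, namespace, runtime,
# and storageUri are illustrative placeholders.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: llama3-8b-example
  namespace: my-hm-project
spec:
  predictor:
    model:
      modelFormat:
        name: vLLM                 # format served by the chosen runtime
      runtime: example-vllm-runtime  # a ClusterServingRuntime in the cluster
      storageUri: oci://registry.example.com/models/llama3-8b
      resources:
        limits:
          nvidia.com/gpu: "1"      # schedules the pod onto a GPU node
```

The `nvidia.com/gpu` resource limit is what ties the deployment to the GPU node groups you provision; without a matching, correctly labeled GPU node the pod stays unschedulable.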

Model Serving in Hybrid Manager ensures that all inference workloads are governed, auditable, and run securely within your own infrastructure, enabling Sovereign AI patterns.

How it works in Hybrid Manager

  • KServe is installed and managed by Hybrid Manager within your project’s Kubernetes cluster.
  • You must provision GPU node groups or node pools to support high-performance model serving.
  • GPU nodes must be correctly labeled and configured to support KServe workloads.
  • Models are deployed from the Asset Library via ClusterServingRuntime and InferenceService definitions.
  • Your applications and AI Factory workloads can invoke model endpoints via REST or gRPC.
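
To illustrate the last step above, KServe endpoints typically accept the Open Inference Protocol (v2) over REST. The sketch below builds a v2 request body and shows how an application might POST it; the endpoint URL and model name are hypothetical, and the actual input schema depends on the serving runtime you deploy.

```python
import json
import urllib.request


def build_v2_request(texts, input_name="input_text"):
    """Build an Open Inference Protocol (v2) request body for a text model.

    The input name and BYTES datatype are illustrative; check your
    runtime's expected tensor names and types.
    """
    return {
        "inputs": [
            {
                "name": input_name,
                "shape": [len(texts)],
                "datatype": "BYTES",
                "data": texts,
            }
        ]
    }


def infer(endpoint_url, model_name, payload):
    """POST the payload to a KServe v2 REST endpoint (network call)."""
    req = urllib.request.Request(
        f"{endpoint_url}/v2/models/{model_name}/infer",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


payload = build_v2_request(["What is KServe?"])
# infer("http://llama3-8b-example.my-hm-project.example.internal",
#       "llama3-8b-example", payload)  # hypothetical in-cluster URL
```

gRPC clients follow the same v2 protocol; REST is shown here only because it needs no generated stubs.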

Key Hybrid Manager considerations

  • GPU infrastructure is required for most advanced models, including LLMs, embedding models, and vision models.
  • Hybrid Manager enables full observability of model serving, including Prometheus metrics and Kubernetes-native monitoring.
  • Model serving endpoints are secured and managed within your Hybrid Manager project scope.
  • Governance for model images and deployment comes from Hybrid Manager’s integrated Asset Library and image controls.

Typical use cases

  • Power Gen AI Builder Assistants with LLM or embedding models.
  • Enable AIDB Knowledge Bases with GPU-accelerated embedding pipelines.
  • Serve image models (OCR, vision) as part of multi-modal retrieval systems.
  • Expose enterprise-grade model APIs to downstream applications.
