Purpose and Benefits
Model management within Hybrid Manager provides centralized governance and deployment capabilities for AI models running on your Kubernetes infrastructure. This system enables organizations to maintain complete control over their AI capabilities while leveraging enterprise-grade Model Serving infrastructure.
The integration addresses critical requirements for organizations deploying AI at scale: model governance through approved registries, scalable inference serving with GPU acceleration, and unified management through Hybrid Manager's control plane. By running models within your controlled infrastructure, you maintain data sovereignty while accessing state-of-the-art AI capabilities.
Core Concepts
Model Library
The Model Library serves as your centralized governance system for AI model images. Operating within Hybrid Manager's Asset Library infrastructure, it provides a curated view of validated models ready for production deployment.
The library implements multi-stage governance:
- Automated synchronization from trusted container registries
- Security scanning and vulnerability assessment
- Approval workflows based on organizational policies
- Metadata management for versioning and documentation
Models in the library power all Agent Factory capabilities including Langflow flows, Pipeline Designer knowledge bases, and custom inference applications. Only models validated through the library's governance framework can reach production environments.
Model Serving
Model Serving transforms approved models into scalable inference endpoints using KServe within your Kubernetes clusters. This infrastructure provides production-grade model deployment with automatic scaling, health management, and resource optimization.
Key serving capabilities include:
- InferenceService resources that define deployed model endpoints
- ServingRuntime configurations optimized for different model frameworks
- GPU allocation and scheduling for high-performance inference — see GPU recommendations
- Internal and external endpoint access with authentication
Management Interface
Hybrid Manager provides unified management through its web console, abstracting Kubernetes complexity while maintaining full configurability. The interface enables:
- Visual workflows for model deployment from library to serving
- Resource allocation and scaling configuration
- Monitoring dashboards for inference metrics and GPU utilization
- Access control and endpoint management
Implementation Workflow
Model Registration
Organizations begin by configuring repository connections to trusted model sources. The Model Library synchronizes with external registries based on defined rules, automatically discovering and validating new model versions.
External Registry → Repository Rules → Security Scanning → Model Library
Repository rules determine which models enter your environment, implementing organizational policies at the point of ingestion. This automated approach reduces manual overhead while maintaining governance standards.
Model Deployment
Validated models deploy through guided workflows that configure serving infrastructure:
- Model Selection: Browse available models in the library with metadata including version, performance characteristics, and resource requirements
- Runtime Configuration: Select or create ServingRuntimes optimized for the model framework (vLLM, TensorRT-LLM, custom)
- Resource Allocation: Define GPU, memory, and CPU requirements based on expected workload
- Endpoint Configuration: Set up internal cluster access or external API endpoints with authentication (see Access KServe endpoints)
The system creates InferenceService resources that KServe manages, handling pod scheduling, health monitoring, and traffic routing automatically.
Operational Management
Deployed models operate under continuous monitoring with automatic scaling based on demand. Hybrid Manager provides visibility through:
- Real-time inference metrics including latency and throughput
- GPU utilization tracking for resource optimization
- Error rates and health status for proactive maintenance
- Cost analysis based on resource consumption
Using deployed models
Once a model cluster is running, there are three ways to consume it.
From applications
Applications call deployed model endpoints directly via KServe InferenceServices. Each endpoint exposes a standard OpenAI-compatible REST API for chat completions, embeddings, or other inference tasks.
From AIDB (SQL patterns)
AIDB lets you call models from SQL, making them available directly inside Postgres — useful for embedding pipelines or enabling in-database inference.
Hybrid Manager specifics
When models are deployed through Hybrid Manager:
- Service URLs. Each model is exposed as an internal KServe endpoint within your HM project. The URL is visible in the Model Library or the Model Serving details page.
- Authentication. Endpoints are protected by the platform. Applications running inside the same project can reach them directly. For external access, configure authentication using the HM ingress and project-scoped credentials.
- Observability. Requests and logs flow into HM observability, giving you usage metrics, latency, and error tracking.