Model Serving

Model Serving in EDB Postgres® AI (EDB PG AI) enables you to deploy and scale AI model inference services — turning AI models into production-ready APIs and services.

Important: Model Serving requires Hybrid Manager and the Asset Library to manage model images. Serving is performed within your Hybrid Manager Kubernetes environment.

What Is Model Serving?

Model Serving lets you take AI models — including large language models (LLMs), vision models, and embedding models — and deploy them as scalable, observable InferenceServices:

  • In-cluster or on-prem, controlled within your environment
  • Integrated with EDB PG AI Pipelines, Knowledge Bases, and Gen AI Builder
  • With full control over model lifecycle, scaling, and usage

EDB PG AI Model Serving is powered by the open source KServe engine and deeply integrated into Hybrid Manager.

Why Use Model Serving?

Model Serving provides the backbone for operationalizing AI models:

  • Sovereignty — Run models on your infrastructure, not external APIs
  • Governance — Ensure models used in production are trusted and auditable
  • Scalability — Automatically scale inference services based on load
  • Observability — Monitor and troubleshoot model serving behavior
  • Integration — Seamlessly connect Pipelines, Knowledge Bases, and Gen AI assistants to served models

How It Works

Model Serving works in combination with Hybrid Manager and the Asset Library:

  • You register model images in the Asset Library
  • You create InferenceServices via Hybrid Manager or the API
  • Hybrid Manager deploys the model as a containerized service using KServe
  • EDB PG AI components (such as Pipelines, Gen AI Builder, and Knowledge Bases) connect to these InferenceServices at runtime
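Under the hood, each deployment is a KServe InferenceService resource in the Hybrid Manager Kubernetes environment. The sketch below illustrates the general shape of such a resource; the name, image reference, and model format are hypothetical placeholders, and the exact fields Hybrid Manager expects for Asset Library images may differ:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: my-embedding-model            # hypothetical service name
spec:
  predictor:
    model:
      modelFormat:
        name: huggingface             # example model format
      # Image registered in the Asset Library (hypothetical reference)
      image: registry.example.com/models/embed:latest
      resources:
        limits:
          nvidia.com/gpu: "1"         # request a GPU if the model needs one
```

In practice you would create this through Hybrid Manager or its API rather than applying the manifest by hand, but the underlying resource KServe manages looks like this.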

InferenceServices can support:

  • Open embedding models for Pipelines and RAG
  • Text generation models (LLMs) for Gen AI
  • Vision models (CLIP, OCR, image embedding) for multi-modal Knowledge Bases
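Once deployed, clients such as Pipelines reach these models over HTTP. As a minimal sketch, the helpers below build and parse requests in the shape of the KServe v1 inference protocol (`POST /v1/models/<name>:predict` with an `instances` list); the model name, host, and instance schema are hypothetical, and your InferenceService may expose a different protocol:

```python
import json

# Hypothetical service details -- substitute your own InferenceService
# name and in-cluster host.
MODEL_NAME = "my-embedding-model"
BASE_URL = "http://my-embedding-model.models.example.internal"

def predict_url(base_url: str, model_name: str) -> str:
    """URL of the KServe v1 predict endpoint for a model."""
    return f"{base_url}/v1/models/{model_name}:predict"

def build_predict_request(texts: list[str]) -> str:
    """Serialize a batch of texts as a KServe v1 'instances' payload."""
    return json.dumps({"instances": [{"text": t} for t in texts]})

def parse_predictions(response_body: str) -> list:
    """Extract the 'predictions' list from a KServe v1 response body."""
    return json.loads(response_body)["predictions"]

# An actual call would POST build_predict_request(...) to
# predict_url(BASE_URL, MODEL_NAME) with Content-Type: application/json,
# e.g. via urllib.request, and pass the body to parse_predictions.
```

The same request shape works for embedding, text-generation, and vision models; only the contents of each instance (and of the returned predictions) differ per model.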

When to Use

You should use Model Serving when:

  • You want to run your own models, not rely on external API calls
  • You need to ensure model inference runs where your data resides (Sovereign AI)
  • You want to manage model lifecycle centrally across Pipelines, Knowledge Bases, and assistants
  • You need scalable, observable model serving integrated with Hybrid Manager

With Model Serving, EDB PG AI gives you the tools to operationalize AI models with full control, scalability, and integration — powering intelligent applications on top of your Postgres and data platforms.
