Agent Factory is the AI and machine learning workload in Hybrid Manager. It lets you deploy and serve NVIDIA NIM models on your own GPU infrastructure, build AI flows and agents with Langflow, create knowledge bases and RAG pipelines with Pipeline Designer, and expose model endpoints to your applications, all within your sovereign Kubernetes environment.
This page walks you through the fastest path to get something running.
What you'll build
Agent Factory on Hybrid Manager supports two starting paths:
- Building an AI flow in Langflow backed by a knowledge base.
- Creating a knowledge base pipeline through Pipeline Designer.
Both paths require a deployed model cluster, start there if you haven't already.
Before you start
Complete these steps in order before following either path below. Agent Factory must be enabled and a model cluster must be running before Langflow and Pipeline Designer can use models.
- Enable Agent Factory in your project: Prerequisites
- Provision GPU node groups: GPU recommendations
- Deploy a model cluster from the Model Library: Deploy with HM UI
Option A — Build an AI flow with Langflow
Langflow is a visual flow builder that ships with HM. Use it to wire together EDB components — knowledge bases, embedding models, LLMs, and Postgres databases — into AI flows and agents you can run interactively or deploy as callable services.
- Langflow quickstart — end-to-end walkthrough: knowledge base setup, component wiring, and running the flow in the playground.
Option B — Create a knowledge base pipeline
Pipeline Designer is a no-code wizard in the HM console for building AIDB data pipelines. Use it to ingest documents, generate embeddings, and store them in a vector knowledge base ready for semantic search or RAG.
Access model endpoints from your applications
Any model deployed through HM exposes an OpenAI-compatible REST endpoint. You can call it directly from applications, scripts, or external tools without going through Langflow or Pipeline Designer.
Monitor and iterate
Once your models and flows are running, use HM's built-in observability to track inference latency, GPU utilization, and flow health.