Context
An external inference service connects Hybrid Manager (HM) to a remote model provider hosted outside the cluster — such as OpenAI, Google Gemini, Anthropic, or NVIDIA NIM. You configure the provider's URL, model name, and API key once; HM stores the credentials in Kubernetes secrets and handles authentication transparently for every downstream request.
This page covers:
- Viewing the Inference Services list
- Registering a new external inference service
- Getting the details of a registered service
- Updating a registered service
- Deregistering a service
Inference Services list
Open the Inference Services list page from the Estate → Inference Services menu in your HM project.
The list displays each service's name, model, and current status. When the status isn't Ready, a short Status Message explains the cause.
Status
| Status | Meaning |
|---|---|
| Ready | The service is healthy and accepting requests. |
| Failed | The most recent health check failed. The service detail page shows a Status Message with the diagnosis (for example, API key is missing or invalid, endpoint unreachable, or account quota exhausted). |
| Pending | The endpoint hasn't yet been verified by the background health check. Typically seen during the first few minutes after an HM restart, while the cache warms up. The Status Message reads Remote connection check is pending — upstream has not yet been verified. |
| Unknown | The underlying Kubernetes resources haven't reported a status yet — usually a transient state right after creation. |
Status is refreshed every 5 minutes by a background health check. After you create or edit a service, the connectivity probe runs inline so the result is reflected immediately.
Register an external inference service
Navigate to Estate → Quick Actions → Register External Inference Service in your HM project.
Tip
You can also reach the form from the Inference Services list page via the Quick Actions menu.
Prerequisites
Before registering, confirm you have:
- The provider's base URL (scheme and host, without a trailing /v1).
- The model name exactly as the provider expects it (case-sensitive).
- A valid API key for the provider.
- Network reachability from the HM cluster to the upstream hostname. Your HM administrator may need to allow egress to the provider's domain.
Quick reference by provider
Use this table to fill in the form for the most common providers. The columns map directly to the fields described below.
| Provider | Model base URL | API protocol version | Example model name | Notes |
|---|---|---|---|---|
| OpenAI | https://api.openai.com | OPENAI_V1 | gpt-4o-mini | API key validated at registration. |
| OpenRouter | https://openrouter.ai/api | OPENAI_V1 | openai/gpt-4o-mini | API key not validated at registration. |
| Google Gemini | https://generativelanguage.googleapis.com | GEMINI_V1_BETA | gemini-2.5-pro | API key validated at registration. |
| Anthropic | https://api.anthropic.com | ANTHROPIC_V1 | claude-sonnet-4-5 | API key validated at registration. |
| NVIDIA NIM | https://integrate.api.nvidia.com | OPENAI_V1 | meta/llama-3.1-8b-instruct | API key not validated at registration. |
| Self-hosted / vLLM | Your internal service URL, for example http://vllm-svc.inference:8000 | OPENAI_V1 | The model name your vLLM server is serving | API key not validated at registration when vLLM runs unauthenticated. |
| Other HM (federation) | Another HM's external inference endpoint URL | OPENAI_V1 | Same model name as on the upstream HM | Enable Allow Insecure Connection. See Allow Insecure Connection below. |
For providers not listed here, see the form field descriptions below.
Form fields
External Service Name (required)
A unique identifier for this service within HM. Must follow DNS-style naming rules:
- Lowercase letters and digits only.
- Hyphens (-) are allowed within segments but not at the start or end.
- Dots (.) are allowed as segment separators.
- No uppercase letters, underscores, or spaces.
- Maximum 63 characters.
Examples: openai-gpt-4o-mini, azure.gpt-4o.prod.
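The naming rules above can be checked locally before you submit the form. The following is a minimal sketch, not HM's own validation code: the function name is hypothetical, and it assumes that "start or end" refers to each dot-separated segment and that the 63-character limit applies to the whole name.

```python
import re

# One segment: lowercase letters and digits, with hyphens allowed only in the middle.
_SEGMENT = r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?"
# Full name: one or more segments separated by single dots.
_NAME_RE = re.compile(rf"{_SEGMENT}(?:\.{_SEGMENT})*")
MAX_NAME_LENGTH = 63  # assumed to apply to the whole name


def is_valid_service_name(name: str) -> bool:
    """Return True if `name` satisfies the naming rules listed above."""
    return len(name) <= MAX_NAME_LENGTH and _NAME_RE.fullmatch(name) is not None


print(is_valid_service_name("azure.gpt-4o.prod"))  # True
print(is_valid_service_name("OpenAI_GPT"))         # False
```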
Tags (optional)
Reuse existing HM tags to group and filter services. Tags have no effect on request routing or authentication.
Model name (required)
The exact identifier the upstream provider expects, as documented by the provider. This value is case-sensitive. Slash-separated names (such as meta/llama-3.1-8b-instruct or openai/gpt-4o-mini) are supported. See the Quick reference by provider table above for examples.
API Key (required for most providers)
The API key only — don't include the Authorization: Bearer … prefix. HM adds the correct auth header automatically based on the API Protocol Version you select.
Model Base URL (required)
The scheme and host (plus any required path prefix) for the provider's API. See the Quick reference by provider table above for the correct value for common providers.
Don't include /v1: consumer applications append /v1 (or /v1beta for Gemini) themselves. Including /v1 here produces duplicated paths such as /v1/v1/chat/completions, which return a 404. A trailing slash is tolerated and stripped automatically.
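To illustrate why the version segment must stay out of this field, here is a small sketch of a pre-submission check and of how an OpenAI-style consumer might build its request path. Both function names are hypothetical, and rejecting a /v1beta suffix is an assumption extended from the /v1 rule above.

```python
from urllib.parse import urlsplit

# Version segments that consumers append themselves (the /v1beta entry is assumed).
VERSION_SUFFIXES = ("/v1", "/v1beta")


def check_model_base_url(url: str) -> str:
    """Return the URL without a trailing slash, or raise ValueError if it looks wrong."""
    cleaned = url.strip().rstrip("/")
    parts = urlsplit(cleaned)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"expected an http(s) URL with a host, got {url!r}")
    if parts.path.endswith(VERSION_SUFFIXES):
        raise ValueError(f"remove the version suffix from {url!r}")
    return cleaned


def chat_completions_url(base_url: str) -> str:
    """Build the path an OpenAI-style consumer would call."""
    return check_model_base_url(base_url) + "/v1/chat/completions"


print(chat_completions_url("https://openrouter.ai/api/"))
# https://openrouter.ai/api/v1/chat/completions
```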
Tip
Self-hosted vLLM servers return 404 on the per-model endpoint /v1/models/{name}, which normally causes the registration probe to fail. HM handles this by falling back to GET /v1/models and matching the model name in the list, so vLLM-served models register successfully.
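The fallback described in this tip can be sketched as follows. This is an illustration only, not HM's implementation: the fetch callable is a hypothetical stand-in for an authenticated HTTP GET, and the response shape follows the OpenAI-compatible model list format, where each entry carries an id field.

```python
from typing import Any, Callable, Tuple

# fetch(path) -> (HTTP status code, parsed JSON body). It is injected so the
# sketch runs without a network; a real probe would issue HTTP GET requests.
Fetch = Callable[[str], Tuple[int, Any]]


def model_is_served(fetch: Fetch, model_name: str) -> bool:
    """Check the per-model endpoint first, then fall back to the model list on 404."""
    status, _ = fetch(f"/v1/models/{model_name}")
    if status == 200:
        return True
    if status != 404:
        return False  # for example 401 or 5xx: the fallback can't resolve these
    status, body = fetch("/v1/models")
    if status != 200:
        return False
    # OpenAI-compatible servers list models as {"data": [{"id": "..."}, ...]}.
    return any(entry.get("id") == model_name for entry in body.get("data", []))


def fake_vllm(path: str) -> Tuple[int, Any]:
    """Simulated server that only supports the list endpoint."""
    if path == "/v1/models":
        return 200, {"object": "list", "data": [{"id": "meta/llama-3.1-8b-instruct"}]}
    return 404, {}


print(model_is_served(fake_vllm, "meta/llama-3.1-8b-instruct"))  # True
```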
Functions (optional, multi-select)
Capability tags that consumer applications filter on when discovering available models. Use the predefined values below for HM's built-in consumers; for your own applications, any string is valid.
| Built-in consumer | Required function tag |
|---|---|
| HM chatbot | chatbot-gen-content |
| AIDB pipeline step | The matching aidb-* tag (see your AIDB pipeline documentation) |
Leave this field empty if you're exposing the service exclusively to custom applications that perform their own model selection.
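As an illustration of how a consumer might filter on function tags, the sketch below selects services by tag. The RegisteredService class and find_services function are hypothetical and don't reflect HM's actual data model; restricting the result to Ready services is also an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RegisteredService:
    """Hypothetical, simplified view of a registered service."""
    name: str
    status: str
    functions: List[str] = field(default_factory=list)


def find_services(services: List[RegisteredService], function_tag: str) -> List[RegisteredService]:
    """Return Ready services that carry the given function tag."""
    return [s for s in services if s.status == "Ready" and function_tag in s.functions]


services = [
    RegisteredService("openai-gpt-4o-mini", "Ready", ["chatbot-gen-content"]),
    RegisteredService("claude-prod", "Failed", ["chatbot-gen-content"]),
    RegisteredService("vllm-internal", "Ready"),  # no tags: custom applications only
]
chatbot_candidates = [s.name for s in find_services(services, "chatbot-gen-content")]
print(chatbot_candidates)  # ['openai-gpt-4o-mini']
```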
API Protocol Version (required)
Controls both the request body format and the outbound authentication header. Choose the option that matches the provider's native API.
| Option | Request body shape | Auth header sent | Use for |
|---|---|---|---|
| OPENAI_V1 | OpenAI Chat Completions | Authorization: Bearer <key> | OpenAI, NVIDIA NIM, vLLM, OpenRouter, any OpenAI-compatible endpoint |
| GEMINI_V1_BETA | Google Gemini | x-goog-api-key: <key> | Google Gemini native API only |
| ANTHROPIC_V1 | Anthropic Messages | x-api-key: <key> + anthropic-version: 2023-06-01 | Anthropic Claude |
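The header mapping in the table can be expressed as a short function. This is a sketch for illustration; build_auth_headers is a hypothetical name, not an HM API, and the header values are taken directly from the table above.

```python
from typing import Dict


def build_auth_headers(protocol: str, api_key: str) -> Dict[str, str]:
    """Return the outbound auth headers for a given API Protocol Version."""
    if protocol == "OPENAI_V1":
        return {"Authorization": f"Bearer {api_key}"}
    if protocol == "GEMINI_V1_BETA":
        return {"x-goog-api-key": api_key}
    if protocol == "ANTHROPIC_V1":
        return {"x-api-key": api_key, "anthropic-version": "2023-06-01"}
    raise ValueError(f"unknown API Protocol Version: {protocol!r}")


print(build_auth_headers("OPENAI_V1", "sk-example"))
# {'Authorization': 'Bearer sk-example'}
```

This is also why the API Key field takes the bare key: the prefix and header name depend on the protocol, so they are added at request time.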
Allow Insecure Connection (optional, default off)
Disables TLS certificate verification on outbound calls to the upstream. Enable this only if the upstream uses a self-signed certificate or a certificate signed by a CA not trusted by the HM cluster.
Use this toggle when registering another HM's external inference endpoint (for example, HM2 federating through HM1). HM1's ingress typically presents a self-signed certificate, which causes TLS errors for both the registration probe and runtime forwarding. HM-to-HM federation also requires both ends to use OPENAI_V1 as the API Protocol Version.
Warning
This setting is create-only. You can't toggle it after registration. If you need to change it, delete the service and re-register. Only enable this for development environments or trusted self-signed certificates — disabling TLS verification reduces security.
After clicking Register
HM validates the endpoint before creating any infrastructure. For OpenAI (OPENAI_V1), Anthropic (ANTHROPIC_V1), and Google Gemini (GEMINI_V1_BETA), HM performs a live connectivity probe that checks both reachability and credential validity. If the endpoint is unreachable or the API key is rejected, registration fails immediately with an error — no resources are created.
Note
For some OPENAI_V1 providers (such as NVIDIA NIM, Hugging Face, OpenRouter, and self-hosted vLLM running without authentication), the models endpoint doesn't require an API key. A connectivity probe is still performed, but a wrong API key may still return HTTP 200, so key validity isn't guaranteed at registration time for these providers.
After registration, the status is refreshed every 5 minutes by a background health check. If the upstream becomes unreachable or starts rejecting credentials, the service flips to Failed (with an explanatory Status Message) at the next refresh tick.
Use the service
Once the service is ready, it's available to:
- HM chatbot: The chatbot picks up services tagged with chatbot-gen-content automatically.
- Pipeline Designer: Registered external models appear in the model picker alongside HM-hosted models. For details, see External inference services in Pipeline Designer.
- Gen AI Builder: Models are available as inference targets in Gen AI Builder pipelines once registered.
Retrieve inference service details
Click a service name in the Inference Services list to open its detail view, which shows the service's configuration, current status, and available actions.
Details
| Field | Description |
|---|---|
| External Service Name | The unique identifier assigned at registration. |
| Model name | The model identifier forwarded to the upstream provider. |
| Model Base URL | The upstream endpoint the proxy routes requests to. |
| API Protocol Version | The request format and authentication header in use (OPENAI_V1, GEMINI_V1_BETA, or ANTHROPIC_V1). |
| Functions | The capability tags currently assigned to the service. |
| Allow Insecure Connection | Whether TLS certificate verification is disabled for outbound calls. |
| Status | Current health of the service: Ready, Failed, Pending, or Unknown. |
| Status Message | A human-readable explanation that accompanies the status. Empty when the service is Ready; carries the probe diagnosis (for example, API key is missing or invalid (status 401 from ...) or endpoint unreachable: ...) when Failed; carries the cache warm-up text when Pending. |
Note
The API key isn't displayed in the service detail page. To replace it, open Quick Actions → Edit Service and enter a new value. HM rotates the underlying secret automatically.
Troubleshooting from the Status Message
When a service is Failed, the Status Message tells you what to fix. Common patterns:
| Status message | Likely cause | What to do |
|---|---|---|
| API key is missing or invalid | Wrong or expired API key. | Open Edit and replace the API Key value. |
| API key is missing/invalid or account credits are exhausted (Anthropic) | Anthropic returns the same status for a bad key and for depleted credits. | Verify the key first; if it's correct, top up the Anthropic account. |
| account quota exhausted — check billing and credits (OpenAI) | OpenAI billing balance is empty. | Top up the OpenAI account; no change to the HM service is needed. |
| API key lacks permission for this endpoint | Key is valid but doesn't have access to the model (project scoping, tier). | Grant model access on the provider's console, or use a different key. |
| endpoint not found — check the base URL and model name | Typo in Model Base URL or wrong model name. | Verify both values. Make sure the Model Base URL doesn't end with /v1. |
| endpoint unreachable: ... x509: ... | TLS error against an untrusted certificate. | Delete and re-register with Allow Insecure Connection enabled (the toggle is create-only). |
| endpoint unreachable: ... (other) | Network egress blocked, DNS failure, or upstream down. | Check egress rules from the HM cluster and confirm the provider is online. |
| rate limited by upstream | Provider is throttling requests. | Wait and retry; recurring rate limits indicate the upstream account tier needs upgrading. |
| upstream is temporarily unavailable / upstream server error | Provider-side outage. | Wait; the status recovers automatically at the next refresh tick once the upstream is healthy. |
| Remote connection check is pending — upstream has not yet been verified | Background health check hasn't run yet (typically after an HM restart). | Wait up to 5 minutes for the next refresh tick. |
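If you monitor several services with a script, the table above can be turned into a simple lookup. The sketch below is one possible approach under stated assumptions: suggest_action is a hypothetical helper, and it matches on phrases taken from the messages in the table, which may not cover every message HM emits.

```python
def suggest_action(status_message: str) -> str:
    """Map a Status Message to the remediation suggested in the table above."""
    msg = status_message.lower()
    if "endpoint unreachable" in msg:
        if "x509" in msg:
            return "Delete and re-register with Allow Insecure Connection enabled."
        return "Check egress rules from the HM cluster and confirm the provider is online."
    # Ordered so that more specific phrases are tested before broader ones.
    rules = [
        ("credits are exhausted", "Verify the API key; if it is correct, top up the Anthropic account."),
        ("quota exhausted", "Top up the OpenAI account."),
        ("lacks permission", "Grant model access on the provider's console, or use a different key."),
        ("missing or invalid", "Replace the API Key value."),
        ("endpoint not found", "Verify Model Base URL and Model name."),
        ("rate limited", "Wait and retry."),
        ("temporarily unavailable", "Wait for the next refresh tick."),
        ("upstream server error", "Wait for the next refresh tick."),
        ("check is pending", "Wait up to 5 minutes for the next refresh tick."),
    ]
    for phrase, action in rules:
        if phrase in msg:
            return action
    return "No known pattern; read the full Status Message."


print(suggest_action("API key is missing or invalid (status 401 from ...)"))
# Replace the API Key value.
```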
Update inference service parameters
Some parameters can be updated after registration; others require deregistering and re-registering the service.
What can and cannot be changed
| Field | Editable after registration? |
|---|---|
| Functions | Yes |
| API Key | Yes — HM rotates the underlying Kubernetes secret automatically |
| External Service Name | No — delete and re-register |
| Model name | No — delete and re-register |
| Model Base URL | No — delete and re-register |
| API Protocol Version | No — delete and re-register |
| Allow Insecure Connection | No — delete and re-register |
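The table translates into a simple decision: if every field you want to change is editable, use Edit Service; otherwise deregister and register again. A minimal sketch, using the field labels from the table and a hypothetical plan_change helper:

```python
from typing import Iterable

EDITABLE_FIELDS = {"Functions", "API Key"}
CREATE_ONLY_FIELDS = {
    "External Service Name",
    "Model name",
    "Model Base URL",
    "API Protocol Version",
    "Allow Insecure Connection",
}


def plan_change(changed_fields: Iterable[str]) -> str:
    """Return "edit" if all changes are editable in place, else "re-register"."""
    changed = set(changed_fields)
    unknown = changed - EDITABLE_FIELDS - CREATE_ONLY_FIELDS
    if unknown:
        raise ValueError(f"fields not covered by the table: {sorted(unknown)}")
    return "re-register" if changed & CREATE_ONLY_FIELDS else "edit"


print(plan_change(["API Key"]))                    # edit
print(plan_change(["API Key", "Model Base URL"]))  # re-register
```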
Note
API Protocol Version is locked after registration. Each protocol probes a different endpoint path with a different authentication header, and the connectivity probe that ran at registration only passed because the original protocol matched the Model Base URL and Model name. Since those two fields can't be changed, any attempt to switch to a different protocol fails the connectivity probe and the update is rejected. To use a different protocol, deregister the service and register a new one.
How to edit a service parameter
Open the service detail page and select Quick Actions → Edit Service, or click the pencil icon on the Inference Services list.
Note
HM runs a connectivity probe before applying the update. If the endpoint is unreachable or the new API key is rejected, the update fails and no changes are applied.
Deregister an external inference service
Warning
Deregistration is permanent. All associated Kubernetes resources (namespace, secret, ServingRuntime, InferenceService) are removed immediately. This action can't be undone.
How to deregister
To delete a service, either open the service detail page and select Quick Actions → Deregister External Inference Service, or click the trash icon on the Inference Services list.
HM blocks deletion if the service is currently referenced by one or more pipelines. Remove or update those pipelines first, then retry.
When deletion succeeds, HM:
- Removes all Kubernetes resources backing the service (including the API key secret).
- Removes the service record from the database.
- Clears all tags associated with the service.