External inference services (Innovation Release)

Context

An external inference service connects Hybrid Manager (HM) to a remote model provider hosted outside the cluster — such as OpenAI, Google Gemini, Anthropic, or NVIDIA NIM. You configure the provider's URL, model name, and API key once; HM stores the credentials in Kubernetes secrets and handles authentication transparently for every downstream request.

This page covers:

  • Viewing the Inference Services list
  • Registering a new external inference service
  • Getting the details of a registered service
  • Updating a registered service
  • Deregistering a service

Inference Services list

Open the Inference Services list page from the Estate → Inference Services menu in your HM project.

The list displays each service's name, model, and current status. When the status isn't Ready, a short Status Message explains the cause.

Status

| Status | Meaning |
|---|---|
| Ready | The service is healthy and accepting requests. |
| Failed | The most recent health check failed. The service detail page shows a Status Message with the diagnosis (for example, API key is missing or invalid, endpoint unreachable, or account quota exhausted). |
| Pending | The endpoint hasn't yet been verified by the background health check. Typically seen during the first few minutes after an HM restart, while the cache warms up. The Status Message reads Remote connection check is pending — upstream has not yet been verified. |
| Unknown | The underlying Kubernetes resources haven't reported a status yet. This is usually a transient state right after creation. |

Status is refreshed every 5 minutes by a background health check. After you create or edit a service, the connectivity probe runs inline so the result is reflected immediately.

Register an external inference service

Navigate to Estate → Quick Actions → Register External Inference Service in your HM project.

Tip

You can also reach the form from the Inference Services list page via the Quick Actions menu.

Prerequisites

Before registering, confirm you have:

  • The provider's base URL (scheme and host, without a trailing /v1).
  • The model name exactly as the provider expects it (case-sensitive).
  • A valid API key for the provider.
  • Network reachability from the HM cluster to the upstream hostname. Your HM administrator may need to allow egress to the provider's domain.

Quick reference by provider

Use this table to fill in the form for the most common providers. The columns map directly to the fields described below.

| Provider | Model base URL | API protocol version | Example model name | Notes |
|---|---|---|---|---|
| OpenAI | https://api.openai.com | OPENAI_V1 | gpt-4o-mini | API key validated at registration. |
| OpenRouter | https://openrouter.ai/api | OPENAI_V1 | openai/gpt-4o-mini | API key not validated at registration. |
| Google Gemini | https://generativelanguage.googleapis.com | GEMINI_V1_BETA | gemini-2.5-pro | API key validated at registration. |
| Anthropic | https://api.anthropic.com | ANTHROPIC_V1 | claude-sonnet-4-5 | API key validated at registration. |
| NVIDIA NIM | https://integrate.api.nvidia.com | OPENAI_V1 | meta/llama-3.1-8b-instruct | API key not validated at registration. |
| Self-hosted / vLLM | Your internal service URL, for example http://vllm-svc.inference:8000 | OPENAI_V1 | The model name your vLLM server is serving | API key not validated at registration when vLLM runs unauthenticated. |
| Other HM (federation) | Another HM's external inference endpoint URL | OPENAI_V1 | Same model name as on the upstream HM | Enable Allow Insecure Connection. See Allow Insecure Connection below. |

For providers not listed here, see the form field descriptions below.

Form fields

External Service Name (required)

A unique identifier for this service within HM. Must follow DNS-style naming rules:

  • Use only lowercase letters, digits, hyphens (-), and dots (.).
  • Hyphens are allowed within segments but not at the start or end.
  • Dots are allowed only as segment separators.
  • No uppercase letters, underscores, or spaces.
  • Maximum 63 characters.

Examples: openai-gpt-4o-mini, azure.gpt-4o.prod.
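
If you want to check a name before submitting the form, the rules above can be approximated with a regular expression. The following is a minimal sketch, not HM's own validator: the function name is hypothetical, and it assumes the hyphen rule applies to each dot-separated segment, which is how DNS-style names are usually interpreted.

```python
import re

# One segment: lowercase letters and digits, with hyphens allowed only in the middle.
_SEGMENT = r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?"
# Full name: one or more segments separated by single dots.
_NAME_RE = re.compile(rf"{_SEGMENT}(?:\.{_SEGMENT})*")
MAX_NAME_LENGTH = 63


def is_valid_service_name(name: str) -> bool:
    """Return True if name satisfies the naming rules listed above (assumed interpretation)."""
    return len(name) <= MAX_NAME_LENGTH and _NAME_RE.fullmatch(name) is not None


for candidate in ["openai-gpt-4o-mini", "azure.gpt-4o.prod", "OpenAI_GPT", "-leading-hyphen"]:
    print(candidate, is_valid_service_name(candidate))
```

The two example names from this page pass, while names with uppercase letters, underscores, or a leading hyphen are rejected.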

Tags (optional)

Reuse existing HM tags to group and filter services. Tags have no effect on request routing or authentication.

Model name (required)

The exact identifier the upstream provider expects, as documented by the provider. This value is case-sensitive. Slash-separated names (such as meta/llama-3.1-8b-instruct or openai/gpt-4o-mini) are supported. See the Quick reference by provider table above for examples.

API Key (required for most providers)

Enter the API key only; don't include the Authorization: Bearer … prefix. HM adds the correct auth header automatically based on the API Protocol Version you select.

Model Base URL (required)

The scheme and host (plus any required path prefix) for the provider's API. See the Quick reference by provider table above for the correct value for common providers.

Don't include /v1. Consumer applications append /v1 (or /v1beta for Gemini) themselves, so including it here produces duplicated paths such as /v1/v1/chat/completions, which return a 404. A trailing slash is tolerated and stripped automatically.
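
To see why the version suffix matters, the sketch below mimics the behavior described above: it strips a trailing slash, rejects a base URL that already ends in /v1 or /v1beta, and shows how an OpenAI-style consumer would build its request URL. Both function names are hypothetical, and rejecting the suffix outright is a choice made for this example rather than documented HM behavior.

```python
from urllib.parse import urlsplit

VERSION_SUFFIXES = ("/v1", "/v1beta")


def normalize_base_url(url: str) -> str:
    """Strip trailing slashes and reject a base URL that already ends in a version path."""
    cleaned = url.strip().rstrip("/")
    parts = urlsplit(cleaned)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"expected an http(s) URL with a host, got {url!r}")
    if parts.path.endswith(VERSION_SUFFIXES):
        raise ValueError(f"remove the version suffix from {url!r}; consumers append it themselves")
    return cleaned


def chat_completions_url(base_url: str) -> str:
    """Build the request URL the way an OpenAI-style consumer would (illustrative only)."""
    return f"{base_url}/v1/chat/completions"


print(chat_completions_url(normalize_base_url("https://api.openai.com/")))
# A base URL that wrongly includes /v1 would yield the duplicated path:
print(chat_completions_url("https://api.openai.com/v1"))
```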

Tip

Self-hosted vLLM servers return 404 on the per-model endpoint /v1/models/{name}, which would otherwise cause the registration probe to fail. HM handles this by falling back to GET /v1/models and matching the model name in the list, so vLLM-served models register successfully.
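
The fallback described in this tip can be pictured with the sketch below. It is an illustration of the idea, not HM's implementation: the function names and the injected get callable are hypothetical, and the response shape (a data list of objects with an id field) is assumed to follow the OpenAI list-models format that vLLM's OpenAI-compatible server emulates.

```python
from typing import Callable, Tuple
from urllib.parse import quote

# get(url) -> (HTTP status code, decoded JSON body); supplied by the caller.
HttpGet = Callable[[str], Tuple[int, dict]]


def model_is_served(base_url: str, model: str, get: HttpGet) -> bool:
    """Check the per-model endpoint first, then fall back to the model list on 404."""
    status, _ = get(f"{base_url}/v1/models/{quote(model, safe='/')}")
    if status == 200:
        return True
    if status != 404:
        return False
    # Fallback: list all models and look for a matching id.
    status, body = get(f"{base_url}/v1/models")
    if status != 200:
        return False
    return any(entry.get("id") == model for entry in body.get("data", []))


def fake_vllm_get(url: str) -> Tuple[int, dict]:
    """Stand-in for a vLLM server: 404 on per-model lookups, 200 on the list."""
    if url.endswith("/v1/models"):
        return 200, {"object": "list", "data": [{"id": "meta/llama-3.1-8b-instruct"}]}
    return 404, {}


print(model_is_served("http://vllm-svc.inference:8000", "meta/llama-3.1-8b-instruct", fake_vllm_get))
```

Passing the HTTP call in as a function keeps the sketch runnable without network access; in practice you would supply a wrapper around your HTTP client.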

Functions (optional, multi-select)

Capability tags that consumer applications filter on when discovering available models. Use the predefined values below for HM's built-in consumers; for your own applications, any string is valid.

| Built-in consumer | Required function tag |
|---|---|
| HM chatbot | chatbot-gen-content |
| AIDB pipeline step | The matching aidb-* tag (see your AIDB pipeline documentation) |

Leave this field empty if you're exposing the service exclusively to custom applications that perform their own model selection.

API Protocol Version (required)

Controls both the request body format and the outbound authentication header. Choose the option that matches the provider's native API.

| Option | Request body shape | Auth header sent | Use for |
|---|---|---|---|
| OPENAI_V1 | OpenAI Chat Completions | Authorization: Bearer <key> | OpenAI, NVIDIA NIM, vLLM, OpenRouter, any OpenAI-compatible endpoint |
| GEMINI_V1_BETA | Google Gemini | x-goog-api-key: <key> | Google Gemini native API only |
| ANTHROPIC_V1 | Anthropic Messages | x-api-key: <key> and anthropic-version: 2023-06-01 | Anthropic Claude |
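
The header mapping in the table above can be expressed as a small lookup. This is a sketch for illustration only: the function name is hypothetical, and HM builds these headers for you, so you never need to send them yourself when calling through HM. The header names and values are taken directly from the table.

```python
def build_auth_headers(protocol: str, api_key: str) -> dict:
    """Return the outbound auth headers for a given API Protocol Version."""
    if api_key.lower().startswith("bearer "):
        # The form expects the bare key; the prefix is added per protocol.
        raise ValueError("pass the API key only, without the 'Bearer' prefix")
    if protocol == "OPENAI_V1":
        return {"Authorization": f"Bearer {api_key}"}
    if protocol == "GEMINI_V1_BETA":
        return {"x-goog-api-key": api_key}
    if protocol == "ANTHROPIC_V1":
        return {"x-api-key": api_key, "anthropic-version": "2023-06-01"}
    raise ValueError(f"unknown API protocol version: {protocol!r}")


# "example-key" is a placeholder, not a real credential.
print(build_auth_headers("ANTHROPIC_V1", "example-key"))
```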

Allow Insecure Connection (optional, default off)

Disables TLS certificate verification on outbound calls to the upstream. Enable this only if the upstream uses a self-signed certificate or a certificate signed by a CA not trusted by the HM cluster.

Use this toggle when registering another HM's external inference endpoint (for example, HM2 federating through HM1). HM1's ingress typically presents a self-signed certificate, which causes TLS errors for both the registration probe and runtime forwarding. HM-to-HM federation also requires both ends to use OPENAI_V1 as the API Protocol Version.
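
To make the effect of this toggle concrete, the sketch below shows what disabling certificate verification means for a generic TLS client, using Python's standard ssl module. It illustrates the general mechanism only; the function name is hypothetical and HM's own outbound client is not implemented this way as far as this page states.

```python
import ssl


def build_tls_context(allow_insecure: bool) -> ssl.SSLContext:
    """Return a client TLS context; verification is on unless allow_insecure is True."""
    context = ssl.create_default_context()
    if allow_insecure:
        # Order matters: hostname checking must be turned off before
        # verify_mode can be set to CERT_NONE.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context


strict = build_tls_context(allow_insecure=False)
relaxed = build_tls_context(allow_insecure=True)
print(strict.verify_mode, relaxed.verify_mode)
```

With verification off, the client accepts any certificate the upstream presents, including a self-signed one, which is why the setting suits only trusted or development endpoints.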

Warning

This setting is create-only. You can't toggle it after registration. If you need to change it, delete the service and re-register. Only enable this for development environments or trusted self-signed certificates — disabling TLS verification reduces security.

After clicking Register

HM validates the endpoint before creating any infrastructure. For OpenAI (OPENAI_V1), Anthropic (ANTHROPIC_V1), and Google Gemini (GEMINI_V1_BETA), HM performs a live connectivity probe that checks both reachability and credential validity. If the endpoint is unreachable or the API key is rejected, registration fails immediately with an error — no resources are created.

Note

For some OPENAI_V1 providers — such as NVIDIA NIM, HuggingFace, OpenRouter, and self-hosted vLLM running without authentication — the models endpoint doesn't require an API key. A connectivity probe is still performed, but a wrong API key may still return HTTP 200. Key validity isn't guaranteed at registration time for these providers.

After registration, the status is refreshed every 5 minutes by a background health check. If the upstream becomes unreachable or starts rejecting credentials, the service flips to Failed (with an explanatory Status Message) at the next refresh tick.

Use the service

Once the service's status is Ready, it's available to:

  • HM chatbot — The chatbot picks up services tagged with chatbot-gen-content automatically.
  • Pipeline Designer — Registered external models appear in the model picker alongside HM-hosted models. For details, see External inference services in Pipeline Designer.
  • Gen AI Builder — Models are available as inference targets in Gen AI Builder pipelines once registered.

Retrieve inference service details

Click a service name in the Inference Services list to open its detail view, which shows the service's configuration, current status, and available actions.

Details

| Field | Description |
|---|---|
| External Service Name | The unique identifier assigned at registration. |
| Model name | The model identifier forwarded to the upstream provider. |
| Model Base URL | The upstream endpoint the proxy routes requests to. |
| API Protocol Version | The request format and authentication header in use (OPENAI_V1, GEMINI_V1_BETA, or ANTHROPIC_V1). |
| Functions | The capability tags currently assigned to the service. |
| Allow Insecure Connection | Whether TLS certificate verification is disabled for outbound calls. |
| Status | Current health of the service: Ready, Failed, Pending, or Unknown. |
| Status Message | A human-readable explanation that accompanies the status. Empty when the service is Ready; carries the probe diagnosis (for example, API key is missing or invalid (status 401 from ...) or endpoint unreachable: ...) when Failed; carries the cache warm-up text when Pending. |

Note

The API key isn't displayed in the service detail page. To replace it, open Quick Actions → Edit Service and enter a new value. HM rotates the underlying secret automatically.

Troubleshooting from the Status Message

When a service is Failed, the Status Message tells you what to fix. Common patterns:

| Status message | Likely cause | What to do |
|---|---|---|
| API key is missing or invalid | Wrong or expired API key. | Open Edit and replace the API Key value. |
| API key is missing/invalid or account credits are exhausted (Anthropic) | Anthropic returns the same status for a bad key and for depleted credits. | Verify the key first; if it's correct, top up the Anthropic account. |
| account quota exhausted — check billing and credits (OpenAI) | OpenAI billing balance is empty. | Top up the OpenAI account; no change to the HM service is needed. |
| API key lacks permission for this endpoint | Key is valid but doesn't have access to the model (project scoping, tier). | Grant model access on the provider's console, or use a different key. |
| endpoint not found — check the base URL and model name | Typo in Model Base URL or wrong model name. | Verify both values. Make sure Base URL doesn't end with /v1. |
| endpoint unreachable: ... x509: ... | TLS error against an untrusted certificate. | Delete and re-register with Allow Insecure Connection enabled (the toggle is create-only). |
| endpoint unreachable: ... (other) | Network egress blocked, DNS failure, or upstream down. | Check egress rules from the HM cluster and confirm the provider is online. |
| rate limited by upstream | Provider is throttling requests. | Wait and retry; recurring rate limits indicate the upstream account tier needs upgrading. |
| upstream is temporarily unavailable / upstream server error | Provider-side outage. | Wait; the service recovers automatically at the next refresh tick once the upstream is healthy. |
| Remote connection check is pending — upstream has not yet been verified | Background health check hasn't run yet (typically after an HM restart). | Wait up to 5 minutes for the next refresh tick. |
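
If you collect Status Messages in your own tooling, the patterns above can be matched with simple substring checks. The sketch below is one possible approach under stated assumptions: the function name and rule list are hypothetical, the substrings come from the messages shown on this page, and the exact wording HM emits may vary between releases.

```python
# (substring to look for, suggested action); the first match wins, so more
# specific patterns are listed before the general ones they overlap with.
TRIAGE_RULES = [
    ("x509", "Delete and re-register with Allow Insecure Connection enabled."),
    ("endpoint unreachable", "Check egress rules and confirm the provider is online."),
    ("credits are exhausted", "Verify the key; if it's correct, top up the Anthropic account."),
    ("account quota exhausted", "Top up the OpenAI account."),
    ("api key is missing", "Replace the API Key value."),
    ("lacks permission", "Grant model access or use a different key."),
    ("endpoint not found", "Verify the Model Base URL and model name."),
    ("rate limited", "Wait and retry."),
    ("temporarily unavailable", "Wait for the next refresh tick."),
    ("upstream server error", "Wait for the next refresh tick."),
    ("check is pending", "Wait up to 5 minutes for the next refresh tick."),
]


def suggest_action(status_message: str) -> str:
    """Map a Status Message to the matching action from the troubleshooting table."""
    lowered = status_message.lower()
    for needle, action in TRIAGE_RULES:
        if needle in lowered:
            return action
    return "No known pattern matched; inspect the full Status Message."


print(suggest_action("endpoint unreachable: ... x509: ..."))
```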

Update inference service parameters

Some parameters can be updated after registration; others require deregistering and re-registering the service.

What can and cannot be changed

| Field | Editable after registration? |
|---|---|
| Functions | Yes |
| API Key | Yes; HM rotates the underlying Kubernetes secret automatically |
| External Service Name | No; delete and re-register |
| Model name | No; delete and re-register |
| Model Base URL | No; delete and re-register |
| API Protocol Version | No; delete and re-register |
| Allow Insecure Connection | No; delete and re-register |

Note

API Protocol Version is locked after registration. Each protocol probes a different endpoint path with a different authentication header, and the connectivity probe that ran at registration only passed because the original protocol matched the Model Base URL and Model name. Since those two fields can't be changed, any attempt to switch to a different protocol fails the connectivity probe and the update is rejected. To use a different protocol, deregister the service and register a new one.
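
When you script configuration changes, it can help to decide up front whether a change is an in-place edit or requires re-registration. The sketch below encodes the editability rules from this section; the function name and the snake_case field keys are hypothetical names chosen for this example, not identifiers from an HM API.

```python
EDITABLE_FIELDS = {"functions", "api_key"}
CREATE_ONLY_FIELDS = {
    "external_service_name",
    "model_name",
    "model_base_url",
    "api_protocol_version",
    "allow_insecure_connection",
}


def plan_change(changed_fields: set) -> str:
    """Return 'edit' if every changed field is editable, else 're-register'."""
    unknown = changed_fields - EDITABLE_FIELDS - CREATE_ONLY_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    if changed_fields & CREATE_ONLY_FIELDS:
        return "re-register"
    return "edit"


print(plan_change({"api_key"}))
print(plan_change({"api_key", "api_protocol_version"}))
```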

How to edit a service parameter

Open the service detail page and select Quick Actions → Edit Service, or click the pencil icon on the Inference Services list.

Note

HM runs a connectivity probe before applying the update. If the endpoint is unreachable or the new API key is rejected, the update fails and no changes are applied.

Deregister an external inference service

Warning

Deregistration is permanent. All associated Kubernetes resources (namespace, secret, ServingRuntime, InferenceService) are removed immediately. This action can't be undone.

How to deregister

To deregister a service, either open the service detail page and select Quick Actions → Deregister External Inference Service, or click the trash icon on the Inference Services list.

HM blocks deletion if the service is currently referenced by one or more pipelines. Remove or update those pipelines first, then retry.

When deletion succeeds, HM:

  1. Removes all Kubernetes resources backing the service (including the API key secret).
  2. Removes the service record from the database.
  3. Clears all tags associated with the service.