This service is an OpenAI-compatible API router designed to intelligently dispatch requests to different vLLM backends based on request content and model capabilities.
It enables a single entry point (for example OpenWebUI) to transparently consume multiple on-premise LLMs, each optimized for a specific task such as text generation or image understanding, without requiring any client-side changes.
The router solves a common on-premise limitation: no single model excels at everything.
For example, a text-only model may offer the best quality and throughput for plain text generation, while a separate vision-capable model is needed to understand images. This service automatically routes each request to the backend whose model can handle it: plain text requests go to the text backend, and requests containing images go to the vision backend.
The client continues to use the standard OpenAI Chat Completions API, unaware of the internal routing logic.
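To make this concrete, here is a sketch of the standard Chat Completions payloads a client such as OpenWebUI would send to the router. The `"auto"` model name refers to the router's virtual auto-routing model; the example prompt text is illustrative.

```python
# Sketch of an OpenAI-style Chat Completions payload sent to the router;
# the router, not the client, decides which vLLM backend serves it.
payload = {
    "model": "auto",
    "messages": [
        {"role": "user", "content": "Summarize this quarter's results."}
    ],
    "stream": False,
}

# A vision request uses the same endpoint and schema; only the message
# content differs (an image_url part triggers routing to the vision backend).
vision_payload = {
    "model": "auto",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this picture?"},
                {"type": "image_url",
                 "image_url": {"url": "data:image/png;base64,..."}},
            ],
        }
    ],
}
```

From the client's point of view, both requests are ordinary OpenAI API calls; the routing is invisible.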
The router exposes the standard `/v1/chat/completions` and `/v1/completions` endpoints and rewrites the `model` field when needed. It operates in three modes:

1. **Explicit text model** — if the request specifies the configured text model name, it is routed to the text backend.
2. **Explicit image model** — if the request specifies the configured image model name, it is routed to the vision backend.
3. **Automatic routing (`model=auto`)** — the router inspects the request content and selects the backend itself; requests containing images go to the vision backend.

In every mode, the router can rewrite the `model` field before forwarding the request so that it matches the exact model name expected by the vLLM backend.
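The routing decision described above can be sketched as follows. Function names, backend keys, and the default model names are illustrative assumptions, not the actual implementation.

```python
def message_has_image(messages: list) -> bool:
    """Return True if any message carries an image_url content part."""
    for msg in messages:
        content = msg.get("content")
        if isinstance(content, list):
            for part in content:
                if isinstance(part, dict) and part.get("type") == "image_url":
                    return True
    return False


def pick_backend(model: str, messages: list,
                 text_model: str = "oss-20b",
                 image_model: str = "gemma-vision") -> str:
    """Choose a backend key: explicit model names win; 'auto' inspects content."""
    if model == image_model:
        return "img"
    if model == text_model:
        return "txt"
    # model == "auto": route by the capability the request needs.
    return "img" if message_has_image(messages) else "txt"
```

An explicit model name always takes precedence; content inspection only happens for the virtual `auto` model.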
| Variable | Description |
|----------|-------------|
| `URL_LLM_TXT` | Base URL of the text-only vLLM backend (with or without `/v1`) |
| `APIKEY_LLM_TXT` (optional) | API key injected as `Authorization: Bearer …` when forwarding requests |
| `MODEL_NAME_LLM_TXT` | Model name exposed to clients (e.g. `oss-20b`) |
| `UPSTREAM_MODEL_NAME_LLM_TXT` (optional) | Actual model name expected by the vLLM backend |
| `URL_LLM_IMG` | Base URL of the vision-capable vLLM backend |
| `APIKEY_LLM_IMG` (optional) | API key injected when forwarding requests |
| `MODEL_NAME_LLM_IMG` | Model name exposed to clients (e.g. `gemma-vision`) |
| `UPSTREAM_MODEL_NAME_LLM_IMG` (optional) | Actual upstream model name used by the vision backend |
| `EXPOSE_AUTO_MODEL` (default: `true`) | Exposes the virtual `auto` model in `/v1/models` |
| `MAX_IMAGE_URL_CHARS` (default: `20000000`) | Safety limit to prevent oversized data URLs |
This router is designed to work seamlessly with any OpenAI-compatible client, such as OpenWebUI; no client-side modification is required.
Detailed architecture explanations and real-world use cases are documented separately.
This service acts as a capability-aware LLM gateway: it lets organizations combine the strengths of multiple specialized models while preserving a single, stable OpenAI-compatible API surface.
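Putting it together, the model listing the gateway exposes can be sketched as below. The payload shape follows the OpenAI `/v1/models` list format; the function name, `owned_by` value, and default model names are illustrative assumptions.

```python
def list_models(text_model: str = "oss-20b",
                image_model: str = "gemma-vision",
                expose_auto: bool = True) -> dict:
    """Build an OpenAI-style /v1/models response, optionally including
    the virtual 'auto' model that triggers content-based routing."""
    names = [text_model, image_model] + (["auto"] if expose_auto else [])
    return {
        "object": "list",
        "data": [
            {"id": name, "object": "model", "owned_by": "router"}
            for name in names
        ],
    }
```

With `EXPOSE_AUTO_MODEL` enabled, clients see `auto` alongside the real model names and can opt into automatic routing simply by selecting it.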