kjprice opened a new issue, #13083:
URL: https://github.com/apache/apisix/issues/13083
## Description
Currently, `ai-proxy-multi` selects AI instances entirely server-side via
configured `priority`, `weight`, and `fallback_strategy`. The client has no way
to influence which model or provider handles their request, or in what order
fallbacks should be attempted.
This proposal adds support for a `models` field in the request body,
allowing clients to specify their preferred model ordering while the gateway
retains full control over authentication, rate limiting, and provider
configuration.
## Proposed API
Clients include a `models` array in the chat completions request body. Each
element can be:
**Object form** (full control):
```json
{
  "messages": [{"role": "user", "content": "Hello"}],
  "models": [
    {"provider": "anthropic", "model": "claude-sonnet-4-20250514"},
    {"provider": "openai", "model": "gpt-4o"}
  ]
}
```
**String shorthand** (just model names, matched against configured
instances):
```json
{
  "messages": [{"role": "user", "content": "Hello"}],
  "models": ["claude-sonnet-4-20250514", "gpt-4o"]
}
```
**Mixed**:
```json
{
  "messages": [{"role": "user", "content": "Hello"}],
  "models": [
    {"provider": "anthropic", "model": "claude-sonnet-4-20250514"},
    "gpt-4o"
  ]
}
```
## Behavior
1. The `models` array defines the **client's preferred instance ordering**
(first element = highest priority).
2. Each entry is matched against the route's configured `ai-proxy-multi`
instances by `model` name and optionally `provider`.
3. String entries match by model name only. Object entries can match by
`provider` + `model` for disambiguation (e.g., two providers serving the same
model).
4. Unmatched entries are ignored (the client cannot introduce
providers/models not configured on the route).
5. Configured instances not referenced in `models` are appended after the
client's preferred list, in their original priority order.
6. The existing `fallback_strategy` still applies — if the top-priority
instance fails, the plugin falls back through the client-specified order, then
server-configured instances.
7. Auth, rate limiting, and all other instance config remain
server-controlled.
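
The reordering rules above can be sketched in Python (the actual implementation would be Lua in `ai-proxy-multi.lua`; the function and variable names here are illustrative, not part of the proposal):

```python
def reorder_instances(configured, requested):
    """Reorder server-configured instances by the client's `models` list.

    configured: list of {"provider": ..., "model": ...} dicts in the
                route's priority order.
    requested:  the client's `models` array; each entry is a model-name
                string or a {"provider": ..., "model": ...} dict.
    """
    def matches(inst, entry):
        if isinstance(entry, str):
            # String shorthand: case-sensitive match on model name only (rule 3).
            return inst["model"] == entry
        # Object form: provider + model for disambiguation (rule 3).
        return (inst["provider"] == entry["provider"]
                and inst["model"] == entry["model"])

    preferred, remaining = [], list(configured)
    for entry in requested:
        for inst in remaining:
            if matches(inst, entry):
                preferred.append(inst)   # client order = highest priority (rule 1)
                remaining.remove(inst)
                break
        # Entries matching no configured instance are silently ignored (rule 4).
    # Instances the client did not mention keep their original order (rule 5).
    return preferred + remaining
```

For example, with `anthropic/claude-sonnet-4-20250514` and `openai/gpt-4o` configured in that order, a request carrying `"models": ["gpt-4o"]` would yield the OpenAI instance first, with the Anthropic instance appended after it.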
## Configuration
A new optional boolean on the plugin config to opt-in:
```yaml
plugins:
  ai-proxy-multi:
    allow_client_model_preference: true  # default: false
    instances:
      - name: anthropic-primary
        provider: anthropic
        options:
          model: claude-sonnet-4-20250514
        # ...
      - name: openai-fallback
        provider: openai
        options:
          model: gpt-4o
        # ...
When `allow_client_model_preference` is `false` (default), the `models`
field is stripped from the request body and instance ordering is purely
server-driven. This ensures backward compatibility.
## Use Case
We're building an LLM proxy gateway where multiple teams consume AI services
through a shared gateway. Different clients have different model preferences:
- Team A prefers Claude with GPT-4o fallback
- Team B prefers GPT-4o with Claude fallback
- The gateway team manages auth keys, rate limits, and provider configs
centrally
Today this requires separate routes per team/preference. With client-driven
model selection, a single route handles all teams while respecting their
preferences.
## Implementation Notes
- The matching logic would live in `ai-proxy-multi.lua`, executed before
instance selection in the `access` phase.
- The `models` field should be stripped from the request body before
forwarding to the upstream provider (providers don't recognize it).
- String matching should be case-sensitive to match provider model naming
conventions.
- When the `chash` balancer is in use, client preference should take precedence
over hash-based selection.
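
The stripping and opt-in gating described above can be sketched as follows (Python for illustration only; the real code would be Lua, and `preprocess_body` is a hypothetical name, not an existing APISIX function):

```python
import json

def preprocess_body(raw_body: bytes, allow_client_model_preference: bool):
    """Extract the client's `models` preference and strip it from the body.

    Returns (upstream_body, requested_models). The `models` field is removed
    unconditionally, since upstream providers do not recognize it; the
    preference is only honored when the plugin flag is enabled.
    """
    body = json.loads(raw_body)
    requested = body.pop("models", None)   # never forwarded upstream
    if not allow_client_model_preference:
        requested = None                   # default: purely server-driven ordering
    return json.dumps(body).encode("utf-8"), requested
```

With the flag disabled this degenerates to today's behavior, which is what makes the change backward compatible.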
I'm happy to submit a PR for this feature if the approach looks good.