kayx23 commented on code in PR #12571:
URL: https://github.com/apache/apisix/pull/12571#discussion_r2315386840
##########
docs/en/latest/plugins/ai-proxy-multi.md:
##########
@@ -51,7 +51,7 @@ In addition, the Plugin also supports logging LLM request
information in the acc
| Name | Type | Required | Default
| Valid Values | Description |
|------------------------------------|----------------|----------|-----------------------------------|--------------|-------------|
-| fallback_strategy | string | False |
instance_health_and_rate_limiting | instance_health_and_rate_limiting |
Fallback strategy. When set, the Plugin will check whether the specified
instance’s token has been exhausted when a request is forwarded. If so, forward
the request to the next instance regardless of the instance priority. When not
set, the Plugin will not forward the request to low priority instances when
token of the high priority instance is exhausted. |
+| fallback_strategy | string or array | False | |
"instance_health_and_rate_limiting" "http_429", "http_5xx" or ["rate_limiting",
"http_429", "http_5xx"] | Fallback strategy. When set, the Plugin will check
whether the specified instance’s token has been exhausted when a request is
forwarded. If so, forward the request to the next instance regardless of the
instance priority. When not set, the Plugin will not forward the request to low
priority instances when token of the high priority instance is exhausted. |
Review Comment:
```suggestion
| fallback_strategy | string or array | False |
| string: "instance_health_and_rate_limiting", "http_429", "http_5xx"<br>array:
["rate_limiting", "http_429", "http_5xx"] | Fallback strategy. The option
`instance_health_and_rate_limiting` is kept for backward compatibility and is
functionally the same as `rate_limiting`. With `rate_limiting` or
`instance_health_and_rate_limiting`, when the current instance’s quota is
exhausted, the request is forwarded to the next instance regardless of
priority. With `http_429`, if an instance returns status code 429, the request
is retried with other instances. With `http_5xx`, if an instance returns a 5xx
status code, the request is retried with other instances. If all instances
fail, the plugin returns the last error response code. When not set, the Plugin
will not forward the request to low priority instances when token of the high
priority instance is exhausted. |
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]