Re: [PR] feat: add fallback mechanism for specific error codes in ai-proxy-multi [apisix]

via GitHub Tue, 02 Sep 2025 01:53:04 -0700


kayx23 commented on code in PR #12571:
URL: https://github.com/apache/apisix/pull/12571#discussion_r2315386840



##########
docs/en/latest/plugins/ai-proxy-multi.md:
##########
@@ -51,7 +51,7 @@ In addition, the Plugin also supports logging LLM request 
information in the acc
 
 | Name                               | Type            | Required | Default    
                       | Valid Values | Description |
 
|------------------------------------|----------------|----------|-----------------------------------|--------------|-------------|
-| fallback_strategy                  | string         | False    | 
instance_health_and_rate_limiting | instance_health_and_rate_limiting | 
Fallback strategy. When set, the Plugin will check whether the specified 
instance’s token has been exhausted when a request is forwarded. If so, forward 
the request to the next instance regardless of the instance priority. When not 
set, the Plugin will not forward the request to low priority instances when 
token of the high priority instance is exhausted. |
+| fallback_strategy                  | string or array         | False    |  | 
"instance_health_and_rate_limiting" "http_429", "http_5xx" or ["rate_limiting", 
"http_429", "http_5xx"] | Fallback strategy. When set, the Plugin will check 
whether the specified instance’s token has been exhausted when a request is 
forwarded. If so, forward the request to the next instance regardless of the 
instance priority. When not set, the Plugin will not forward the request to low 
priority instances when token of the high priority instance is exhausted. |

Review Comment:
   ```suggestion
   | fallback_strategy                  | string or array         | False    |  
| string: "instance_health_and_rate_limiting", "http_429", "http_5xx"<br>array: 
["rate_limiting", "http_429", "http_5xx"] | Fallback strategy. The option 
`instance_health_and_rate_limiting` is kept for backward compatibility and is 
functionally the same as `rate_limiting`. With `rate_limiting` or 
`instance_health_and_rate_limiting`, when the current instance’s quota is 
exhausted, the request is forwarded to the next instance regardless of 
priority. With `http_429`, if an instance returns status code 429, the request 
is retried with other instances. With `http_5xx`, if an instance returns a 5xx 
status code, the request is retried with other instances. If all instances 
fail, the plugin returns the last error response code. When not set, the Plugin 
will not forward the request to low priority instances when token of the high 
priority instance is exhausted. |
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] feat: add fallback mechanism for specific error codes in ai-proxy-multi [apisix]

Reply via email to