Copilot commented on code in PR #13084:
URL: https://github.com/apache/apisix/pull/13084#discussion_r2915641166


##########
apisix/plugins/ai-proxy-multi.lua:
##########
@@ -409,6 +480,33 @@ end
 
 
 function _M.access(conf, ctx)
+    if conf.allow_client_model_preference then
+        local body, err = core.request.get_body()
+        if body then
+            local request_body, decode_err = core.json.decode(body)
+            if request_body and not decode_err and request_body.models then
+                ctx.client_model_preference = match_client_models(
+                    conf.instances, request_body.models)
+                core.log.info("client model preference: ",
+                              core.json.delay_encode(ctx.client_model_preference))
+                request_body.models = nil
+                ngx.req.set_body_data(core.json.encode(request_body))
+            end
+        end
+    end

Review Comment:
   The `models` field is only stripped when `allow_client_model_preference` is 
enabled. This contradicts the PR description/docs (and the new tests) which 
state that `models` is always stripped before forwarding upstream. As-is, 
requests sent with `models` while the feature is disabled will forward an extra 
`models` field to upstream providers.
   
   Suggestion: always remove `models` from the JSON request body when present, 
but only apply client-driven reordering when `allow_client_model_preference` is 
true (i.e., strip unconditionally; reorder conditionally).
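   A minimal Lua sketch of the suggested shape (hedged: it reuses the names from the diff above and assumes `match_client_models` behaves as in the PR; this is an illustration, not the author's implementation):
   ```lua
   -- sketch: reorder conditionally, strip unconditionally
   local body = core.request.get_body()
   if body then
       local request_body, decode_err = core.json.decode(body)
       if request_body and not decode_err and request_body.models then
           if conf.allow_client_model_preference then
               ctx.client_model_preference = match_client_models(
                   conf.instances, request_body.models)
           end
           -- always remove the gateway-only `models` field before proxying
           request_body.models = nil
           ngx.req.set_body_data(core.json.encode(request_body))
       end
   end
   ```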



##########
apisix/plugins/ai-proxy-multi.lua:
##########
@@ -299,6 +300,76 @@ local function get_instance_conf(instances, name)
 end
 
 
+local function match_client_models(instances, models)
+    local ordered_names = {}
+    local matched = {}
+
+    for _, model_pref in ipairs(models) do
+        local target_model, target_provider
+        if type(model_pref) == "string" then

Review Comment:
   `match_client_models()` assumes `models` is an array and iterates it with 
`ipairs(models)`. If a client sends a non-array JSON value (e.g. 
`"models":"gpt-4"`), `ipairs` will raise an error and the request will 500.
   
   Suggestion: validate `request_body.models` is a table/array before calling 
`match_client_models` (and/or harden `match_client_models` to return the 
default ordering when `type(models) ~= "table"`). Also consider skipping 
elements that are not a string or object.
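   One possible hardening, sketched in plain Lua (the helper name `is_valid_models_field` is hypothetical, not part of the PR):
   ```lua
   -- hypothetical helper: accept only a non-empty array whose elements
   -- are strings or objects; anything else keeps the default ordering
   local function is_valid_models_field(models)
       if type(models) ~= "table" or #models == 0 then
           return false
       end
       for _, pref in ipairs(models) do
           local t = type(pref)
           if t ~= "string" and t ~= "table" then
               return false
           end
       end
       return true
   end

   -- in _M.access, before calling match_client_models:
   -- if is_valid_models_field(request_body.models) then ... end
   ```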



##########
apisix/plugins/ai-proxy-multi.lua:
##########
@@ -430,6 +528,25 @@ end
 
 
 local function retry_on_error(ctx, conf, code)
+    if ctx.client_model_preference then
+        ctx.client_model_tried = ctx.client_model_tried or {}
+        ctx.client_model_tried[ctx.picked_ai_instance_name] = true
+        if (code == 429 and fallback_strategy_has(conf.fallback_strategy, "http_429")) or
+           (code >= 500 and code < 600 and
+           fallback_strategy_has(conf.fallback_strategy, "http_5xx")) then
+            local name, ai_instance, err = pick_preferred_instance(ctx, conf)
+            if err then
+                core.log.error("all preferred instances failed: ", err)
+                return 502
+            end
+            ctx.balancer_ip = name
+            ctx.picked_ai_instance_name = name
+            ctx.picked_ai_instance = ai_instance
+            return
+        end
+        return code
+    end

Review Comment:
   The new client-preference retry path in `retry_on_error()` (falling back 
through `ctx.client_model_preference` on 429/5xx) isn’t covered by the added 
tests. Current test cases validate selection/reordering and stripping, but not 
that a 429/5xx from the preferred instance causes the plugin to retry the next 
preferred instance (and that it stops retrying on non-matching status codes).
   
   Suggestion: add a test that makes the first preferred instance return 
429/5xx and asserts the response comes from the next preferred instance 
(similar to existing fallback tests in `ai-proxy-multi.balancer.t`).
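   A rough sketch of such a test case in the Test::Nginx style used by `ai-proxy-multi.balancer.t` (the instance names, request body, and expected response marker are assumptions; the route would need `fallback_strategy` to include `http_429` and a mock upstream that returns 429 for the first instance):
   ```
   === TEST: 429 from first preferred instance falls back to next preference
   --- request
   POST /anything
   { "messages": [{ "role": "user", "content": "What is 1+1?" }],
     "models": ["failing-model", "healthy-model"] }
   --- error_code: 200
   --- response_body_like eval
   qr/healthy-model/
   ```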



##########
apisix/plugins/ai-proxy-multi.lua:
##########
@@ -299,6 +300,76 @@ local function get_instance_conf(instances, name)
 end
 
 
+local function match_client_models(instances, models)
+    local ordered_names = {}
+    local matched = {}
+
+    for _, model_pref in ipairs(models) do
+        local target_model, target_provider
+        if type(model_pref) == "string" then
+            target_model = model_pref
+        elseif type(model_pref) == "table" then
+            target_model = model_pref.model
+            target_provider = model_pref.provider
+        end
+
+        for _, instance in ipairs(instances) do
+            if not matched[instance.name] then
+                local inst_model = instance.options and instance.options.model
+                local matches = false
+                if target_provider then
+                    matches = (instance.provider == target_provider
+                               and inst_model == target_model)
+                else
+                    matches = (inst_model == target_model)
+                end
+                if matches then
+                    matched[instance.name] = true
+                    core.table.insert(ordered_names, instance.name)
+                    break
+                end
+            end
+        end
+    end
+
+    for _, instance in ipairs(instances) do
+        if not matched[instance.name] then
+            core.table.insert(ordered_names, instance.name)
+        end
+    end
+

Review Comment:
   The fallback order appended for instances not listed by the client is based 
on the raw `conf.instances` array order. However, the server-driven picker uses 
priority-based ordering (see `priority_balancer` sorting by 
`instance.priority`). If `conf.instances` is not already sorted, this can cause 
lower-priority instances to be tried before higher-priority ones when the 
client omits them.
   
   Suggestion: when appending unmatched instances, order them consistently with 
the server-side priority behavior (e.g. sort unmatched by `instance.priority` 
descending, and keep stable order within the same priority).
   ```suggestion
       local unmatched = {}
       for idx, instance in ipairs(instances) do
           if not matched[instance.name] then
               core.table.insert(unmatched, {
                   name = instance.name,
                   priority = instance.priority or 0,
                   index = idx,
               })
           end
       end
   
       table.sort(unmatched, function(a, b)
           if a.priority == b.priority then
               return a.index < b.index
           end
           return a.priority > b.priority
       end)
   
       for _, item in ipairs(unmatched) do
           core.table.insert(ordered_names, item.name)
       end
   ```



##########
apisix/plugins/ai-proxy-multi.lua:
##########
@@ -409,6 +480,33 @@ end
 
 
 function _M.access(conf, ctx)
+    if conf.allow_client_model_preference then
+        local body, err = core.request.get_body()
+        if body then
+            local request_body, decode_err = core.json.decode(body)
+            if request_body and not decode_err and request_body.models then
+                ctx.client_model_preference = match_client_models(
+                    conf.instances, request_body.models)
+                core.log.info("client model preference: ",
+                              core.json.delay_encode(ctx.client_model_preference))
+                request_body.models = nil
+                ngx.req.set_body_data(core.json.encode(request_body))
+            end
+        end
+    end
+
+    if ctx.client_model_preference then
+        local name, ai_instance, err = pick_preferred_instance(ctx, conf)
+        if err then
+            return 503, err
+        end
+        ctx.picked_ai_instance_name = name
+        ctx.picked_ai_instance = ai_instance
+        ctx.balancer_ip = name
+        ctx.bypass_nginx_upstream = true
+        return

Review Comment:
   When client preference is used, instance selection bypasses `pick_target()` 
and therefore bypasses the existing active health-check filtering done via 
`fetch_health_instances()`/`healthcheck_manager`. This means a client can force 
the plugin to try an instance that health checks currently mark as down, 
changing behavior compared to the server-driven picker.
   
   Suggestion: apply the same health-check availability filtering to the 
preferred list (e.g. build an allowlist of healthy instance names before 
iterating, or reuse the existing picker/checker status logic) so client-driven 
ordering doesn't ignore configured health checks.
   ```suggestion
           -- apply health-check availability filtering to the preferred list
        local healthy_instances, health_err = fetch_health_instances(conf, ctx)
           local use_client_preference = true
   
           if not healthy_instances then
            core.log.warn("failed to fetch healthy instances for client model preference: ",
                          health_err, ", falling back to balancer selection")
               use_client_preference = false
           else
               local healthy_set = {}
               for _, inst in ipairs(healthy_instances) do
                   if inst.name then
                       healthy_set[inst.name] = true
                   end
               end
   
               local filtered_preference = {}
               for _, pref in ipairs(ctx.client_model_preference) do
                if pref.instance_name and healthy_set[pref.instance_name] then
                       filtered_preference[#filtered_preference + 1] = pref
                   end
               end
   
               if #filtered_preference == 0 then
                core.log.warn("no healthy instances match client model preference; ",
                              "falling back to balancer selection")
                   use_client_preference = false
               else
                   ctx.client_model_preference = filtered_preference
               end
           end
   
           if use_client_preference then
               local name, ai_instance, err = pick_preferred_instance(ctx, conf)
               if err then
                   return 503, err
               end
               ctx.picked_ai_instance_name = name
               ctx.picked_ai_instance = ai_instance
               ctx.balancer_ip = name
               ctx.bypass_nginx_upstream = true
               return
           end
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
