Baoyuantop commented on PR #13029:
URL: https://github.com/apache/apisix/pull/13029#issuecomment-4044429025
Thanks for the explanation. I agree that `delayed_clear` has a problem in this
scenario -- you're right that stale IPs can persist. But switching to `clear()`
introduces a different issue: it wipes the health status of all nodes,
including the ones that haven't changed, causing a temporary health-status
reset on every upstream update.
Here's what happens with `clear()`:
1. Upstream has nodes A, B, C, D (all with accumulated health status)
2. K8s scales down to A, B
3. clear() removes all target data from shm (A, B, C, D)
4. New checker re-adds A, B as healthy by default
5. If A was actually unhealthy, traffic briefly flows to a broken node until
the next health check cycle detects the failure again
The correct approach is a diff-based removal: compare the old target list
with the new node list, and only remove the nodes that no longer exist (C, D),
while preserving A and B's health status.
Something like this in `timer_create_checker`:
```lua
local existing_checker = working_pool[resource_path]
if existing_checker then
    -- diff-based removal: only remove targets that no longer exist
    local new_nodes_set = {}
    for _, node in ipairs(upstream.nodes) do
        new_nodes_set[node.host .. ":" .. (node.port or "")] = true
    end

    for _, target in ipairs(existing_checker.checker.targets) do
        local key = target.ip .. ":" .. (target.port or "")
        if not new_nodes_set[key] then
            existing_checker.checker:remove_target(target.ip, target.port,
                                                   target.hostname)
        end
    end

    existing_checker.checker:stop()
end
```
This removes only the stale IPs while preserving health history for
unchanged nodes.
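To show the diff logic in isolation, here is a minimal, self-contained sketch in plain Lua. The `stale_targets` helper and the sample data are hypothetical (not part of the APISIX codebase); it only demonstrates the set-based comparison using the same `ip:port` key scheme as above:

```lua
-- Hypothetical helper: given the old target list and the new node list,
-- return only the targets that no longer exist in the new list.
local function stale_targets(old_targets, new_nodes)
    -- build a set of "host:port" keys for the new nodes
    local new_set = {}
    for _, node in ipairs(new_nodes) do
        new_set[node.host .. ":" .. (node.port or "")] = true
    end

    -- any old target whose key is absent from the set is stale
    local stale = {}
    for _, target in ipairs(old_targets) do
        if not new_set[target.ip .. ":" .. (target.port or "")] then
            stale[#stale + 1] = target
        end
    end
    return stale
end

-- K8s scaled the upstream down from A, B, C, D to A, B:
local old = {
    { ip = "10.0.0.1", port = 80 },  -- A
    { ip = "10.0.0.2", port = 80 },  -- B
    { ip = "10.0.0.3", port = 80 },  -- C
    { ip = "10.0.0.4", port = 80 },  -- D
}
local new = {
    { host = "10.0.0.1", port = 80 },  -- A
    { host = "10.0.0.2", port = 80 },  -- B
}

local stale = stale_targets(old, new)
-- stale holds only C and D; A and B are untouched, so their
-- accumulated health history in shm would survive the update.
```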