mcgilman opened a new pull request, #11038:
URL: https://github.com/apache/nifi/pull/11038

   …hanged port.
   
   **NIFI-15735 Fix stale load balance clients when node reconnects with 
changed port**
   
   ## Problem
   
   When a NiFi cluster node restarts with a different 
`nifi.cluster.load.balance.port` (e.g., an administrator changes the port, or 
the system test `testRoundRobinWithRestartAndPortChange` exercises this 
scenario), the other nodes in the cluster continue attempting to send 
load-balanced FlowFiles to the **old** port. This results in repeated 
`Connection refused` errors, FlowFiles never reaching the restarted node, and 
an uneven data distribution that never resolves. In the system test, this 
surfaces as a timeout after 5 minutes.
   
   The root cause is that `NodeIdentifier.equals()` only compares the node's 
UUID, not its load balance address or port. Two components relied on this 
equality check and silently ignored the port change:
   
   1. **`NioAsyncLoadBalanceClientRegistry.register()`**: When called with a 
`NodeIdentifier` whose UUID already had registered clients, it unconditionally 
reused those clients. The existing clients still held the old load balance port 
in their `NodeIdentifier`, so every connection attempt targeted the wrong port. 
No new clients were ever created.
   
   2. **`SocketLoadBalancedFlowFileQueue.onNodeAdded()`**: When a node was 
re-added to the cluster, the guard `nodeIdentifiers.contains(nodeId)` returned 
`true` (because `contains` delegates to `equals`, which only checks the UUID), 
so the method returned immediately without calling `setNodeIdentifiers()`. This 
prevented the queue from learning the node's updated load balance port, so even 
if the registry were fixed in isolation, the queue's internal `NodeIdentifier` 
set would remain stale.
   
   Together, these two issues meant the cluster could never recover load 
balancing to a node that changed its load balance port — the system was 
permanently stuck trying to connect to the old port.
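
   The effect of UUID-only equality can be reproduced in isolation. The sketch 
below is not NiFi code: `SketchNodeId` and `StaleGuardSketch` are hypothetical 
stand-ins, and the host name and port values are illustrative. It only shows 
why a `contains` check cannot detect a changed load balance port.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for NodeIdentifier: equality is based on the UUID only.
final class SketchNodeId {
    final String uuid;
    final String loadBalanceAddress;
    final int loadBalancePort;

    SketchNodeId(String uuid, String loadBalanceAddress, int loadBalancePort) {
        this.uuid = uuid;
        this.loadBalanceAddress = loadBalanceAddress;
        this.loadBalancePort = loadBalancePort;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof SketchNodeId && uuid.equals(((SketchNodeId) other).uuid);
    }

    @Override
    public int hashCode() {
        return uuid.hashCode();
    }
}

final class StaleGuardSketch {
    // Shape of the old guard: a node that is "already known" causes the update to be skipped.
    static boolean oldGuardSkipsUpdate(Set<SketchNodeId> known, SketchNodeId incoming) {
        return known.contains(incoming);
    }

    public static void main(String[] args) {
        Set<SketchNodeId> known = new HashSet<>();
        known.add(new SketchNodeId("node-1", "host-a", 6342));

        // Same UUID, different load balance port after the restart.
        SketchNodeId restarted = new SketchNodeId("node-1", "host-a", 7342);

        // Prints "true": the set considers the restarted node already present,
        // so the stale entry with port 6342 is never replaced.
        System.out.println(oldGuardSkipsUpdate(known, restarted));
    }
}
```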
   
   ## Fix
   
   - **`NioAsyncLoadBalanceClientRegistry.register()`**: Before reusing 
existing clients, compare the load balance address and port of the existing 
clients against the incoming `NodeIdentifier`. If either has changed, stop and 
remove the old clients, forcing `registerClients()` to create new ones with the 
correct address/port. Also added an empty-set guard to prevent 
`NoSuchElementException` if the client set is unexpectedly empty.
   
   - **`SocketLoadBalancedFlowFileQueue.onNodeAdded()`**: Refined the 
early-return logic. Instead of returning whenever the UUID is already known, 
compare the existing `NodeIdentifier`'s load balance address and port against 
the new one. If either has changed, proceed to call `setNodeIdentifiers()`, 
which rebuilds the queue partitions with the updated node information.

