somandal opened a new pull request, #15468:
URL: https://github.com/apache/pinot/pull/15468

   When running table rebalance, we need to specify the `minAvailableReplicas` 
to keep up as part of the rebalance. Internally we track the number of replicas 
based on the target assignment only which can be problematic for certain 
scenarios.
   
   For strict replica group based assignment, I found that the rebalance can 
get into an infinite loop for the following conditions:
   - Starting replication = 1
   - Target replication = 3
   - minAvailableReplicas = -1 (or 2) [basically higher than the starting 
replication]
   
   In the above scenario, when doing the assignment for the segments, we hit 
the following condition:
   ```
             if (availableInstances.size() >= minAvailableReplicas) {
               ...
             } else {
               // New assignment cannot be added, use the current instance 
state map
               nextAssignment.put(segmentName, currentInstanceStateMap);
               return currentAvailableInstances;
             }
   ```
   Since `availableInstances.size()` is always 1 (since we start with just 1 
replica), and `minAvailableReplicas` is 2 based on the configuration passed in 
to rebalance, we never make progress and keep setting the current assignment 
for such segments.
   
   What I noticed is that for a few of the partitions it does move forward 
initially, because of this block at the start:
   ```
           if (currentAvailableInstances == null) {
             // First segment assigned to these instances, use the new 
assignment and update the available instances
             nextAssignment.put(segmentName, assignment._instanceStateMap);
             updateNumSegmentsToOffloadMap(numSegmentsToOffloadMap, 
currentInstanceStateMap.keySet(), k);
             return availableInstances;
           } else {
   ```
   
   This PR fixes the above by mandating that `minAvailableReplicas` is never 
higher than the starting replication factor, and if detected that it is higher, 
it is set to the starting replication factor. It also logs a warning when this 
scenario is detected. 
   
   Testing: Added a test case to reproduce this scenario and verified that with 
the fix the test passes
   
   cc @Jackie-Jiang @xiangfu0 @klsince 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Reply via email to