Re: [PR] Enable ZK-based progress tracking for SegmentRelocator rebalances [pinot]

via GitHub Wed, 11 Jun 2025 11:18:36 -0700


somandal commented on code in PR #16008:
URL: https://github.com/apache/pinot/pull/16008#discussion_r2140607980



##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/rebalance/RebalanceChecker.java:
##########
@@ -111,18 +111,35 @@ private synchronized int retryRebalanceTables(Set<String> 
tableNamesWithType) {
   @VisibleForTesting
   void retryRebalanceTable(String tableNameWithType, Map<String, Map<String, 
String>> allJobMetadata)
       throws Exception {
+    // We first check for rebalance jobs that are stuck - i.e., those that are 
IN_PROGRESS but haven't updated their
+    // status in ZK within the heartbeat timeout. This could occur if a 
controller crashes while running a rebalance job
+    // for instance. These stuck jobs are always aborted here. They will also 
be retried if it's allowed as per the
+    // rebalance context.
+    //
     // Skip retry for the table if rebalance job is still running or has 
completed, in specific:
     // 1) Skip retry if any rebalance job is actively running. Being actively 
running means the job is at IN_PROGRESS
     // status, and has updated its status kept in ZK within the heartbeat 
timeout. It's possible that more than one
     // rebalance jobs are running for the table, but that's fine with 
idempotent rebalance algorithm.
     // 2) Skip retry if the most recently started rebalance job has completed 
with DONE or NO_OP. It's possible that
     // jobs started earlier may be still running, but they are ignored here.
     //
-    // Otherwise, we can get a list of failed rebalance jobs, i.e. those at 
FAILED status; or IN_PROGRESS status but
-    // haven't updated their status kept in ZK within the heartbeat timeout. 
For those candidate jobs to retry:
+    // Note that it should be very unlikely to have scenarios where there are 
more than one rebalance jobs running
+    // for a table at the same time, or to have a rebalance job that started 
earlier but completed later than the one
+    // started most recently since we try to prevent new rebalances from being 
triggered while a rebalance is in
+    // progress by checking ZK metadata. Such scenarios can only occur if 
multiple rebalance jobs are triggered at the
+    // same time and the second one is started before the first one updates 
its status in ZK.
+    //
+    // If we detect that a retry is required based on the above criteria, we 
can get a list of failed rebalance jobs,
+    // i.e. those at FAILED status; or IN_PROGRESS status but haven't updated 
their status kept in ZK within the
+    // heartbeat timeout. For those candidate jobs to retry:
     // 1) Firstly, group them by the original jobIds they retry for so that we 
can skip those exceeded maxRetry.
     // 2) For the remaining jobs, we take the one started most recently and 
retry it with its original configs.
     // 3) If configured, we can abort the other rebalance jobs for the table 
by setting their status to FAILED.
+
+    if (hasStuckInProgressJobs(tableNameWithType, allJobMetadata)) {
+      abortExistingJobs(tableNameWithType, _pinotHelixResourceManager);
+    }

Review Comment:
   got it, makes sense! let's make sure we test out a bunch of scenarios once 
this code is in (e.g. kill the controller, have some combination of aborted / 
in progress jobs / user triggered etc)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Re: [PR] Enable ZK-based progress tracking for SegmentRelocator rebalances [pinot]

Reply via email to