Re: [PR] Enable ZK-based progress tracking for SegmentRelocator rebalances [pinot]

via GitHub Wed, 11 Jun 2025 09:16:25 -0700


yashmayya commented on code in PR #16008:
URL: https://github.com/apache/pinot/pull/16008#discussion_r2140531538



##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/rebalance/RebalanceChecker.java:
##########
@@ -111,18 +111,35 @@ private synchronized int retryRebalanceTables(Set<String> 
tableNamesWithType) {
   @VisibleForTesting
   void retryRebalanceTable(String tableNameWithType, Map<String, Map<String, 
String>> allJobMetadata)
       throws Exception {
+    // We first check for rebalance jobs that are stuck - i.e., those that are 
IN_PROGRESS but haven't updated their
+    // status in ZK within the heartbeat timeout. This could occur if a 
controller crashes while running a rebalance job
+    // for instance. These stuck jobs are always aborted here. They will also 
be retried if it's allowed as per the
+    // rebalance context.
+    //
     // Skip retry for the table if rebalance job is still running or has 
completed, in specific:
     // 1) Skip retry if any rebalance job is actively running. Being actively 
running means the job is at IN_PROGRESS
     // status, and has updated its status kept in ZK within the heartbeat 
timeout. It's possible that more than one
     // rebalance jobs are running for the table, but that's fine with 
idempotent rebalance algorithm.
     // 2) Skip retry if the most recently started rebalance job has completed 
with DONE or NO_OP. It's possible that
     // jobs started earlier may be still running, but they are ignored here.
     //
-    // Otherwise, we can get a list of failed rebalance jobs, i.e. those at 
FAILED status; or IN_PROGRESS status but
-    // haven't updated their status kept in ZK within the heartbeat timeout. 
For those candidate jobs to retry:
+    // Note that it should be very unlikely to have scenarios where there are 
more than one rebalance jobs running
+    // for a table at the same time, or to have a rebalance job that started 
earlier but completed later than the one
+    // started most recently since we try to prevent new rebalances from being 
triggered while a rebalance is in
+    // progress by checking ZK metadata. Such scenarios can only occur if 
multiple rebalance jobs are triggered at the
+    // same time and the second one is started before the first one updates 
its status in ZK.
+    //
+    // If we detect that a retry is required based on the above criteria, we 
can get a list of failed rebalance jobs,
+    // i.e. those at FAILED status; or IN_PROGRESS status but haven't updated 
their status kept in ZK within the
+    // heartbeat timeout. For those candidate jobs to retry:
     // 1) Firstly, group them by the original jobIds they retry for so that we 
can skip those exceeded maxRetry.
     // 2) For the remaining jobs, we take the one started most recently and 
retry it with its original configs.
     // 3) If configured, we can abort the other rebalance jobs for the table 
by setting their status to FAILED.
+
+    if (hasStuckInProgressJobs(tableNameWithType, allJobMetadata)) {
+      abortExistingJobs(tableNameWithType, _pinotHelixResourceManager);
+    }

Review Comment:
   > Is it safe to only abort if stuck IN_PROGRESS jobs are there? Earlier we'd 
do this irrespective right?
   
   The `abortExistingJobs` method basically updates all `IN_PROGRESS` rebalance 
jobs to `ABORTED` for the table. Earlier, it was only called when the 
`candidateJobs` map wasn't empty, so it was effectively being called only when 
there are stuck `IN_PROGRESS` jobs (other cases would be no-ops).
   
   > and just wondering if there are any corner cases we'll be missing if we 
move this upfront instead of aborting it only after the delay check?
   
   This shouldn't really affect anything and I'd argue it actually improves the 
situation. The delay should only be for retry attempts, and stuck rebalance 
jobs should be aborted as soon as we detect that the heartbeat timeout has 
elapsed.
   
   > Also, it should be safe to do this after we fetch and check for 
candidateJobs existence? that way we can ensure that the metrics 
TABLE_REBALANCE_FAILURE_DETECTED is updated if there is a need to retry and we 
aborted jobs
   
   This should happen even now, because candidate jobs include `ABORTED` 
rebalances.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Re: [PR] Enable ZK-based progress tracking for SegmentRelocator rebalances [pinot]

Reply via email to