Re: [I] Disallow multiple TableRebalance jobs from running for the same table at the same time [pinot]

via GitHub Wed, 04 Jun 2025 00:49:30 -0700


yashmayya commented on issue #15683:
URL: https://github.com/apache/pinot/issues/15683#issuecomment-2938983837


   > - Create a TableRebalance manager class to oversee the creation and 
management of rebalance jobs on tables
   > - Have all calls to create rebalance jobs go through the above manager 
class, include periodic tasks like SegmentRelocator
   > - Track ongoing rebalances and reject rebalance jobs for tables already 
ongoing rebalance
   > - We could potentially also have a thread pool mechanism to limit the 
number of jobs spawned at a time
   > - Enforce that all jobs enable progress stats tracking so that their 
status can be stored in ZK
   
   
   https://github.com/apache/pinot/pull/15990 addresses these issues.
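
   For readers skimming the thread, the "track ongoing rebalances and reject 
duplicates" item can be sketched as below. This is only an illustration, not 
the code in the PR: the class and method names are hypothetical, and the state 
is kept in memory on a single controller, whereas a real implementation would 
also need to consult the job metadata stored in ZK.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; names are illustrative and not part of Pinot.
final class RebalanceAdmissionSketch {
  // Tables that currently have a rebalance job in flight on this controller.
  private final Set<String> inProgressTables = ConcurrentHashMap.newKeySet();

  /** Returns true if a rebalance may start; false if one is already running for the table. */
  boolean tryStart(String tableNameWithType) {
    // Set.add is atomic here, so two concurrent callers cannot both be admitted.
    return inProgressTables.add(tableNameWithType);
  }

  /** Must be called when the job completes, fails, or is cancelled. */
  void finish(String tableNameWithType) {
    inProgressTables.remove(tableNameWithType);
  }
}
```

   A caller (REST endpoint or periodic task) would call `tryStart` before 
submitting the job and `finish` in a `finally` block once it ends.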
   
   <hr>
   
   > We will need to ensure that we can handle scenarios where a controller 
dies that had ongoing rebalance jobs. These will have a status in ZK, but when 
a new controller (or the old controller on start-up) identifies this scenario, 
it should gracefully handle it (i.e. in this scenario it should start a new job 
even though in ZK there exists a job for the table with IN_PROGRESS status) -> 
how to detect failed controller scenario and start job vs. avoid starting job 
since one is already running and controller is up and healthy?
   
   This is already handled by the periodic 
[RebalanceChecker](https://github.com/apache/pinot/blob/608f89134e9715fa508f2f800c1920d774fe6e52/pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/rebalance/RebalanceChecker.java#L54)
 controller job. If a controller dies, leadership of the tables it led moves to 
another controller. That controller runs the `RebalanceChecker` job 
periodically (every 5 minutes by default) and tries to detect failed or stuck 
rebalances for all the tables it leads. If a rebalance job's ZK metadata shows 
that it hasn't been updated for more than `heartbeatTimeoutInMs` (a rebalance 
config that defaults to 1 hour), the job is marked as `ABORTED` and the 
controller triggers a new rebalance for the table using the same rebalance 
config.
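
   The staleness decision described above boils down to a timestamp 
comparison. A minimal sketch follows, assuming the job's status and 
last-update time have already been read from ZK. The class, enum, and method 
names are hypothetical and do not mirror the actual `RebalanceChecker` code; 
only `IN_PROGRESS` and `heartbeatTimeoutInMs` come from the discussion.

```java
// Hypothetical sketch; not the actual RebalanceChecker implementation.
final class StaleRebalanceSketch {
  enum Action { LEAVE_RUNNING, ABORT_AND_RETRY }

  /**
   * Decides what to do with a job given its last recorded status and update time.
   * Only jobs still marked IN_PROGRESS whose last update is older than the
   * heartbeat timeout are treated as stuck.
   */
  static Action decide(String status, long lastUpdatedMs, long nowMs, long heartbeatTimeoutInMs) {
    boolean inProgress = "IN_PROGRESS".equals(status);
    boolean stale = nowMs - lastUpdatedMs > heartbeatTimeoutInMs;
    return (inProgress && stale) ? Action.ABORT_AND_RETRY : Action.LEAVE_RUNNING;
  }
}
```

   A healthy controller keeps refreshing the job's metadata, so its jobs never 
cross the timeout; only jobs orphaned by a dead or stuck controller do.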
   
   <hr>
   
   > Today we don't clean up ZK job status ZNodes. Thus they can grow 
indefinitely and we may start hitting the ZNode size limitations. We should 
periodically clean up older job statuses
   
   This isn't done periodically today, but there's a hard limit of 100 jobs of 
each type, beyond which older ones are cleaned up: 
https://github.com/apache/pinot/blob/c15440466a0032c5f74e55940792fb16cd719760/pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/PinotHelixResourceManager.java#L2552-L2558
   
   This isn't ideal, though; we could at least make the limit configurable.
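
   To make the idea concrete, a count-based cap with a configurable limit 
could look roughly like the sketch below. It assumes job metadata can be 
reduced to a map from job id to submission time; the class name, method name, 
and `maxJobs` parameter are hypothetical and this is not the linked 
`PinotHelixResourceManager` code.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch; not the linked PinotHelixResourceManager code.
final class JobHistoryCapSketch {
  /** Returns only the maxJobs most recently submitted jobs, newest first. */
  static Map<String, Long> retainNewest(Map<String, Long> submissionTimeMsByJobId, int maxJobs) {
    return submissionTimeMsByJobId.entrySet().stream()
        // Sort by submission time, newest first.
        .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder()))
        // Everything past the cap is dropped.
        .limit(maxJobs)
        .collect(Collectors.toMap(
            Map.Entry::getKey, Map.Entry::getValue, (a, b) -> a, LinkedHashMap::new));
  }
}
```

   With `maxJobs` read from controller config instead of a constant, the limit 
becomes tunable without changing the cleanup logic.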
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


