klsince opened a new pull request, #13016:
URL: https://github.com/apache/pinot/pull/13016

   This PR enhances the RebalanceChecker to skip failed rebalance jobs that 
are very old.
   
   There can be edge cases where a rebalance job fails and leaves a failed job 
status in ZK, but the table then gets rebalanced through server restarts or 
other cluster operations. The failed job status stays in ZK for a long time 
until it is cleaned up (by a cleanup mechanism that is size-based, not 
time-based). So when the table later becomes imbalanced, for example during 
planned maintenance, the checker might kick off a rebalance unexpectedly. 
   
   This PR therefore adds a new config, `skipRetryTimeoutInMs`, for rebalance 
jobs so that old jobs are not retried. The config defaults to 86400000 ms 
(i.e. 1 day).
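
   As a rough illustration of the idea (not the code in this PR), the age 
check could look something like the sketch below. The class name, method name, 
and the choice of timestamp are hypothetical; only the config name 
`skipRetryTimeoutInMs` and its 1-day default come from the description above. 
Whether the real check measures age from job submission or from failure, and 
whether the boundary is inclusive, are assumptions here.

```java
import java.time.Duration;

// Hypothetical sketch: decides whether a failed rebalance job is still
// recent enough to be retried. Not the actual RebalanceChecker code.
public class StaleRebalanceJobFilter {
    // Default from the PR description: 86400000 ms, i.e. 1 day.
    static final long DEFAULT_SKIP_RETRY_TIMEOUT_MS = Duration.ofDays(1).toMillis();

    /**
     * Returns true if the failed job should still be retried.
     *
     * @param jobTimestampMs     when the job was recorded (assumed: epoch millis)
     * @param nowMs              current time in epoch millis
     * @param skipRetryTimeoutMs jobs older than this are skipped
     */
    static boolean isEligibleForRetry(long jobTimestampMs, long nowMs, long skipRetryTimeoutMs) {
        long ageMs = nowMs - jobTimestampMs;
        // Assumption: a job exactly at the timeout is still eligible.
        return ageMs <= skipRetryTimeoutMs;
    }

    public static void main(String[] args) {
        long nowMs = 1_000_000_000_000L; // fixed clock for a reproducible example
        long oneHourAgo = nowMs - Duration.ofHours(1).toMillis();
        long twoDaysAgo = nowMs - Duration.ofDays(2).toMillis();

        // A job that failed an hour ago is retried: prints "true".
        System.out.println(isEligibleForRetry(oneHourAgo, nowMs, DEFAULT_SKIP_RETRY_TIMEOUT_MS));
        // A two-day-old failure is considered stale and skipped: prints "false".
        System.out.println(isEligibleForRetry(twoDaysAgo, nowMs, DEFAULT_SKIP_RETRY_TIMEOUT_MS));
    }
}
```

   Passing the current time in as a parameter keeps the check easy to unit 
test without depending on the system clock.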


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


