[PR] skip retry very old rebalance jobs [pinot]

via GitHub Fri, 26 Apr 2024 14:11:22 -0700


klsince opened a new pull request, #13016:
URL: https://github.com/apache/pinot/pull/13016


   This PR enhances the RebalanceChecker to skip retrying failed rebalance 
jobs that are very old.
   
   In some edge cases a rebalance job fails and leaves a failed job status in 
ZK, but the table is later rebalanced by server restarts or other cluster 
operations. The failed job status then stays in ZK for a long time, until it is 
removed by the cleanup mechanism, which is size-based rather than time-based. 
When the table later becomes imbalanced, for example during planned 
maintenance, the checker might retry the old failed job and kick off a 
rebalance unexpectedly. 
   
   This PR therefore adds a new config, `skipRetryTimeoutInMs`, for rebalance 
jobs so that old jobs are not retried. The default is 86400000 ms (i.e. 1 day).
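
   As a rough illustration of the age check described above, here is a minimal 
sketch. It is not the code from this PR: the class name, the method name, and 
the choice to measure age from the job's start time are assumptions made for 
the example. Only the config name `skipRetryTimeoutInMs` and its 1-day default 
come from the description.

```java
import java.time.Duration;

// Hypothetical helper, not part of Pinot: decides whether a failed
// rebalance job is too old to be retried.
final class StaleRebalanceJobFilter {
  // Default from the PR description: 86400000 ms, i.e. 1 day.
  static final long DEFAULT_SKIP_RETRY_TIMEOUT_IN_MS = Duration.ofDays(1).toMillis();

  private StaleRebalanceJobFilter() {
  }

  // Returns true when the job is older than the timeout and should be skipped.
  // Assumption: jobStartTimeMs is the epoch-millisecond time the job was
  // started; the PR text does not say which timestamp the checker compares.
  static boolean shouldSkipRetry(long jobStartTimeMs, long nowMs, long skipRetryTimeoutInMs) {
    return nowMs - jobStartTimeMs > skipRetryTimeoutInMs;
  }
}
```

   With the default timeout, a job that failed two days ago would be skipped, 
while one that failed an hour ago would still be eligible for retry.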


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

