klsince opened a new pull request, #13016: URL: https://github.com/apache/pinot/pull/13016
This PR enhances the RebalanceChecker to skip failed rebalance jobs that are very old. In some edge cases a rebalance job fails and leaves a failed job status in ZK, but the table is later rebalanced by server restarts or other cluster operations. The stale failure status then stays in ZK for a long time, until it is removed by a cleanup mechanism that is size-based, not time-based. When the table later becomes imbalanced, for example during planned maintenance, the checker might kick off a rebalance unexpectedly. To avoid this, the PR adds a new rebalance job config, `skipRetryTimeoutInMs`, so that old jobs are not retried. The default is 86400000 ms (i.e. 1 day).

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
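For illustration, the skip check described in this PR could look roughly like the sketch below. This is not the code in the PR: the class `RebalanceRetryFilter`, the method `shouldSkipRetry`, and the choice to measure job age from the job's start time are all assumptions made for the example. Only the config name `skipRetryTimeoutInMs` and the 1-day default come from the description.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical helper, not the actual RebalanceChecker code. It shows one way
 * a failed rebalance job could be classified as too old to retry.
 */
final class RebalanceRetryFilter {
  // Default from the PR description: 86400000 ms, i.e. 1 day.
  static final long DEFAULT_SKIP_RETRY_TIMEOUT_IN_MS = TimeUnit.DAYS.toMillis(1);

  private RebalanceRetryFilter() {
  }

  /**
   * Returns true when the failed job is older than the timeout and should
   * therefore not be retried.
   *
   * @param jobStartTimeMs when the failed job started (assumed reference point)
   * @param nowMs current wall-clock time in milliseconds
   * @param skipRetryTimeoutInMs the configured skipRetryTimeoutInMs value
   */
  static boolean shouldSkipRetry(long jobStartTimeMs, long nowMs, long skipRetryTimeoutInMs) {
    return nowMs - jobStartTimeMs > skipRetryTimeoutInMs;
  }
}
```

With a check of this shape, a failure status left in ZK for longer than the timeout is ignored by the checker, while a job that failed recently is still retried.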