J-HowHuang opened a new pull request, #15618:
URL: https://github.com/apache/pinot/pull/15618
## Description
It is usually difficult to choose a good value for the timeout (`externalViewStabilizationTimeoutInMs`). As a result, some larger tables fail to finish a rebalance job because they take longer than usual to converge, and the job has to be re-triggered manually.

## Change in PR
This PR tracks the number of segments that remain to be processed in the current EV-IS (ExternalView-IdealState) convergence and checks this number each time the timeout is reached. If the number is lower than at the previous check, a new timeout window is granted to continue the EV-IS convergence. Otherwise, the timeout exception is thrown, as it is today.

For jobs with `lowDiskMode=true`, the number is the sum of the remaining segments to be added and the remaining segments to be deleted. For `lowDiskMode=false`, it is the number of remaining segments to be added, since the convergence check only looks at those segments.

## Issue
This change only applies to rebalance jobs triggered through the controller API. Other rebalance jobs, such as the periodic `segmentRelocator`, do not have a `ZkBasedTableRebalanceObserver` passed to the `TableRebalancer`. For those jobs no progress is tracked, so the dynamic timeout is not enabled.
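The "extend the timeout only while progress is being made" rule can be sketched in Java as follows. This is a minimal, hypothetical sketch and not the code in this PR. The names `DynamicTimeoutSketch`, `ProgressTracker`, `remainingSegments` and `waitForConvergence` are illustrative only. The clock and sleep are injected so the logic can run deterministically, which is an assumption made for testability. The sketch also assumes that the first check compares against the count observed when the wait starts.

```java
import java.util.concurrent.TimeoutException;
import java.util.function.IntSupplier;
import java.util.function.LongConsumer;
import java.util.function.LongSupplier;

// Hypothetical sketch of the dynamic-timeout idea; not Pinot's TableRebalancer code.
public final class DynamicTimeoutSketch {

  private DynamicTimeoutSketch() {
  }

  // Progress metric from the PR description: with lowDiskMode the wait also covers
  // segments still to be deleted, otherwise only segments still to be added.
  static int remainingSegments(int toAdd, int toDelete, boolean lowDiskMode) {
    return lowDiskMode ? toAdd + toDelete : toAdd;
  }

  // Hypothetical helper that remembers the remaining count from the previous check.
  static final class ProgressTracker {
    private int lastRemaining;

    ProgressTracker(int initialRemaining) {
      this.lastRemaining = initialRemaining;
    }

    // Called when a timeout window expires: grant another window only if fewer
    // segments remain than at the previous check.
    boolean shouldExtend(int remaining) {
      boolean progressed = remaining < lastRemaining;
      lastRemaining = remaining;
      return progressed;
    }
  }

  // Polls until no segments remain. The clock and sleep are injected (an assumption
  // for testability). Returns how many timeout windows were used.
  static int waitForConvergence(IntSupplier remaining, LongSupplier clockMs,
      LongConsumer sleepMs, long timeoutMs, long pollMs) throws TimeoutException {
    // Assumption: the first check compares against the count seen when the wait starts.
    ProgressTracker tracker = new ProgressTracker(remaining.getAsInt());
    long deadline = clockMs.getAsLong() + timeoutMs;
    int windows = 1;
    while (true) {
      int left = remaining.getAsInt();
      if (left == 0) {
        return windows;
      }
      if (clockMs.getAsLong() >= deadline) {
        if (!tracker.shouldExtend(left)) {
          // No progress during the last window: fail as a fixed timeout would.
          throw new TimeoutException("No EV-IS convergence progress; " + left + " segments remain");
        }
        deadline = clockMs.getAsLong() + timeoutMs;
        windows++;
      }
      sleepMs.accept(pollMs);
    }
  }
}
```

Consider a fake clock that advances 50 ms per poll, a `timeoutMs` of 250, and 10 segments with one finishing every 100 ms. A fixed timeout would fail after the first window. This wait instead completes after four windows, because every check sees fewer remaining segments. If the count stays flat across a whole window, `TimeoutException` is thrown as before. In real use, one might pass `System::currentTimeMillis` and a sleep wrapper that handles `InterruptedException`.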