somandal opened a new pull request, #15681: URL: https://github.com/apache/pinot/pull/15681
Today when `SegmentRelocator` runs, it issues a table rebalance request for each table without checking whether the last rebalance it issued has completed. For small rebalances that move a few segments, this is usually fine, since we expect the previous rebalance triggered by `SegmentRelocator` to complete quickly. Sometimes, however, a large rebalance is issued, or a rebalance takes a long time to complete for other reasons. In such cases, `SegmentRelocator` should avoid issuing a new table rebalance request for the given table.

We saw an issue where a long table rebalance started by `SegmentRelocator` took multiple hours to finish. In spite of that, a new table rebalance job was created every hour, and each of those jobs ended up running in parallel for the same table. This adds CPU load to the controllers, as each table rebalance loops and runs an EV-IS convergence check.

**Note:** this PR does not address scenarios where a long-running table rebalance is triggered outside of `SegmentRelocator` and `SegmentRelocator` then creates a new table rebalance request for that table. It only addresses rebalances triggered by `SegmentRelocator` itself. Addressing this across all rebalances needs a separate design, since today we allow multiple rebalances to run in parallel for a given table and expect idempotent results. One low-hanging fruit might be to have `SegmentRelocator` detect user-issued rebalance jobs by checking ZK for any IN_PROGRESS rebalance jobs. If this is a good idea, I can open a new PR to address it separately.

**Testing:**
- Manually tested with a short run frequency of `SegmentRelocator` to ensure that if it triggers a rebalance and that rebalance takes a long time to complete, it does not create a new rebalance job for that table. Conversely, if the rebalance job completes, the next `SegmentRelocator` run does create a new rebalance job.
- Also manually tested single table mode to ensure nothing breaks there
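
For illustration only, the guard described above could look roughly like the sketch below. It is not the code in this PR: `RelocatorRebalanceGuard`, `JobStatus`, `shouldRebalance`, `recordRebalance`, and the status-lookup function are all hypothetical names, and the lookup stands in for however the job status is actually read (for example from job metadata in ZK).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical sketch; these are not Pinot's actual classes or method names. */
class RelocatorRebalanceGuard {
  /** Hypothetical job states; IN_PROGRESS mirrors the status named in the description. */
  enum JobStatus { IN_PROGRESS, DONE, FAILED }

  // Last rebalance job ID issued by the relocator, keyed by table name.
  private final Map<String, String> _lastJobIdByTable = new ConcurrentHashMap<>();

  // Assumed lookup from job ID to status; returns null if the job is unknown.
  private final Function<String, JobStatus> _statusLookup;

  RelocatorRebalanceGuard(Function<String, JobStatus> statusLookup) {
    _statusLookup = statusLookup;
  }

  /** Returns true if no relocator-issued rebalance is still running for the table. */
  boolean shouldRebalance(String tableNameWithType) {
    String lastJobId = _lastJobIdByTable.get(tableNameWithType);
    if (lastJobId == null) {
      // The relocator has not issued a rebalance for this table yet.
      return true;
    }
    // Skip only while the previously issued job is still in progress.
    return _statusLookup.apply(lastJobId) != JobStatus.IN_PROGRESS;
  }

  /** Remembers the job ID of the rebalance the relocator just issued. */
  void recordRebalance(String tableNameWithType, String jobId) {
    _lastJobIdByTable.put(tableNameWithType, jobId);
  }
}
```

In this sketch a periodic run would call `shouldRebalance` before issuing a request and `recordRebalance` after issuing one. Because only relocator-issued job IDs are tracked, rebalances started by users are not detected, which matches the limitation stated in the note.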