qzsee opened a new pull request, #28302: URL: https://github.com/apache/doris/pull/28302
## Proposed changes

Issue Number: close #xxx

<!--Describe your changes.-->

The current colocate group implementation has the following problems:

1. If a BE is unrecoverable or is being decommissioned, or if a tablet is faulty and is repaired slowly, a very large group stays in an unstable state for a long time, and colocate join is unavailable for that whole time. The underlying problem is that the control granularity of colocate balance is too coarse: it is not reasonable to mark the whole group unstable as soon as a single tablet in the group is unavailable.
2. Colocate balance generates a large number of replica repair tasks, which affect other, normal repair tasks.

To address these problems, this PR makes the following optimizations:

1. If any BE is unavailable, immediately replace all the BE nodes in the unavailable buckets. If some BE nodes are being decommissioned, replace them one by one. Then, at query time, take the intersection of the query locations, so that join performance is degraded as little as possible.
2. Rate-limit colocate tablet repair.

## Further comments

If this is a relatively large or complex change, kick off the discussion at [d...@doris.apache.org](mailto:d...@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

For additional commands, e-mail: commits-h...@doris.apache.org
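
For readers unfamiliar with the idea, the bucket-level replacement and the query-time intersection described in the proposed changes can be illustrated with a small sketch. This is not the code in the PR: the function names, the data layout (one list of backend ids per bucket), and the "first healthy candidate" selection rule are all assumptions made for illustration, and "intersection of query locations" is read here as "for each bucket, the backends that hold a replica for every table in the group".

```python
from typing import Dict, List, Optional, Set


def replace_unavailable_backends(
    bucket_backends: List[List[int]],
    unavailable: Set[int],
    candidates: List[int],
) -> List[List[int]]:
    """Hypothetical helper: swap unavailable backends out of each bucket.

    Only buckets that contain an unavailable backend change, so the
    rest of the group keeps its layout.
    """
    result: List[List[int]] = []
    for backends in bucket_backends:
        used = set(backends)
        new_backends: List[int] = []
        for be in backends:
            if be not in unavailable:
                new_backends.append(be)
                continue
            # Assumed policy: first healthy candidate not already in this bucket.
            replacement: Optional[int] = next(
                (c for c in candidates if c not in unavailable and c not in used),
                None,
            )
            if replacement is None:
                # No healthy backend to swap in; leave the slot unchanged.
                new_backends.append(be)
            else:
                used.add(replacement)
                new_backends.append(replacement)
        result.append(new_backends)
    return result


def colocate_query_locations(
    per_table_locations: Dict[str, List[List[int]]],
) -> List[List[int]]:
    """Hypothetical helper: per bucket, backends shared by every table."""
    tables = list(per_table_locations.values())
    if not tables:
        return []
    result: List[List[int]] = []
    for bucket in range(len(tables[0])):
        common = set(tables[0][bucket])
        for table in tables[1:]:
            common &= set(table[bucket])
        # An empty list means no single backend can serve this bucket locally.
        result.append(sorted(common))
    return result
```

Under these assumptions, a bucket whose intersection is non-empty can still be joined locally while repair is in progress. Only buckets with an empty intersection would need to fall back to a non-colocate join.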