xinghuayu007 opened a new issue #5132:
URL: https://github.com/apache/incubator-doris/issues/5132


   **Describe the bug**
   
   When a query uses **bucket join**, the function 
`Coordinator#BucketShuffleJoinController#getExecHostPortForFragmentIDAndBucketSeq`
 is responsible for making sure each host gets an even share of the buckets to 
scan. For example, if there are 10 buckets to scan and 5 hosts, the strategy 
distributes 2 buckets to each host. The algorithm works like this:
   
   a. use the data structure `buckendIdToBucketCountMap` to record how many 
buckets have been distributed to each backend;
   b. traverse every backend and find the one that owns the fewest buckets. We 
call it **mini_backend**;
   c. distribute the bucket to the **mini_backend**; 
   d. update `buckendIdToBucketCountMap` for **mini_backend**;
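
The steps above can be sketched roughly as follows. This is a simplified illustration, not the actual Doris code: the class `BucketAssignSketch` and the helpers `pickMinBackend` and `assign` are hypothetical names, and it assumes every backend is a valid candidate for every bucket.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BucketAssignSketch {
    // Step b: find the backend that currently owns the fewest buckets.
    // Ties are broken by list order. (Hypothetical helper.)
    static long pickMinBackend(List<Long> backendIds,
                               Map<Long, Integer> backendIdToBucketCountMap) {
        long miniBackend = -1L;
        int minCount = Integer.MAX_VALUE;
        for (long id : backendIds) {
            int count = backendIdToBucketCountMap.getOrDefault(id, 0);
            if (count < minCount) {
                minCount = count;
                miniBackend = id;
            }
        }
        return miniBackend;
    }

    // Steps a-d for every bucket: returns backend id -> number of buckets.
    static Map<Long, Integer> assign(int bucketNum, List<Long> backendIds) {
        // Step a: the bookkeeping map.
        Map<Long, Integer> backendIdToBucketCountMap = new HashMap<>();
        for (int bucketSeq = 0; bucketSeq < bucketNum; bucketSeq++) {
            long miniBackend = pickMinBackend(backendIds, backendIdToBucketCountMap);
            // Steps c and d: give the bucket to mini_backend and record it.
            backendIdToBucketCountMap.merge(miniBackend, 1, Integer::sum);
        }
        return backendIdToBucketCountMap;
    }

    public static void main(String[] args) {
        // 10 buckets over 5 backends: every backend ends up with 2 buckets.
        System.out.println(assign(10, List.of(1L, 2L, 3L, 4L, 5L)));
    }
}
```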
   
   When all backends are alive, the algorithm works as intended. But when 
mini_backend is not alive, a replica host is chosen at random as the final host 
and `buckendIdToBucketCountMap` is not updated. As a result, the bucket scan 
tasks are no longer load balanced.
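
A small sketch of this failure mode, together with one possible remedy. It is an illustration under assumptions, not the Doris implementation or necessarily the fix the project adopts: the class `DeadBackendSketch`, the flag `updateOnFallback`, and the choice to fall back to the least-loaded alive backend are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;

public class DeadBackendSketch {
    // Returns the candidate with the fewest recorded buckets (ties: list order).
    static long leastLoaded(List<Long> candidates, Map<Long, Integer> counts) {
        long best = -1L;
        int min = Integer.MAX_VALUE;
        for (long id : candidates) {
            int c = counts.getOrDefault(id, 0);
            if (c < min) {
                min = c;
                best = id;
            }
        }
        return best;
    }

    // Returns the real scan load per host (host id -> buckets actually scanned).
    static Map<Long, Integer> assign(int bucketNum, List<Long> backends, Set<Long> dead,
                                     boolean updateOnFallback, Random rnd) {
        List<Long> alive = new ArrayList<>();
        for (long id : backends) {
            if (!dead.contains(id)) {
                alive.add(id);
            }
        }
        if (alive.isEmpty()) {
            throw new IllegalStateException("no alive backend");
        }
        Map<Long, Integer> counts = new HashMap<>();     // bookkeeping map
        Map<Long, Integer> actualLoad = new HashMap<>(); // what hosts really scan
        for (int bucketSeq = 0; bucketSeq < bucketNum; bucketSeq++) {
            long finalHost = leastLoaded(backends, counts);
            if (!dead.contains(finalHost)) {
                counts.merge(finalHost, 1, Integer::sum);
            } else if (updateOnFallback) {
                // Assumed remedy: fall back to the least-loaded alive host
                // and record the bucket against it.
                finalHost = leastLoaded(alive, counts);
                counts.merge(finalHost, 1, Integer::sum);
            } else {
                // Behaviour described in the issue: random replica host,
                // bookkeeping map left untouched.
                finalHost = alive.get(rnd.nextInt(alive.size()));
            }
            actualLoad.merge(finalHost, 1, Integer::sum);
        }
        return actualLoad;
    }

    public static void main(String[] args) {
        List<Long> backends = List.of(1L, 2L, 3L, 4L, 5L);
        Set<Long> dead = Set.of(3L);
        System.out.println("no update:   " + assign(10, backends, dead, false, new Random(42)));
        System.out.println("with update: " + assign(10, backends, dead, true, new Random(42)));
    }
}
```

In the no-update mode the dead backend never gains a count, so it keeps being selected as the minimum and buckets keep landing on random hosts. With the update, the per-host load on the alive backends differs by at most one bucket.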
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


