Hi Happen Good proposal.
If we want to carry out this work, we need to pay attention to several aspects. 1. The data partition and data distribution of Doris are two independent modules. If you want to transfer data according to the data distribution, the metadata information may be high, including all partition information and bucket information. 2. Currently, Doris supports partition first and then distributed. Usually, the two are based on different columns. This will cause the same distribution columns to be on the same machine. For example, if table a is partitioned according to the time column, and then the bucket is divided according to the userid row, then the same userid in different partitions will appear on different machines. In extreme cases, the same userid may appear on all machines, which may degenerate into broadcast join when joining the userid column. I think we should be able to support bucket join for tables with group info. If we support all tables, there are still many points to consider. Thanks, Zhao Chun Lee Happen <happen...@hotmail.com> 于2020年8月19日周三 下午1:08写道: > > Motivation > > At present, Doris support 3 type join: shuffle join, broadcast join, > colocate join. > Except colocate join,another join will lead to a lot of network > consumption. > > For example, there a SQL A join B, the cost of network. > > * broadcast join: if table A is divided into three parts,the net work > cost is 3B > * shuffle join: the network cost is A + B. > > These network consumption not only leads to slow query, but also leads to > extra memory consumption during join. > > Each Doris table have disrtribute info, if the join expr hit the > distribute info, we should use the distribute info to reduce the network > consumption. > > What is bucket shuffle join > > [image.png]< > https://camo.githubusercontent.com/ba3ac7db1e7c983ec0f555d332b1064c69a9dc2a/68747470733a2f2f75706c6f61642d696d616765732e6a69616e7368752e696f2f75706c6f61645f696d616765732f383535323230312d633338336665383461656565313362632e706e673f696d6167654d6f6772322f6175746f2d6f7269656e742f7374726970253743696d61676556696577322f322f772f31323430 > > > > just like Hive's bucket map join, the picture show how it work. if there a > SQL A join B, and the join expr hit the distribute info of A. Bucket > shuffle join only need distribute table B, sent the data to proper table A > part. So the network cost is always B. > > So compared with the original join, obviously bucket shuffle join lead to > less network cost: > > B < min(3B, A + B) > > > It can bring us the following benefits: > > 1. First, Bucket Shuffle Join reduce the network cost and lead to a > better performance for some join. Especially when the bucket is cropped. > > 2. It does not strongly rely on the mechanism of collocate, so it is > transparent to users. There is no mandatory requirement for data > distribution, which will not lead to data skew. > > 3. It can provide more query optimization space for join reorder. > > POC of Bucket Shuffle Join > > Now I've implemented a simple Bucket Shuffle join in Doris and test the > performance of it. > > Now, we chose tpcds query 57. The query have 6 join operation, and 4 of > them can hit Bucket shuffle join. > > Origin Doris Bucket shuffle join > Time Cost 27.7s 16.4s > > It seems to work as well as we expected. I'll do more experiments to > verify its performance in the future > > Implementation > > 1. First, we should add a partition type in thrift type > > 2. FE able to plan and sense queries that can be used bucket shuffle > join. send data distribution info to BE > > 3. BE use the proper hash function to send proper data to proper > instance of BE. > > > Best Wish > Happen Lee > > > >