zhbinbin opened a new pull request #4198:
URL: https://github.com/apache/incubator-doris/pull/4198


   The existing Doris bitmap aggregation functions perform poorly when 
computing the intersection or union of bitmaps whose cardinality exceeds one 
billion. There are two reasons. First, when the bitmap cardinality is large 
and the bitmap data exceeds 1 GB, network and disk I/O time increases. Second, 
all the data sunk from the back-end (BE) instances is transferred to the top 
node for the intersection and union calculation, so that single node comes 
under pressure and becomes the bottleneck.
   
   My solution is to create a table with a fixed schema based on the Doris 
data sharding rule, and to hash-partition the ID range of the bitmap, that is, 
to cut the ID range vertically into small cubes. The resulting bitmap blocks 
are smaller and evenly distributed across all BE instances. On top of this 
table schema, some new high-performance UDAF aggregation functions are 
developed. All scan nodes take part in the intersection and union 
calculation, and the top node only summarizes the partial results.
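
   The idea can be illustrated with a minimal Python sketch. It is not the 
code in this pull request: plain Python sets stand in for bitmaps, a modulo 
stands in for the hash function, and NUM_SHARDS, shard_of, split_into_shards, 
sharded_intersect_count and sharded_union_count are hypothetical names chosen 
for illustration. Because every ID lands in exactly one shard, each shard can 
be intersected or unioned independently, and the top node only adds up the 
per-shard counts.

```python
NUM_SHARDS = 4  # hypothetical shard count; stands in for the number of buckets


def shard_of(user_id, num_shards=NUM_SHARDS):
    """Map an ID to a shard. A plain modulo is assumed here for illustration."""
    return user_id % num_shards


def split_into_shards(ids, num_shards=NUM_SHARDS):
    """Cut one logical bitmap (modelled as a set of ints) into per-shard blocks."""
    shards = [set() for _ in range(num_shards)]
    for user_id in ids:
        shards[shard_of(user_id, num_shards)].add(user_id)
    return shards


def sharded_intersect_count(bitmaps, num_shards=NUM_SHARDS):
    """Intersect shard by shard, then sum the partial counts (the 'top node' step).

    Expects at least one bitmap.
    """
    per_bitmap = [split_into_shards(b, num_shards) for b in bitmaps]
    partial_counts = []
    for s in range(num_shards):
        # Work that each scan node could do locally on its own shard.
        local = set.intersection(*[blocks[s] for blocks in per_bitmap])
        partial_counts.append(len(local))
    # The top node only aggregates small integers, not whole bitmaps.
    return sum(partial_counts)


def sharded_union_count(bitmaps, num_shards=NUM_SHARDS):
    """Union shard by shard, then sum the partial counts."""
    per_bitmap = [split_into_shards(b, num_shards) for b in bitmaps]
    partial_counts = []
    for s in range(num_shards):
        local = set().union(*[blocks[s] for blocks in per_bitmap])
        partial_counts.append(len(local))
    # Shards are disjoint by construction, so the counts can simply be added.
    return sum(partial_counts)
```

   For any input sets, sharded_intersect_count and sharded_union_count 
return the same values as computing the intersection or union on the whole 
sets and taking the length, which is what makes the per-shard split safe.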
   
   The design goal is to support bitmaps with a cardinality of more than 10 
billion, and to keep the response time of intersection and union calculations 
at a granularity of 100 dimensions within 5 s.




