EmmyMiao87 opened a new issue #3552:
URL: https://github.com/apache/incubator-doris/issues/3552


   # Support Bitmap Intersect
   
   Add support for the aggregate function `bitmap_intersect`, which is mainly 
used to compute the intersection of grouped data.
   
   # bitmap_intersect
   
   Calculates the intersection of bitmap columns and returns a bitmap object.
   
   ```
   bitmap_intersect(expr)
   ```
   
   **Parameters**
   
   The `expr` column type must be bitmap.
   
   **Return value**
   
   bitmap object
   
   **Example**
   
   table schema
   
   ```
   create table bitmap_intersect_test (
       tag varchar(20),
       user_id bitmap bitmap_union
   ) 
   AGGREGATE KEY(tag)
   DISTRIBUTED BY HASH(tag) BUCKETS 3;
   ``` 
   
   Find the users that have all three tags a, b, and c.
   
   ```
   select bitmap_to_string(bitmap_intersect(user_id)) from 
   (
       select bitmap_union(user_id) user_id from bitmap_intersect_test 
       where tag in ('a', 'b', 'c')
       group by tag
   ) a
   ```
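
   What this query computes can be sketched outside of Doris. The sketch below 
   is only an illustration: it uses `std::set<uint64_t>` as a stand-in for a 
   bitmap, and the names `Bitmap`, `Row`, `union_by_tag`, `intersect_groups` and 
   `demo_query` are hypothetical, not part of Doris. The inner step mirrors 
   `bitmap_union ... group by tag`; the outer step mirrors `bitmap_intersect` 
   over the resulting per-tag bitmaps.

   ```cpp
   #include <cassert>
   #include <cstdint>
   #include <map>
   #include <optional>
   #include <set>
   #include <string>
   #include <utility>
   #include <vector>

   // Stand-in for a bitmap value: an ordered set of user ids (not Doris' BitmapValue).
   using Bitmap = std::set<uint64_t>;
   using Row = std::pair<std::string, uint64_t>;  // (tag, user_id)

   // Inner query: union the user ids of each wanted tag, grouped by tag.
   std::map<std::string, Bitmap> union_by_tag(const std::vector<Row>& rows,
                                              const std::set<std::string>& wanted_tags) {
       std::map<std::string, Bitmap> groups;
       for (const Row& row : rows) {
           if (wanted_tags.count(row.first) > 0) {
               groups[row.first].insert(row.second);
           }
       }
       return groups;
   }

   // Outer query: intersect the per-tag bitmaps.
   // Returns std::nullopt when no group exists at all.
   std::optional<Bitmap> intersect_groups(const std::map<std::string, Bitmap>& groups) {
       std::optional<Bitmap> result;
       for (const auto& entry : groups) {
           const Bitmap& bitmap = entry.second;
           if (!result) {
               result = bitmap;  // the first group seeds the result
               continue;
           }
           Bitmap kept;
           for (uint64_t id : *result) {
               if (bitmap.count(id) > 0) kept.insert(id);
           }
           result = std::move(kept);
       }
       return result;
   }

   // Usage with made-up rows: only user 2 carries all of a, b, and c.
   void demo_query() {
       const std::vector<Row> rows = {{"a", 1}, {"a", 2}, {"b", 2},
                                      {"b", 3}, {"c", 2}, {"d", 7}};
       const auto groups = union_by_tag(rows, {"a", "b", "c"});
       const auto users = intersect_groups(groups);
       const Bitmap expected = {2};
       assert(users.has_value() && *users == expected);
   }
   ```

   In this sketch, a tag that matches no rows produces no group and therefore 
   does not constrain the result, which is also how `group by` behaves in the 
   query above.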
   
   # Design
   
   ## Semantic analysis
   
   The child type of bitmap_intersect must be bitmap.
   
   ```
   class FunctionCallExpr {
   
       void analyze() {
         if(fnName.equals("bitmap_intersect")) {
             ...
             if(!fn.getChild(0).isBitmapType()) {
                  throw new AnalysisException("the child type of " + fnName + " must be bitmap");
             }
             ...
         }
       }
   }
   ```
   
   ## Function implementation
   
   The functions for each stage of `bitmap_intersect` are declared in the 
`function set`.
   
   **Function definition**
   
   ```
   FunctionName: bitmap_intersect,
   InputType: bitmap,
   OutputType: bitmap,
   IntermediateType: varchar
   ```
   
   **init**
   
   Directly reuse the current bitmap init function
   
   ```
   "_ZN5doris15BitmapFunctions11bitmap_initEPN9doris_udf15FunctionContextEPNS1_9StringValE"
   ```
   
   **update**
   **merge**
   
   Intersect each incoming bitmap with the bitmap accumulated for its group on the current node.
   
   ```
   void BitmapFunctions::bitmap_intersect(FunctionContext* ctx, const StringVal& src, StringVal* dst) {
       if (src.is_null) {
           return;
       }
       auto dst_bitmap = reinterpret_cast<BitmapValue*>(dst->ptr);
       // zero size means the src input is an agg object
       if (src.len == 0) {
           (*dst_bitmap) &= *reinterpret_cast<BitmapValue*>(src.ptr);
       } else {
           (*dst_bitmap) &= BitmapValue((char*) src.ptr);
       }
   }
   ```
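
   Because the same function serves both update and merge, partial results 
   computed on different nodes must combine to the same answer as a single 
   pass. The sketch below illustrates that property with `std::set<uint64_t>` 
   standing in for a bitmap; `IntersectState`, `update`, `merge`, `aggregate` and 
   `demo_two_phase` are hypothetical names. Tracking "no input seen yet" with 
   `std::optional` is an assumption of this sketch, not something the proposal 
   specifies: without such a marker, intersecting into a state that starts out 
   empty would always yield an empty set.

   ```cpp
   #include <cassert>
   #include <cstdint>
   #include <optional>
   #include <set>
   #include <utility>
   #include <vector>

   // Stand-in for a bitmap value (not Doris' BitmapValue).
   using Bitmap = std::set<uint64_t>;

   // Intermediate aggregate state.
   // std::nullopt means "no input seen yet" (an assumption of this sketch).
   struct IntersectState {
       std::optional<Bitmap> value;
   };

   // update: fold one input bitmap into the state.
   void update(IntersectState& state, const Bitmap& src) {
       if (!state.value) {
           state.value = src;  // the first input seeds the state
           return;
       }
       Bitmap kept;
       for (uint64_t id : *state.value) {
           if (src.count(id) > 0) kept.insert(id);
       }
       state.value = std::move(kept);
   }

   // merge: fold a partial state produced elsewhere into this state.
   void merge(IntersectState& state, const IntersectState& other) {
       if (!other.value) return;  // the other side saw no rows
       update(state, *other.value);
   }

   // Runs update over every input, as a single node would.
   IntersectState aggregate(const std::vector<Bitmap>& inputs) {
       IntersectState state;
       for (const Bitmap& bitmap : inputs) update(state, bitmap);
       return state;
   }

   // Usage with made-up data: two partial states merge to the one-pass result.
   void demo_two_phase() {
       const Bitmap a = {1, 2, 3};
       const Bitmap b = {2, 3};
       const Bitmap c = {2, 3, 4};
       IntersectState node1 = aggregate(std::vector<Bitmap>{a, b});
       const IntersectState node2 = aggregate(std::vector<Bitmap>{c});
       merge(node1, node2);
       const IntersectState one_pass = aggregate(std::vector<Bitmap>{a, b, c});
       assert(node1.value == one_pass.value);
   }
   ```

   The merged state equals the one-pass state because set intersection is 
   associative and commutative, so the order in which partial results arrive 
   does not matter.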
   
   **serialize**
   **finalize**
   
   Directly reuse the current bitmap serialization function
   
   ```
   "_ZN5doris15BitmapFunctions16bitmap_serializeEPN9doris_udf15FunctionContextERKNS1_9StringValE",
   ```
   
   **Query plan**
   
   ```
   
   mysql> explain select bitmap_intersect(user_id) from (select bitmap_union(user_id) user_id from bitmap_intersect_test where tag in ('a', 'b', 'c') group by tag) a;
   +----------------------------------------------------------------------------------------+
   | Explain String                                                                         |
   +----------------------------------------------------------------------------------------+
   | PLAN FRAGMENT 0                                                                        |
   |  OUTPUT EXPRS:<slot 8>                                                                 |
   |   PARTITION: UNPARTITIONED                                                             |
   |                                                                                        |
   |   RESULT SINK                                                                          |
   |                                                                                        |
   |   6:AGGREGATE (merge finalize)                                                         |
   |   |  output: bitmap_intersect(<slot 7>)                                                |
   |   |  group by:                                                                         |
   |   |  tuple ids: 5                                                                      |
   |   |                                                                                    |
   |   5:EXCHANGE                                                                           |
   |      tuple ids: 4                                                                      |
   |                                                                                        |
   | PLAN FRAGMENT 1                                                                        |
   |  OUTPUT EXPRS:                                                                         |
   |   PARTITION: HASH_PARTITIONED: <slot 2>                                                |
   |                                                                                        |
   |   STREAM DATA SINK                                                                     |
   |     EXCHANGE ID: 05                                                                    |
   |     UNPARTITIONED                                                                      |
   |                                                                                        |
   |   2:AGGREGATE (update serialize)                                                       |
   |   |  output: bitmap_intersect(<slot 5>)                                                |
   |   |  group by:                                                                         |
   |   |  tuple ids: 4                                                                      |
   |   |                                                                                    |
   |   4:AGGREGATE (merge finalize)                                                         |
   |   |  output: bitmap_union(<slot 3>)                                                    |
   |   |  group by: <slot 2>                                                                |
   |   |  tuple ids: 2                                                                      |
   |   |                                                                                    |
   |   3:EXCHANGE                                                                           |
   |      tuple ids: 1                                                                      |
   |                                                                                        |
   | PLAN FRAGMENT 2                                                                        |
   |  OUTPUT EXPRS:                                                                         |
   |   PARTITION: RANDOM                                                                    |
   |                                                                                        |
   |   STREAM DATA SINK                                                                     |
   |     EXCHANGE ID: 03                                                                    |
   |     HASH_PARTITIONED: <slot 2>                                                         |
   |                                                                                        |
   |   1:AGGREGATE (update serialize)                                                       |
   |   |  STREAMING                                                                         |
   |   |  output: bitmap_union(`user_id`)                                                   |
   |   |  group by: `tag`                                                                   |
   |   |  tuple ids: 1                                                                      |
   |   |                                                                                    |
   |   0:OlapScanNode                                                                       |
   |      TABLE:  bitmap_intersect_test                                                     |
   |      PREAGGREGATION: ON                                                                |
   |      PREDICATES: `tag` IN ('a', 'b', 'c')                                              |
   |      partitions=1/1                                                                    |
   |      rollup: bitmap_intersect_test                                                     |
   |      tabletRatio=100/100                                                               |
   |      numNodes=6                                                                        |
   |      tuple ids: 0                                                                      |
   +----------------------------------------------------------------------------------------+
   ```
   

