zenoyang opened a new pull request, #19880:
URL: https://github.com/apache/doris/pull/19880

   # Proposed changes
   
   Issue Number: close #xxx
   Support CRoaring COW feature to reduce bitmap copying
   
   ## Problem summary
   In some current scenarios a bitmap column is copied: for example, when 
multiple derived columns are computed from the same bitmap column, or when the 
build/probe column is copied into the join_block during a join. Copying a 
bitmap is relatively slow, and the CRoaring copy-on-write (COW) feature can 
avoid much of that copying and thereby improve query performance.
   Note: COW in CRoaring-0.4.0 is not thread-safe, so the library needs to be 
patched to make it thread-safe.
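
To make the copy-on-write idea concrete, here is a minimal, self-contained sketch. It is not CRoaring's implementation and not the patch in this PR: `CowBitmap` is a hypothetical type, a `std::set` stands in for the roaring containers, and the atomic reference count of `std::shared_ptr` is only an assumption about one way shared ownership can be made safe across threads. Each handle is assumed to be used by a single thread at a time.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <set>

// Hypothetical illustration type; not Doris's BitmapValue or CRoaring's roaring_bitmap_t.
class CowBitmap {
public:
    CowBitmap() : data_(std::make_shared<std::set<uint32_t>>()) {}

    // Copying only bumps the (atomic) reference count; no elements are copied.
    CowBitmap(const CowBitmap&) = default;
    CowBitmap& operator=(const CowBitmap&) = default;

    // Writers detach first, so other handles never observe the change.
    void add(uint32_t v) {
        detach();
        data_->insert(v);
    }

    bool contains(uint32_t v) const { return data_->count(v) != 0; }
    std::size_t cardinality() const { return data_->size(); }

    // True while both handles still point at the same payload.
    bool shares_storage_with(const CowBitmap& other) const { return data_ == other.data_; }

private:
    // Deep-copy the payload only if another handle still references it.
    void detach() {
        if (data_.use_count() > 1) {
            data_ = std::make_shared<std::set<uint32_t>>(*data_);
        }
    }

    std::shared_ptr<std::set<uint32_t>> data_;
};
```

With this shape, copying a column of bitmaps into another block costs one reference-count increment per row, and the expensive deep copy happens only for rows that are modified afterwards.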
   
   The following is the simplified sql of our production scenario:
   ```sql
   SELECT `t2`.`page`
     , `t2`.`entrance`
     , `t1`.`partition_date`
     , `t1`.`entrance_code`
     , BITMAP_UNION_COUNT(CASE WHEN t1.type = 3 THEN t1.device_id END) AS `dau`
   FROM `doris_vec_stage`.`tbl1` `t1`
     LEFT JOIN `doris_vec_stage`.`tbl2` `t2`
     ON `t1`.`first_entrance` = `t2`.`second_join_key`
       AND `t1`.`page_id` = `t2`.`first_join_key`
   WHERE `t2`.`entrance` IN ('******', '******')
     AND `t2`.`page` = '******'
     AND `t1`.`partition_date` BETWEEN '2023-02-27' AND '2023-03-19'
   GROUP BY 1, 2, 3, 4
   ```
   
   (The following tests are based on Doris 1.1.5.)
   Before the change, the query took 112756 ms:
   ```
   ProbePhase:
         -  ProbeExprCallTime:  1.836ms
         -  ProbeFindNextTime:  5s934ms
         -  ProbeRows:  90.048K  (90048)
         -  ProbeTime:  1m52s
         -  ProbeWhenBuildSideOutputTime:  62.457ms
         -  ProbeWhenProbeSideOutputTime:  54s431ms
         -  ProbeWhenSearchHashTableTime:  27.254ms
   ```
   Column copy time: `ProbeWhenProbeSideOutputTime: 54s431ms`
   Column clear time is not printed here; it is estimated as the remainder of 
ProbeTime: (112000-5934-54431) ms = 51635 ms
   
   After (with CRoaring COW enabled), the query takes 53585 ms:
   ```
   ProbePhase:
         -  ProbeExprCallTime:  1.331ms
         -  ProbeFindNextTime:  40s635ms
         -  ProbeRows:  37.348K  (37348)
         -  ProbeTime:  53s96ms
         -  ProbeWhenBuildSideOutputTime:  17.399ms
         -  ProbeWhenProbeSideOutputTime:  6s247ms
         -  ProbeWhenSearchHashTableTime:  6.700ms
   ```
   Column copy time: `ProbeWhenProbeSideOutputTime:  6s247ms`
   Column clear time: (53096-40635-6247) ms = 6214 ms
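
The residual estimates above can be recomputed mechanically from the profile counters. The sketch below is only a check of the arithmetic: the helper name `unaccounted_ms` is hypothetical, and attributing the whole remainder to column clearing is the assumption made in this description, not something the profile reports directly.

```cpp
#include <cstdint>

// Time inside the probe phase that is not covered by the two printed
// counters (all values in milliseconds).
constexpr int64_t unaccounted_ms(int64_t probe_total, int64_t find_next,
                                 int64_t probe_side_output) {
    return probe_total - find_next - probe_side_output;
}

// Before: ProbeTime 1m52s, ProbeFindNextTime 5s934ms,
// ProbeWhenProbeSideOutputTime 54s431ms.
constexpr int64_t kBeforeClearMs = unaccounted_ms(112000, 5934, 54431);

// After: ProbeTime 53s96ms, ProbeFindNextTime 40s635ms,
// ProbeWhenProbeSideOutputTime 6s247ms.
constexpr int64_t kAfterClearMs = unaccounted_ms(53096, 40635, 6247);

// End-to-end speedup from the two reported query times.
constexpr double kSpeedup = 112756.0 / 53585.0;
```

By these numbers the query as a whole runs roughly 2.1 times faster with COW enabled.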
   
   Describe your changes.
   
   ## Checklist(Required)
   
   * [ ] Does it affect the original behavior
   * [ ] Have unit tests been added
   * [ ] Has documentation been added or modified
   * [x] Does it need to update dependencies
   * [x] Does this PR support rollback (If NO, please explain WHY)
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at 
[d...@doris.apache.org](mailto:d...@doris.apache.org) by explaining why you 
chose the solution you did and what alternatives you considered, etc...
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

