zenoyang opened a new pull request, #19880: URL: https://github.com/apache/doris/pull/19880
# Proposed changes

Issue Number: close #xxx

Support CRoaring COW feature to reduce bitmap copying

## Problem summary

In some current scenarios, bitmap columns are copied: for example, when multiple derived columns are produced from the same bitmap column, or when the build/probe column is copied into the join_block during a join. Copying a bitmap is generally slow, and the CRoaring copy-on-write (COW) feature can reduce this copying and thereby improve query performance.

Note: COW in CRoaring-0.4.0 is not thread-safe, so it needs to be patched to make it thread-safe.

The following is a simplified version of the SQL from our production scenario:

```sql
SELECT
    `t2`.`page`,
    `t2`.`entrance`,
    `t1`.`partition_date`,
    `t1`.`entrance_code`,
    BITMAP_UNION_COUNT(CASE WHEN t1.type = 3 THEN t1.device_id END) AS `dau`
FROM `doris_vec_stage`.`tbl1` `t1`
LEFT JOIN `doris_vec_stage`.`tbl2` `t2`
    ON `t1`.`first_entrance` = `t2`.`second_join_key`
    AND `t1`.`page_id` = `t2`.`first_join_key`
WHERE `t2`.`entrance` IN ('******', '******')
    AND `t2`.`page` = '******'
    AND `t1`.`partition_date` BETWEEN '2023-02-27' AND '2023-03-19'
GROUP BY 1, 2, 3, 4
```

(The following tests are based on Doris 1.1.5.)

Before, the query took 112756 ms:

```
ProbePhase:
  - ProbeExprCallTime: 1.836ms
  - ProbeFindNextTime: 5s934ms
  - ProbeRows: 90.048K (90048)
  - ProbeTime: 1m52s
  - ProbeWhenBuildSideOutputTime: 62.457ms
  - ProbeWhenProbeSideOutputTime: 54s431ms
  - ProbeWhenSearchHashTableTime: 27.254ms
```

Column copy time: `ProbeWhenProbeSideOutputTime: 54s431ms`

Column clear time is not printed here. Estimated as ProbeTime minus ProbeFindNextTime minus ProbeWhenProbeSideOutputTime, the actual value is probably: (112000-5934-54431) ms = 51635 ms

After (with CRoaring COW enabled), the query takes 53585 ms:

```
ProbePhase:
  - ProbeExprCallTime: 1.331ms
  - ProbeFindNextTime: 40s635ms
  - ProbeRows: 37.348K (37348)
  - ProbeTime: 53s96ms
  - ProbeWhenBuildSideOutputTime: 17.399ms
  - ProbeWhenProbeSideOutputTime: 6s247ms
  - ProbeWhenSearchHashTableTime: 6.700ms
```

Column copy time: `ProbeWhenProbeSideOutputTime: 6s247ms`

Column clear time: (53096-40635-6247) ms = 6214 ms

Describe your changes.

## Checklist(Required)

* [ ] Does it affect the original behavior
* [ ] Has unit tests been added
* [ ] Has document been added or modified
* [x] Does it need to update dependencies
* [x] Does this PR support rollback (If NO, please explain WHY)

## Further comments

If this is a relatively large or complex change, kick off the discussion at [d...@doris.apache.org](mailto:d...@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc...

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: commits-h...@doris.apache.org
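
As a rough illustration of the copy-on-write idea referred to in the problem summary, here is a minimal Python sketch. It is not CRoaring's implementation and not the code in this PR: the `CowBitmap` and `_Payload` names are hypothetical, and a plain Python `set` stands in for the bitmap containers. It only shows why a copy becomes cheap (the storage is shared) and when the real copy is paid (on the first write to shared storage).

```python
class _Payload:
    """Shared storage plus a count of the handles that reference it (hypothetical)."""

    def __init__(self, values):
        self.values = set(values)
        self.refs = 1


class CowBitmap:
    """Toy copy-on-write bitmap: copies share storage until one side writes."""

    def __init__(self, values=()):
        self._payload = _Payload(values)

    def copy(self):
        # Cheap copy: share the payload instead of duplicating the values.
        clone = CowBitmap.__new__(CowBitmap)
        clone._payload = self._payload
        self._payload.refs += 1
        return clone

    def add(self, value):
        # First write to shared storage: detach by taking a private deep copy.
        # The counter is never decremented when a handle is simply dropped, so
        # this sketch may copy more often than strictly needed, never less.
        if self._payload.refs > 1:
            self._payload.refs -= 1
            self._payload = _Payload(self._payload.values)
        self._payload.values.add(value)

    def shares_storage_with(self, other):
        return self._payload is other._payload

    def cardinality(self):
        return len(self._payload.values)


# Usage: copying is cheap, and the deep copy happens only when `b` is modified.
a = CowBitmap([1, 2, 3])
b = a.copy()
assert a.shares_storage_with(b)
b.add(4)
assert not a.shares_storage_with(b)
```

The reference counter in this sketch is an unsynchronized integer, so it would not be safe if two threads copied or modified handles to the same payload at once. A shared counter of that kind generally needs atomic updates or a lock before such handles can be passed between threads.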