DarvenDuan opened a new pull request, #36322: URL: https://github.com/apache/doris/pull/36322
## Proposed changes

Issue Number: https://github.com/apache/doris/issues/22922

<!--Describe your changes.-->

Previous implementation issue:

In certain cases, when the number of elements in the IN clause exceeds doris_max_scan_key_num and the column used in the IN clause is covered by the prefix index, Doris generates only one scan key range. This is not reasonable, because the prefix index is then not used effectively for data filtering.

Example:

```
CREATE TABLE `test_tbl` (
  `k` INT NULL,
  `V` BIGINT SUM NULL
) ENGINE=OLAP
AGGREGATE KEY(`k`)
COMMENT 'OLAP'
DISTRIBUTED BY HASH(`k`) BUCKETS 1
PROPERTIES (
  "replication_num"="1"
);

insert into test_tbl values (0,0),(1,1),(2,2),(3,3),(4,4),(5,5);

select * from test_tbl where k in (1, 3, 14, 58, 60);
```

With doris_max_scan_key_num = 3, we get a single scan key range, [1, 60], and RowsKeyRangeFiltered is only 1.

profile:

```
OLAP_SCAN_OPERATOR (id=0. nereids_id=80. table name = test_tbl(test_tbl)):
  - RuntimeFilters: :
  - PushDownPredicates: [{k IN [1, 3, 14, 58, 60]}]
  - KeyRanges: ScanKeys:ScanKey=[1 : 60]
  ...
  - KeyRangesNum: 1
  ...
  VScanner:
    ...
    SegmentIterator:
      ...
      - RawRowsRead: 5
      ...
      - RowsKeyRangeFiltered: 1
      - RowsShortCircuitPredFiltered: 3
      - RowsShortCircuitPredInput: 5
      ...
```

With this optimization, Doris generates up to doris_max_scan_key_num scan key ranges based on the values in the IN clause. This not only increases the concurrency of the scan but also improves the efficiency of data filtering.

For the same query we now get three scan key ranges, [1, 3], [14, 14] and [58, 60], and RowsKeyRangeFiltered is 3.

profile:

```
OLAP_SCAN_OPERATOR (id=0. nereids_id=80. table name = test_tbl(test_tbl)):
  - RuntimeFilters: :
  - PushDownPredicates: [{k IN [1, 3, 14, 58, 60]}]
  - KeyRanges: ScanKeys:ScanKey=[1 : 3]ScanKey=[14 : 14]ScanKey=[58 : 60]
  ...
  - KeyRangesNum: 3
  ...
  VScanner:
    ...
    SegmentIterator:
      ...
      - RawRowsRead: 3
      ...
      - RowsKeyRangeFiltered: 3
      - RowsShortCircuitPredFiltered: 1
      - RowsShortCircuitPredInput: 3
      ...
```
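To make the grouping idea concrete, here is a minimal Python sketch of one way to turn an IN list into at most doris_max_scan_key_num closed ranges. It is not the code in this PR: the function name `build_scan_key_ranges` and the "cut at the widest gaps" rule are assumptions, chosen only because they reproduce the three ranges shown in the example above. The actual BE implementation may split the values differently.

```python
def build_scan_key_ranges(in_values, max_ranges):
    """Group IN-list values into at most max_ranges closed [low, high] ranges.

    Hypothetical helper for illustration; the splitting rule is an assumption.
    """
    if max_ranges < 1:
        raise ValueError("max_ranges must be at least 1")

    values = sorted(set(in_values))
    if not values:
        return []

    # Gap between each pair of neighbouring values, remembered by position.
    gaps = [(values[i + 1] - values[i], i) for i in range(len(values) - 1)]

    # Cut at the (max_ranges - 1) widest gaps; ties go to the earlier position.
    widest = sorted(gaps, key=lambda g: (-g[0], g[1]))[: max_ranges - 1]
    cut_after = sorted(i for _, i in widest)

    ranges = []
    start = 0
    for i in cut_after:
        ranges.append((values[start], values[i]))
        start = i + 1
    ranges.append((values[start], values[-1]))
    return ranges


if __name__ == "__main__":
    # Same IN list as the example query, with a limit of 3 ranges.
    print(build_scan_key_ranges([1, 3, 14, 58, 60], 3))
    # prints [(1, 3), (14, 14), (58, 60)]

    # A limit of 1 falls back to the single range of the old behaviour.
    print(build_scan_key_ranges([1, 3, 14, 58, 60], 1))
    # prints [(1, 60)]
```

Cutting at the widest gaps keeps each range as tight as possible around the requested keys, so fewer unrelated rows fall inside a range and have to be filtered later by the predicate.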