DarvenDuan opened a new pull request, #36322:
URL: https://github.com/apache/doris/pull/36322

   ## Proposed changes
   
   Issue Number: https://github.com/apache/doris/issues/22922
   
   Previous implementation issue:
   In some cases, when the number of elements in the IN clause exceeds `doris_max_scan_key_num` and the column used in the IN clause is covered by the prefix index, Doris generates only one scan key range. This is not reasonable, because the prefix index is then not used effectively to filter data.
   
   Example:
   ```
   CREATE TABLE `test_tbl` (
     `k` INT NULL,
     `V` BIGINT SUM NULL
   ) ENGINE=OLAP
   AGGREGATE KEY(`k`)
   COMMENT 'OLAP'
   DISTRIBUTED BY HASH(`k`) BUCKETS 1
   PROPERTIES (
   "replication_num"="1"
   );
   
   insert into test_tbl values (0,0),(1,1),(2,2),(3,3),(4,4),(5,5);
   
   select * from test_tbl where k in (1, 3, 14, 58, 60);
   ```
   With `doris_max_scan_key_num = 3`, only a single scan key range, [1, 60], is generated, and RowsKeyRangeFiltered is only 1.
   
   profile:
   ```
   OLAP_SCAN_OPERATOR  (id=0.  nereids_id=80.  table  name  =  test_tbl(test_tbl)):
           -  RuntimeFilters:  :  
           -  PushDownPredicates:  [{k  IN  [1,  3,  14,  58,  60]}]
           -  KeyRanges:  ScanKeys:ScanKey=[1  :  60]
           ...
           -  KeyRangesNum:  1
           ...
       VScanner:
               ...
           SegmentIterator:
                   ...
                   -  RawRowsRead:  5
                   ...
                   -  RowsKeyRangeFiltered:  1
                   -  RowsShortCircuitPredFiltered:  3
                   -  RowsShortCircuitPredInput:  5
                   ...
   ```
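
The counters above can be checked by hand. The following Python sketch is a simplified model, not Doris code: it assumes the old fallback covers the whole IN list with one closed range [min, max], treats the table as a plain list of keys, and uses the hypothetical helpers `collapse_to_single_range` and `profile_counters`.

```python
def collapse_to_single_range(values):
    """Hypothetical model of the old fallback: once the IN list is longer
    than doris_max_scan_key_num, cover all values with one [min, max] range."""
    return [(min(values), max(values))]


def profile_counters(keys, in_values, ranges):
    """Return (RowsKeyRangeFiltered, RawRowsRead, RowsShortCircuitPredFiltered)
    for a list of row keys, modelling each scan key range as a closed interval."""
    kept = [k for k in keys if any(lo <= k <= hi for lo, hi in ranges)]
    pred_filtered = sum(1 for k in kept if k not in in_values)
    return len(keys) - len(kept), len(kept), pred_filtered


in_values = [1, 3, 14, 58, 60]
ranges = collapse_to_single_range(in_values)
print(ranges)                                               # [(1, 60)]
print(profile_counters(range(6), set(in_values), ranges))   # (1, 5, 3)
```

Only key 0 lies outside [1, 60], so five rows are read and the IN predicate then has to discard keys 2, 4 and 5, which matches RawRowsRead 5 and RowsShortCircuitPredFiltered 3 in the profile.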
   
   With this optimization, Doris can now generate up to `doris_max_scan_key_num` scan key ranges based on the values in the IN clause. This increases scan concurrency and also makes data filtering more efficient, because fewer rows fall inside the key ranges.
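
The description does not state how the IN values are grouped into ranges. One strategy that reproduces the ranges in the example is to sort the values and cut at the largest gaps between neighbours. The Python sketch below assumes that strategy; `build_ranges_by_gaps` is a hypothetical name, not the function added by this PR.

```python
def build_ranges_by_gaps(values, max_ranges):
    """Group sorted, de-duplicated IN values into at most max_ranges closed
    ranges by cutting at the (max_ranges - 1) largest gaps between neighbours."""
    if max_ranges < 1:
        raise ValueError("max_ranges must be at least 1")
    vals = sorted(set(values))
    if len(vals) <= max_ranges:
        # Assumption: a short enough list keeps one point range per value.
        return [(v, v) for v in vals]
    # Index i means "cut between vals[i] and vals[i + 1]"; widest gaps first.
    by_gap = sorted(range(len(vals) - 1),
                    key=lambda i: vals[i + 1] - vals[i],
                    reverse=True)
    cuts = sorted(by_gap[:max_ranges - 1])
    ranges, start = [], 0
    for i in cuts:
        ranges.append((vals[start], vals[i]))
        start = i + 1
    ranges.append((vals[start], vals[-1]))
    return ranges


print(build_ranges_by_gaps([1, 3, 14, 58, 60], 3))
# [(1, 3), (14, 14), (58, 60)]
```

The gaps between neighbours are 2, 11, 44 and 2, so cutting at the two widest (44 and 11) leaves the ranges [1, 3], [14, 14] and [58, 60].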
   
   For the same query, Doris now generates three scan key ranges, [1, 3], [14, 14], and [58, 60], and RowsKeyRangeFiltered rises to 3.
   
   profile:
   ```
   OLAP_SCAN_OPERATOR  (id=0.  nereids_id=80.  table  name  =  test_tbl(test_tbl)):
           -  RuntimeFilters:  :  
           -  PushDownPredicates:  [{k  IN  [1,  3,  14,  58,  60]}]
           -  KeyRanges:  ScanKeys:ScanKey=[1  :  3]ScanKey=[14  :  14]ScanKey=[58  :  60]
           ...
           -  KeyRangesNum:  3
           ...
       VScanner:
               ...
           SegmentIterator:
                   ...
                   -  RawRowsRead:  3
                   ...
                   -  RowsKeyRangeFiltered:  3
                   -  RowsShortCircuitPredFiltered:  1
                   -  RowsShortCircuitPredInput:  3
                   ...
   ```
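
The new counters follow from the same arithmetic. This Python sketch again models the table as a plain list of keys and uses the hypothetical helper `profile_counters`; it is a simplification for checking the numbers, not how Doris computes them per segment.

```python
def profile_counters(keys, in_values, ranges):
    """Return (RowsKeyRangeFiltered, RawRowsRead, RowsShortCircuitPredFiltered)
    for a list of row keys, modelling each scan key range as a closed interval."""
    kept = [k for k in keys if any(lo <= k <= hi for lo, hi in ranges)]
    pred_filtered = sum(1 for k in kept if k not in in_values)
    return len(keys) - len(kept), len(kept), pred_filtered


ranges = [(1, 3), (14, 14), (58, 60)]   # scan key ranges from the profile
print(profile_counters(range(6), {1, 3, 14, 58, 60}, ranges))  # (3, 3, 1)
```

Keys 0, 4 and 5 now fall outside every range, so only keys 1, 2 and 3 are read, and the IN predicate discards just key 2.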
   
   
   

