kaka11chen opened a new pull request, #19039:
URL: https://github.com/apache/doris/pull/19039

   # Proposed changes
   
   ## Problem summary
   
   Close #19038
   
   We found that `qt_q11` in the regression test `test_external_catalog_hive` is very slow.
   The query returns only one record, so in the parquet lazy-read path almost all other data should be filtered out.
   However, the parquet reader currently still reads many records, because it can only skip data at parquet page granularity. To skip a page, the reader must first read the page header, and reading that header triggers a data prefetch. Prefetching in this case can therefore fetch far more data than the query needs.
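   To make the cost described above concrete, here is a minimal C++ model. Doris BE is written in C++, but none of these names come from its code base: `Page`, `bytes_fetched_skipping_pages`, the fixed header size, and the prefetch policy are all hypothetical. The sketch assumes no page offset index is available, so skipping a page means reading its header to learn the body size, and it assumes a reader that prefetches a fixed window whenever it reads an offset that is not already buffered.

   ```cpp
   #include <algorithm>
   #include <cassert>
   #include <cstddef>
   #include <iostream>
   #include <vector>

   // Hypothetical model of one Parquet column chunk: each page is a header
   // followed by a (compressed) body. Sizes are in bytes.
   struct Page {
       std::size_t header_size;
       std::size_t body_size;
   };

   // Bytes fetched from storage when a reader skips every page of a column
   // chunk by reading each page header (to learn the body size) and seeking
   // past the body. Assumption: on a buffer miss the reader prefetches
   // `prefetch_size` bytes starting at the header offset.
   std::size_t bytes_fetched_skipping_pages(const std::vector<Page>& pages,
                                            std::size_t prefetch_size) {
       std::size_t chunk_end = 0;
       for (const Page& p : pages) chunk_end += p.header_size + p.body_size;

       std::size_t fetched = 0;
       std::size_t buf_start = 0, buf_end = 0;  // buffer starts empty
       std::size_t offset = 0;
       for (const Page& p : pages) {
           bool buffered = offset >= buf_start && offset + p.header_size <= buf_end;
           if (!buffered) {
               // Buffer miss: fetch a window starting at the header, which also
               // pulls in page bodies we only wanted to skip.
               buf_start = offset;
               buf_end = std::min(offset + std::max(prefetch_size, p.header_size), chunk_end);
               fetched += buf_end - buf_start;
           }
           offset += p.header_size + p.body_size;  // seek past the body
       }
       return fetched;
   }

   int main() {
       // 100 pages with 64-byte headers and 1 MiB bodies, 4 MiB prefetch window.
       std::vector<Page> pages(100, Page{64, 1 << 20});
       std::cout << bytes_fetched_skipping_pages(pages, 4 << 20) << "\n";  // prints 104857600
   }
   ```

   With these assumed sizes the model fetches 104857600 bytes, almost the whole 104864000-byte column chunk, even though no row from it is needed. Real page headers are variable-length Thrift structs, so the numbers are only illustrative. Skipping a whole row group instead can rely on the column chunk offsets and sizes already stored in the file footer, so no page header has to be read at all.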
   
   So there are two issues:
   
   1. In this case, the whole row group should be skipped.
   2. Prefetching data in this case can be counterproductive and needs to be improved.
   
   This PR resolves issue 1.
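   The fix for issue 1 can be pictured as a row-group-level check that runs before any lazy (non-predicate) column is touched. The sketch below is hypothetical: `LazyColumnAction`, `plan_lazy_columns`, and the one-byte-per-row selection vector are illustrative names, not Doris APIs, and it assumes the filter result for every row of the row group is already known from the predicate columns.

   ```cpp
   #include <cassert>
   #include <cstddef>
   #include <cstdint>
   #include <iostream>
   #include <vector>

   // Hypothetical plan for the lazy columns of one row group.
   enum class LazyColumnAction {
       kSkipRowGroup,  // no row survived: never touch the lazy columns' pages
       kReadAll,       // every row survived: read the lazy columns normally
       kSkipPages,     // some rows survived: read lazy columns, skipping pages
   };

   // `selection` holds one flag per row of the row group (non-zero = row kept),
   // as produced by evaluating the filter on the predicate columns.
   LazyColumnAction plan_lazy_columns(const std::vector<std::uint8_t>& selection) {
       std::size_t selected = 0;
       for (std::uint8_t s : selection) selected += (s != 0);
       if (selected == 0) return LazyColumnAction::kSkipRowGroup;
       if (selected == selection.size()) return LazyColumnAction::kReadAll;
       return LazyColumnAction::kSkipPages;
   }

   int main() {
       std::vector<std::uint8_t> none(1000, 0);
       std::vector<std::uint8_t> one(1000, 0);
       one[42] = 1;
       std::cout << (plan_lazy_columns(none) == LazyColumnAction::kSkipRowGroup) << "\n";  // prints 1
       std::cout << (plan_lazy_columns(one) == LazyColumnAction::kSkipPages) << "\n";      // prints 1
   }
   ```

   The all-filtered case is the one that matters for `qt_q11`: returning `kSkipRowGroup` lets the reader move on to the next row group without reading any page header of the lazy columns, so the header-triggered prefetch never happens for them.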
    
   ### Test result:
   Before opt:
   ```
   mysql> select l_quantity from test_external_catalog_hive.tpch_1000_parquet.lineitem where l_orderkey = 599614241 and l_partkey = 59018738 and l_suppkey = 1518744 limit 2;
   +------------+
   | l_quantity |
   +------------+
   |      16.00 |
   +------------+
   1 row in set (2 min 27.55 sec)
   ```
   
   After opt:
   ```
   mysql> select l_quantity from test_external_catalog_hive.tpch_1000_parquet.lineitem where l_orderkey = 599614241 and l_partkey = 59018738 and l_suppkey = 1518744 limit 2;
   +------------+
   | l_quantity |
   +------------+
   |      16.00 |
   +------------+
   1 row in set (41.95 sec)
   ```
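   Converting 2 min 27.55 sec to 147.55 s, the measured improvement for this single query works out as follows (derived only from the two timings above):

   ```math
   \text{speedup} = \frac{147.55\,\text{s}}{41.95\,\text{s}} \approx 3.52,
   \qquad
   \text{time saved} = \frac{147.55\,\text{s} - 41.95\,\text{s}}{147.55\,\text{s}} \approx 71.6\%
   ```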
   
   ## Checklist(Required)
   
   * [ ] Does it affect the original behavior
   * [ ] Have unit tests been added
   * [ ] Has documentation been added or modified
   * [ ] Does it need to update dependencies
   * [ ] Does this PR support rollback (If NO, please explain WHY)
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at [d...@doris.apache.org](mailto:d...@doris.apache.org) by explaining why you chose the solution you did and what alternatives you considered, etc.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.