morningman commented on PR #45966:
URL: https://github.com/apache/doris/pull/45966#issuecomment-2564950419

   This pull request introduces a new `OrcMergeRangeFileReader` class that 
merges small I/O operations when reading ORC files, and adds profiling to the 
ORC read path. The main changes are the new reader class, updates to existing 
`OrcReader` methods so that they use it, and new performance metrics collected 
by the input stream classes.
   
   Enhancements to ORC file reading:
   
   * 
[`be/src/vec/exec/format/orc/orc_file_reader.cpp`](diffhunk://#diff-dc26035940ebe6cd5e2f9b0329de073b4b792f0fc6f89bc95b5445884762067cR1-R106):
 Added the `OrcMergeRangeFileReader` class to handle merged I/O operations, 
including its `read_at_impl` and `_collect_profile_before_close` methods for 
reading and profiling.
   * 
[`be/src/vec/exec/format/orc/orc_file_reader.h`](diffhunk://#diff-d33a8f4d03d6e43c7c417a13eab68e54592e99a40921b8e5082c5fc9dc20783eR1-R88):
 Defined the `OrcMergeRangeFileReader` class with necessary attributes and 
methods for handling merged I/O operations.
   
   Updates to ORC reader implementation:
   
   * 
[`be/src/vec/exec/format/orc/vorc_reader.cpp`](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425R75):
 Included the new `OrcMergeRangeFileReader` class and updated methods such as 
`StripeStreamInputStream::read`, `OrcReader::_create_file_reader`, and 
`OrcReader::set_fill_columns` to utilize the new class and improve I/O 
operations. 
[[1]](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425R75)
 
[[2]](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425R144-R171)
 
[[3]](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425L255-R285)
 
[[4]](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425R336-R342)
 
[[5]](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425L1058-L1069)
 
[[6]](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425L1082-R1107)
 [[7]](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425L1092-R1118)
[[8]](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425L2662-R2792)
   
   Profiling improvements:
   
   * 
[`be/src/vec/exec/format/orc/vorc_reader.h`](diffhunk://#diff-4aba8769f72c9eca9221258f4ea72e6582ca498be3c19347d56fbfc36d8217f2R653-R719):
 Added profiling capabilities to the `StripeStreamInputStream` and 
`ORCFileInputStream` classes to collect performance metrics during I/O 
operations. 
[[1]](diffhunk://#diff-4aba8769f72c9eca9221258f4ea72e6582ca498be3c19347d56fbfc36d8217f2R653-R719)
 
[[2]](diffhunk://#diff-4aba8769f72c9eca9221258f4ea72e6582ca498be3c19347d56fbfc36d8217f2R728-R733)
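   The summary does not list which metrics the stream classes collect. As a 
rough illustration only, per-read profiling can be sketched as a small wrapper 
that records call count, bytes, and elapsed time around each read. `ReadStats` 
and `profiled_read` are hypothetical names for this sketch, not Doris APIs.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>

// Hypothetical counters for this sketch; not the Doris profile types.
struct ReadStats {
    uint64_t read_calls = 0;
    uint64_t read_bytes = 0;
    std::chrono::nanoseconds read_time{0};
};

// Runs read_fn (any callable returning the number of bytes read) and
// records one call, the bytes it returned, and the wall time it took.
template <typename ReadFn>
size_t profiled_read(ReadStats& stats, ReadFn&& read_fn) {
    const auto begin = std::chrono::steady_clock::now();
    const size_t bytes = read_fn();
    const auto end = std::chrono::steady_clock::now();
    stats.read_time +=
            std::chrono::duration_cast<std::chrono::nanoseconds>(end - begin);
    stats.read_calls += 1;
    stats.read_bytes += bytes;
    return bytes;
}
```

   A caller would pass the actual read as a lambda, for example 
`profiled_read(stats, [&] { return do_read(buf, len); })`, where `do_read` 
stands in for whatever the stream really calls.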
   
   These changes aim to optimize the ORC file reading process by merging small 
I/O operations, improving profiling, and handling large I/O operations more 
efficiently.
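
   The summary does not describe how small I/O operations are merged. One 
common approach, shown here purely as a sketch, is to sort the requested byte 
ranges and coalesce neighbours whose gap falls under a threshold, so that one 
larger read replaces several small ones. `ByteRange`, `merge_ranges`, and 
`max_gap` are assumptions made for this example and are not taken from the PR.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical half-open byte range [start, end) within a file.
struct ByteRange {
    uint64_t start;
    uint64_t end;
};

// Coalesces ranges whose gap to the previous merged range is at most
// max_gap bytes. The input is taken by value so it can be sorted locally.
std::vector<ByteRange> merge_ranges(std::vector<ByteRange> ranges,
                                    uint64_t max_gap) {
    std::sort(ranges.begin(), ranges.end(),
              [](const ByteRange& a, const ByteRange& b) {
                  return a.start < b.start;
              });
    std::vector<ByteRange> merged;
    for (const ByteRange& r : ranges) {
        if (!merged.empty() && r.start <= merged.back().end + max_gap) {
            // Close enough to the previous range: extend it.
            merged.back().end = std::max(merged.back().end, r.end);
        } else {
            merged.push_back(r);
        }
    }
    return merged;
}
```

   The trade-off is that a larger `max_gap` issues fewer reads but fetches 
more bytes that are never used, so the threshold is typically tuned to the 
latency of the underlying storage.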


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

