morningman commented on PR #45966: URL: https://github.com/apache/doris/pull/45966#issuecomment-2564950419
This pull request introduces a new `OrcMergeRangeFileReader` class and enhances the ORC file reading process with improved profiling and optimized I/O operations. The most important changes include adding new classes and methods, updating existing methods for better performance, and incorporating new profiling capabilities.

Enhancements to ORC file reading:

* [`be/src/vec/exec/format/orc/orc_file_reader.cpp`](diffhunk://#diff-dc26035940ebe6cd5e2f9b0329de073b4b792f0fc6f89bc95b5445884762067cR1-R106): Added the `OrcMergeRangeFileReader` class to handle merged I/O operations and updated the `read_at_impl` and `_collect_profile_before_close` methods for better performance and profiling.
* [`be/src/vec/exec/format/orc/orc_file_reader.h`](diffhunk://#diff-d33a8f4d03d6e43c7c417a13eab68e54592e99a40921b8e5082c5fc9dc20783eR1-R88): Defined the `OrcMergeRangeFileReader` class with necessary attributes and methods for handling merged I/O operations.

Updates to ORC reader implementation:

* [`be/src/vec/exec/format/orc/vorc_reader.cpp`](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425R75): Included the new `OrcMergeRangeFileReader` class and updated methods such as `StripeStreamInputStream::read`, `OrcReader::_create_file_reader`, and `OrcReader::set_fill_columns` to utilize the new class and improve I/O operations. [[1]](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425R75) [[2]](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425R144-R171) [[3]](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425L255-R285) [[4]](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425R336-R342) [[5]](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425L1058-L1069) [[6]](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425L1082-R1107) [[7]](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425L1092-R1118) [[8]](diffhunk://#diff-97945196187497c82dd245460b397955be0ebb9caeb75267a72c2bff2d545425L2662-R2792)

Profiling improvements:

* [`be/src/vec/exec/format/orc/vorc_reader.h`](diffhunk://#diff-4aba8769f72c9eca9221258f4ea72e6582ca498be3c19347d56fbfc36d8217f2R653-R719): Added profiling capabilities to the `StripeStreamInputStream` and `ORCFileInputStream` classes to collect performance metrics during I/O operations. [[1]](diffhunk://#diff-4aba8769f72c9eca9221258f4ea72e6582ca498be3c19347d56fbfc36d8217f2R653-R719) [[2]](diffhunk://#diff-4aba8769f72c9eca9221258f4ea72e6582ca498be3c19347d56fbfc36d8217f2R728-R733)

These changes aim to optimize the ORC file reading process by merging small I/O operations, improving profiling, and handling large I/O operations more efficiently.
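
For readers unfamiliar with the idea of merging small I/O operations, here is a minimal sketch of range coalescing in general. It is not the code from this PR: `ByteRange`, `merge_small_ranges`, and the `max_gap` parameter are hypothetical names chosen for illustration, and the actual `OrcMergeRangeFileReader` may use a different policy.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical type (not from the PR): a byte range [offset, offset + size) in a file.
struct ByteRange {
    uint64_t offset;
    uint64_t size;

    uint64_t end() const {
        // Guard against wrap-around when computing the exclusive end offset.
        assert(offset <= UINT64_MAX - size);
        return offset + size;
    }
};

// Coalesce ranges that overlap or are separated by at most max_gap bytes,
// so that several small reads can be served by one larger read.
// The input is taken by value and sorted by offset first.
std::vector<ByteRange> merge_small_ranges(std::vector<ByteRange> ranges, uint64_t max_gap) {
    std::sort(ranges.begin(), ranges.end(),
              [](const ByteRange& a, const ByteRange& b) { return a.offset < b.offset; });

    std::vector<ByteRange> merged;
    for (const ByteRange& r : ranges) {
        if (!merged.empty() && r.offset <= merged.back().end() + max_gap) {
            // Close enough to the previous range: extend it to cover this one.
            uint64_t new_end = std::max(merged.back().end(), r.end());
            merged.back().size = new_end - merged.back().offset;
        } else {
            // Too far away: start a new merged range.
            merged.push_back(r);
        }
    }
    return merged;
}
```

With `max_gap = 4`, the ranges `{0, 10}` and `{12, 8}` would be served by a single read of `{0, 20}`, while a range at offset 100 stays separate. The trade-off is reading a few unneeded gap bytes in exchange for fewer I/O requests, which tends to matter most on high-latency storage.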