morningman commented on code in PR #12570:
URL: https://github.com/apache/doris/pull/12570#discussion_r969797638


##########
be/src/io/hdfs_file_reader.cpp:
##########
@@ -147,6 +147,7 @@ Status HdfsFileReader::readat(int64_t position, int64_t nbytes, int64_t* bytes_r
                 BackendOptions::get_localhost(), _namenode, _path, hdfsGetLastError());
     }
+    _current_offset = position;

Review Comment:
   Call `HdfsFileReader::seek()` directly here to remove the duplicate code.



##########
be/src/vec/exec/format/parquet/vparquet_page_reader.cpp:
##########
@@ -38,7 +36,7 @@ Status PageReader::next_page_header() {
     const uint8_t* page_header_buf = nullptr;
     size_t max_size = _end_offset - _offset;
-    size_t header_size = std::min(initPageHeaderSize, max_size);
+    size_t header_size = 128;

Review Comment:
   Do not use this magic number. Why was `initPageHeaderSize` removed?



##########
be/src/vec/exec/format/parquet/vparquet_group_reader.cpp:
##########
@@ -50,7 +50,20 @@ Status RowGroupReader::_init_column_readers(const FieldDescriptor& schema,
         TypeDescriptor col_type = slot_desc->type();
         auto field = const_cast<FieldSchema*>(schema.get_column(slot_desc->col_name()));
         std::unique_ptr<ParquetColumnReader> reader;
-        RETURN_IF_ERROR(ParquetColumnReader::create(_file_reader, field, read_col, _row_group_meta,
+        FileReader* buff_reader;
+        if (_buffered_file_reader.size() < config::parquet_group_pooled_reader) {

Review Comment:
   I think this creates too many readers for a single row group.



##########
be/src/vec/exec/format/parquet/vparquet_reader.h:
##########
@@ -130,6 +128,7 @@ class ParquetReader {
     int64_t _range_size;
     cctz::time_zone* _ctz;
     std::vector<RowRange> _skipped_row_ranges;
+    std::unique_ptr<BufferedReader> _file_reader;

Review Comment:
   Is this reader only used for reading metadata?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
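
For illustration only, the first two review comments (reuse `seek()` from `readat()`, and keep a named constant clamped by the remaining size instead of a bare `128`) could look roughly like the sketch below. This is not Doris code: `SketchReader`, `kInitPageHeaderSize`, and `initial_header_size` are hypothetical names, the reader works on an in-memory string rather than HDFS, and `128` is simply the literal from the diff, since the actual value of `initPageHeaderSize` is not shown there.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <utility>

// Hypothetical in-memory stand-in for a positional file reader. It is NOT
// Doris's HdfsFileReader; it only mirrors the seek()/readat() relationship.
class SketchReader {
public:
    explicit SketchReader(std::string data) : _data(std::move(data)) {}

    // The single place that validates a position and updates the offset.
    bool seek(int64_t position) {
        if (position < 0 || static_cast<std::size_t>(position) > _data.size()) {
            return false;
        }
        _current_offset = position;
        return true;
    }

    // readat() delegates to seek() instead of repeating the offset update.
    bool readat(int64_t position, int64_t nbytes, int64_t* bytes_read, char* out) {
        if (nbytes < 0 || !seek(position)) {
            return false;
        }
        const int64_t remaining = static_cast<int64_t>(_data.size()) - _current_offset;
        *bytes_read = std::min(nbytes, remaining);
        std::memcpy(out, _data.data() + _current_offset,
                    static_cast<std::size_t>(*bytes_read));
        _current_offset += *bytes_read;
        return true;
    }

    int64_t tell() const { return _current_offset; }

private:
    std::string _data;
    int64_t _current_offset = 0;
};

// Named constant in place of a bare literal. The name is hypothetical and
// the value 128 is an assumption copied from the literal in the diff.
constexpr std::size_t kInitPageHeaderSize = 128;

// Never request more header bytes than remain in the column chunk.
std::size_t initial_header_size(std::size_t max_size) {
    return std::min(kInitPageHeaderSize, max_size);
}
```

With this shape, any bounds check or offset bookkeeping added to `seek()` later is picked up by `readat()` automatically, and `initial_header_size(_end_offset - _offset)` cannot read past the end of a chunk shorter than the initial guess.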