morningman commented on code in PR #12570:
URL: https://github.com/apache/doris/pull/12570#discussion_r969797638


##########
be/src/io/hdfs_file_reader.cpp:
##########
@@ -147,6 +147,7 @@ Status HdfsFileReader::readat(int64_t position, int64_t nbytes, int64_t* bytes_r
                                          BackendOptions::get_localhost(), _namenode, _path,
                                          hdfsGetLastError());
         }
+        _current_offset = position;

Review Comment:
   Call `HdfsFileReader::seek()` directly here to remove the duplicated code.
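
   A minimal sketch of the suggested shape, assuming `seek()` is the one place that validates and updates `_current_offset`. `SketchFileReader` is a hypothetical in-memory stand-in, not the Doris `HdfsFileReader`, and the extra `out` parameter exists only to keep the example self-contained:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <utility>

// Hypothetical in-memory stand-in for a file reader; not the Doris class.
class SketchFileReader {
public:
    explicit SketchFileReader(std::string data) : _data(std::move(data)) {}

    // The only place that validates a position and updates the offset.
    bool seek(int64_t position) {
        if (position < 0 || position > static_cast<int64_t>(_data.size())) {
            return false;
        }
        _current_offset = position;
        return true;
    }

    // Positioned read that reuses seek() instead of assigning
    // _current_offset a second time.
    bool readat(int64_t position, int64_t nbytes, int64_t* bytes_read, void* out) {
        assert(bytes_read != nullptr && out != nullptr);
        if (nbytes < 0 || !seek(position)) {
            return false;
        }
        const int64_t remaining = static_cast<int64_t>(_data.size()) - _current_offset;
        *bytes_read = std::min(nbytes, remaining);
        std::memcpy(out, _data.data() + _current_offset,
                    static_cast<std::size_t>(*bytes_read));
        _current_offset += *bytes_read;
        return true;
    }

    int64_t tell() const { return _current_offset; }

private:
    std::string _data;
    int64_t _current_offset = 0;
};
```

   With this shape, any later change to the seek logic (error reporting, bounds checks) applies to `readat()` automatically.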



##########
be/src/vec/exec/format/parquet/vparquet_page_reader.cpp:
##########
@@ -38,7 +36,7 @@ Status PageReader::next_page_header() {
 
     const uint8_t* page_header_buf = nullptr;
     size_t max_size = _end_offset - _offset;
-    size_t header_size = std::min(initPageHeaderSize, max_size);
+    size_t header_size = 128;

Review Comment:
   Do not use a magic number here.
   Why was `initPageHeaderSize` removed?
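
   For illustration, a sketch of keeping a named constant and clamping it to the bytes that remain, as the removed line did. `kInitPageHeaderSize` and `initial_header_size` are hypothetical names, and the value 128 is an assumption taken from the literal in the new line, not from the definition of `initPageHeaderSize`:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical constant; the value is assumed from the literal in the diff.
constexpr std::size_t kInitPageHeaderSize = 128;

// Returns how many bytes to request for the first page-header read attempt,
// never exceeding what is left between offset and end_offset.
std::size_t initial_header_size(std::size_t offset, std::size_t end_offset) {
    assert(offset <= end_offset);
    const std::size_t max_size = end_offset - offset;
    return std::min(kInitPageHeaderSize, max_size);
}
```

   The clamp matters near the end of a column chunk: with fewer than 128 bytes left, a fixed `128` would request bytes past `_end_offset`.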



##########
be/src/vec/exec/format/parquet/vparquet_group_reader.cpp:
##########
@@ -50,7 +50,20 @@ Status RowGroupReader::_init_column_readers(const FieldDescriptor& schema,
         TypeDescriptor col_type = slot_desc->type();
         auto field = const_cast<FieldSchema*>(schema.get_column(slot_desc->col_name()));
         std::unique_ptr<ParquetColumnReader> reader;
-        RETURN_IF_ERROR(ParquetColumnReader::create(_file_reader, field, read_col, _row_group_meta,
+        FileReader* buff_reader;
+        if (_buffered_file_reader.size() < config::parquet_group_pooled_reader) {

Review Comment:
   I think this creates too many readers for a single row group.



##########
be/src/vec/exec/format/parquet/vparquet_reader.h:
##########
@@ -130,6 +128,7 @@ class ParquetReader {
     int64_t _range_size;
     cctz::time_zone* _ctz;
     std::vector<RowRange> _skipped_row_ranges;
+    std::unique_ptr<BufferedReader> _file_reader;

Review Comment:
   Is this reader used only for reading metadata?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
