morningman commented on code in PR #15836: URL: https://github.com/apache/doris/pull/15836#discussion_r1080893973
##########
be/src/vec/exec/scan/vfile_scanner.h:
##########
@@ -66,14 +66,17 @@ class VFileScanner : public VScanner {
     std::unique_ptr<GenericReader> _cur_reader;
     bool _cur_reader_eof;
     std::unordered_map<std::string, ColumnValueRangeType>* _colname_to_value_range;
+    std::unordered_map<std::string, ColumnValueRangeType> _new_colname_to_value_range;

Review Comment:
   Add comment for this field

##########
be/src/vec/exec/scan/vfile_scanner.cpp:
##########
@@ -491,20 +491,34 @@ Status VFileScanner::_get_next_reader() {
             ParquetReader* parquet_reader = new ParquetReader(
                     _profile, _params, range, _state->query_options().batch_size,
                     const_cast<cctz::time_zone*>(&_state->timezone_obj()), _io_ctx.get());
+            parquet_reader->open();

Review Comment:
   ```suggestion
               RETURN_IF_ERROR(parquet_reader->open());
   ```

##########
be/src/vec/exec/format/table/iceberg_reader.cpp:
##########
@@ -60,7 +60,28 @@ IcebergTableReader::IcebergTableReader(GenericReader* file_format_reader, Runtim
 }
 Status IcebergTableReader::get_next_block(Block* block, size_t* read_rows, bool* eof) {
-    return _file_format_reader->get_next_block(block, read_rows, eof);
+    // To support iceberg schema evolution. We change the column name in block to
+    // make it match with the column name in parquet file before reading data. and
+    // Set the name back to table column name before return this block.
+    for (int i = 0; i < block->columns(); i++) {

Review Comment:
   If there is no schema change, we can skip this `for` loop to speed up the query.

##########
be/src/vec/exec/scan/vfile_scanner.cpp:
##########
@@ -491,20 +491,34 @@ Status VFileScanner::_get_next_reader() {
             ParquetReader* parquet_reader = new ParquetReader(
                     _profile, _params, range, _state->query_options().batch_size,
                     const_cast<cctz::time_zone*>(&_state->timezone_obj()), _io_ctx.get());
+            parquet_reader->open();
             if (!_is_load && _push_down_expr == nullptr && _vconjunct_ctx != nullptr) {
                 RETURN_IF_ERROR(_vconjunct_ctx->clone(_state, &_push_down_expr));
                 _discard_conjuncts();
             }
-            init_status = parquet_reader->init_reader(_file_col_names, _colname_to_value_range,
-                                                      _push_down_expr);
             if (range.__isset.table_format_params &&
                 range.table_format_params.table_format_type == "iceberg") {
+                _table_col_to_file_col.clear();

Review Comment:
   Could this logic be moved into the `iceberg reader`? It is only used for Iceberg.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
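
For illustration, the suggestion to skip the rename `for` loop when there is no schema change could look roughly like the sketch below. It is not the Doris implementation: `ToyBlock`, `ToyIcebergReader`, and every member name are hypothetical stand-ins, and the sketch assumes the table-to-file column mapping is known when the reader is constructed, so the "has schema change" flag can be computed once instead of per block.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a block: only the column names matter here.
struct ToyBlock {
    std::vector<std::string> column_names;
};

// Hypothetical wrapper, NOT the real IcebergTableReader.
class ToyIcebergReader {
public:
    explicit ToyIcebergReader(std::unordered_map<std::string, std::string> table_to_file)
            : _table_to_file(std::move(table_to_file)) {
        // Computed once: a schema change exists only if some table column
        // maps to a differently named file column.
        for (const auto& kv : _table_to_file) {
            _file_to_table[kv.second] = kv.first;
            if (kv.first != kv.second) {
                _has_schema_change = true;
            }
        }
    }

    bool has_schema_change() const { return _has_schema_change; }

    // Rename table column names to file column names before reading.
    void to_file_names(ToyBlock* block) const { rename(block, _table_to_file); }

    // Rename file column names back to table column names before returning.
    void to_table_names(ToyBlock* block) const { rename(block, _file_to_table); }

private:
    void rename(ToyBlock* block,
                const std::unordered_map<std::string, std::string>& mapping) const {
        assert(block != nullptr);
        if (!_has_schema_change) {
            return; // Fast path: names already match, skip the loop entirely.
        }
        for (auto& name : block->column_names) {
            auto it = mapping.find(name);
            if (it != mapping.end()) {
                name = it->second;
            }
        }
    }

    std::unordered_map<std::string, std::string> _table_to_file;
    std::unordered_map<std::string, std::string> _file_to_table;
    bool _has_schema_change = false;
};
```

Keeping the flag and both mappings inside the reader also fits the other suggestion in this review, that Iceberg-only logic live in the Iceberg reader rather than in the scanner.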