[GitHub] [doris] morningman commented on a diff in pull request #15836: [feature wip](multi catalog)Support iceberg schema evolution.

GitBox Wed, 18 Jan 2023 23:54:07 -0800


morningman commented on code in PR #15836:
URL: https://github.com/apache/doris/pull/15836#discussion_r1080893973



##########
be/src/vec/exec/scan/vfile_scanner.h:
##########
@@ -66,14 +66,17 @@ class VFileScanner : public VScanner {
     std::unique_ptr<GenericReader> _cur_reader;
     bool _cur_reader_eof;
     std::unordered_map<std::string, ColumnValueRangeType>* _colname_to_value_range;
+    std::unordered_map<std::string, ColumnValueRangeType> _new_colname_to_value_range;

Review Comment:
   Please add a comment explaining what this field is for.



##########
be/src/vec/exec/scan/vfile_scanner.cpp:
##########
@@ -491,20 +491,34 @@ Status VFileScanner::_get_next_reader() {
             ParquetReader* parquet_reader = new ParquetReader(
                     _profile, _params, range, _state->query_options().batch_size,
                     const_cast<cctz::time_zone*>(&_state->timezone_obj()), _io_ctx.get());
+            parquet_reader->open();

Review Comment:
   ```suggestion
               RETURN_IF_ERROR(parquet_reader->open());
   ```
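
A minimal sketch of why the suggestion matters, assuming `open()` returns a status object. `Status`, `SKETCH_RETURN_IF_ERROR`, and `FakeReader` below are hypothetical stand-ins written for illustration, not the actual Doris definitions. The first function drops the result of `open()` and carries on; the second returns early on failure, which is the behavior the suggested `RETURN_IF_ERROR` wrapper is presumably meant to provide.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for a status type; not the actual Doris Status class.
class Status {
public:
    static Status OK() { return Status(); }
    static Status InternalError(std::string msg) { return Status(std::move(msg)); }
    bool ok() const { return _ok; }
    const std::string& message() const { return _msg; }

private:
    Status() : _ok(true) {}
    explicit Status(std::string msg) : _ok(false), _msg(std::move(msg)) {}
    bool _ok;
    std::string _msg;
};

// Simplified early-return macro, assumed to resemble what RETURN_IF_ERROR does.
#define SKETCH_RETURN_IF_ERROR(expr)    \
    do {                                \
        Status _sketch_status = (expr); \
        if (!_sketch_status.ok()) {     \
            return _sketch_status;      \
        }                               \
    } while (false)

// Hypothetical reader whose open() can fail.
struct FakeReader {
    explicit FakeReader(bool fail) : fail_on_open(fail), initialized(false) {}
    Status open() {
        return fail_on_open ? Status::InternalError("open failed") : Status::OK();
    }
    Status init() {
        initialized = true;
        return Status::OK();
    }
    bool fail_on_open;
    bool initialized;
};

// The open() result is discarded, so init() runs even after a failed open.
Status ignore_open_result(FakeReader& reader) {
    reader.open();
    return reader.init();
}

// The open() failure is propagated to the caller and init() is never reached.
Status check_open_result(FakeReader& reader) {
    SKETCH_RETURN_IF_ERROR(reader.open());
    return reader.init();
}
```

With a reader that fails to open, `ignore_open_result` still reports success and initializes the reader, while `check_open_result` surfaces the error before any further work is done.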



##########
be/src/vec/exec/format/table/iceberg_reader.cpp:
##########
@@ -60,7 +60,28 @@ IcebergTableReader::IcebergTableReader(GenericReader* file_format_reader, Runtim
 }
 
 Status IcebergTableReader::get_next_block(Block* block, size_t* read_rows, bool* eof) {
-    return _file_format_reader->get_next_block(block, read_rows, eof);
+    // To support iceberg schema evolution. We change the column name in block to
+    // make it match with the column name in parquet file before reading data. and
+    // Set the name back to table column name before return this block.
+    for (int i = 0; i < block->columns(); i++) {

Review Comment:
   If there is no schema change, we can skip this `for` loop to speed up the query.
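
One possible shape for that shortcut, as a sketch only: decide once whether any column name actually differs between table and file, and return before the loop when none does. `NamedBlock`, `ColumnRenamer`, and the identity-mapping check are assumptions made for illustration; the PR may detect a schema change differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a block: only the column names matter here.
struct NamedBlock {
    std::vector<std::string> column_names;
};

// Hypothetical helper that holds the table-column to file-column mapping.
class ColumnRenamer {
public:
    explicit ColumnRenamer(std::unordered_map<std::string, std::string> table_to_file)
            : _table_to_file(std::move(table_to_file)), _has_schema_change(false) {
        // Computed once: a schema change exists only if some name differs.
        for (const auto& kv : _table_to_file) {
            if (kv.first != kv.second) {
                _has_schema_change = true;
                break;
            }
        }
    }

    bool has_schema_change() const { return _has_schema_change; }

    // Renames table columns to file columns.
    // Returns the number of columns visited so the fast path is observable.
    std::size_t to_file_names(NamedBlock* block) const {
        if (!_has_schema_change) {
            return 0;  // fast path: skip the per-column loop entirely
        }
        std::size_t visited = 0;
        for (auto& name : block->column_names) {
            auto it = _table_to_file.find(name);
            if (it != _table_to_file.end()) {
                name = it->second;
            }
            ++visited;
        }
        return visited;
    }

private:
    std::unordered_map<std::string, std::string> _table_to_file;
    bool _has_schema_change;
};
```

The check runs once per reader rather than once per block, so the common case of an unchanged schema pays only a single boolean test on each `get_next_block` call.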



##########
be/src/vec/exec/scan/vfile_scanner.cpp:
##########
@@ -491,20 +491,34 @@ Status VFileScanner::_get_next_reader() {
             ParquetReader* parquet_reader = new ParquetReader(
                     _profile, _params, range, _state->query_options().batch_size,
                     const_cast<cctz::time_zone*>(&_state->timezone_obj()), _io_ctx.get());
+            parquet_reader->open();
             if (!_is_load && _push_down_expr == nullptr && _vconjunct_ctx != nullptr) {
                 RETURN_IF_ERROR(_vconjunct_ctx->clone(_state, &_push_down_expr));
                 _discard_conjuncts();
             }
-            init_status = parquet_reader->init_reader(_file_col_names, _colname_to_value_range,
-                                                      _push_down_expr);
             if (range.__isset.table_format_params &&
                 range.table_format_params.table_format_type == "iceberg") {
+                _table_col_to_file_col.clear();

Review Comment:
   I think this logic could be moved into the `iceberg reader`, since it is 
only used for iceberg.
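
A rough sketch of that refactoring idea, under the assumption that the table-format reader wraps the file-format reader: the wrapper owns the table-to-file column mapping and translates names before delegating, so the scanner no longer needs an iceberg-specific branch. All types here (`SketchReader`, `SketchFileReader`, `SketchIcebergReader`) are hypothetical and do not mirror the real `GenericReader` or `IcebergTableReader` interfaces.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical minimal reader interface; not the Doris GenericReader API.
class SketchReader {
public:
    virtual ~SketchReader() = default;
    virtual void init(const std::vector<std::string>& column_names) = 0;
};

// Hypothetical file-format reader that records the names it was asked to read.
class SketchFileReader : public SketchReader {
public:
    void init(const std::vector<std::string>& column_names) override {
        requested = column_names;
    }
    std::vector<std::string> requested;
};

// Table-format wrapper: it owns the mapping, so callers pass table column
// names and never deal with file column names themselves.
class SketchIcebergReader : public SketchReader {
public:
    SketchIcebergReader(SketchReader* file_reader,
                        std::unordered_map<std::string, std::string> table_to_file)
            : _file_reader(file_reader), _table_to_file(std::move(table_to_file)) {}

    void init(const std::vector<std::string>& table_column_names) override {
        std::vector<std::string> file_names;
        file_names.reserve(table_column_names.size());
        for (const auto& name : table_column_names) {
            auto it = _table_to_file.find(name);
            // Columns without a mapping entry keep their table name.
            file_names.push_back(it != _table_to_file.end() ? it->second : name);
        }
        _file_reader->init(file_names);
    }

private:
    SketchReader* _file_reader;  // not owned in this sketch
    std::unordered_map<std::string, std::string> _table_to_file;
};
```

Keeping the translation inside the wrapper means other table formats can supply their own mapping rules without touching the scanner.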



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

