Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/24088 )
Change subject: IMPALA-14589: [Part 1] Add support for Iceberg V3 default values ...................................................................... Patch Set 5: (11 comments) Thanks for working on this! http://gerrit.cloudera.org:8080/#/c/24088/5/be/src/exec/file-metadata-utils.cc File be/src/exec/file-metadata-utils.cc: http://gerrit.cloudera.org:8080/#/c/24088/5/be/src/exec/file-metadata-utils.cc@52 PS5, Line 52: AddIcebergColumns(mem_pool, &template_tuple, slot_descs_written); : : PopulateIcebergDefaults(template_tuple, mem_pool); Shouldn't the order be PopulateIcebergDefaults() then AddIcebergColumns(), to make partition values override default values? I think it only works in the tests because the Parquet data files also contain the partition columns. http://gerrit.cloudera.org:8080/#/c/24088/5/be/src/exec/file-metadata-utils.cc@63 PS5, Line 63: TextConverter text_converter(/* escape_char */ '\\', : scan_node_->hdfs_table()->null_partition_key_value(), : /* check_null */ true, /* strict_mode */ true); Same as L163, we could have a utility method that creates the text converter with the correct parameters. http://gerrit.cloudera.org:8080/#/c/24088/5/be/src/exec/file-metadata-utils.cc@74 PS5, Line 74: if (col_idx >= scan_node_->hdfs_table()->col_descs().size()) continue; Is is possible? If not, we could have a DCHECK instead. http://gerrit.cloudera.org:8080/#/c/24088/5/be/src/exec/file-metadata-utils.cc@86 PS5, Line 86: } Handle the case when WriteSlot() fails. See L197. http://gerrit.cloudera.org:8080/#/c/24088/5/be/src/exec/file-metadata-utils.cc@156 PS5, Line 156: scan_node_->hdfs_table()->col_descs() We should DCHECK that path.front() is < col_descs().size(). http://gerrit.cloudera.org:8080/#/c/24088/5/be/src/exec/hdfs-scan-node-base.cc File be/src/exec/hdfs-scan-node-base.cc: http://gerrit.cloudera.org:8080/#/c/24088/5/be/src/exec/hdfs-scan-node-base.cc@417 PS5, Line 417: HasVirtualColumnInTemplateTuple()) return nullptr; Shouldn't we create a template tuple whene there are columns with default values? But we might only want to create the template tuple if the corresponding column is missing from the data file. http://gerrit.cloudera.org:8080/#/c/24088/5/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java: http://gerrit.cloudera.org:8080/#/c/24088/5/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@1062 PS5, Line 1062: checkForNestedDefaultValues Can we have tests for this? It's OK to only have it with a single file format (Parquet), as this check is file-format agnostic. http://gerrit.cloudera.org:8080/#/c/24088/5/testdata/workloads/functional-query/queries/QueryTest/iceberg-v3-default-values.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-v3-default-values.test: http://gerrit.cloudera.org:8080/#/c/24088/5/testdata/workloads/functional-query/queries/QueryTest/iceberg-v3-default-values.test@244 PS5, Line 244: # For now, INSERT operations will use NULL for missing columns. Can we raise an error instead when there's a write-default? As the current behavior is not in line with the expected Iceberg V3 behavior. We should also have tests where we explicitly write NULLs. http://gerrit.cloudera.org:8080/#/c/24088/5/testdata/workloads/functional-query/queries/QueryTest/iceberg-v3-default-values.test@293 PS5, Line 293: # Test that non-partitioned column 'month' would use default if missing : # month has initial-default=12 I don't see that in the results. But I think the NULLs are correct here since they are explicitly written in the data files. http://gerrit.cloudera.org:8080/#/c/24088/5/testdata/workloads/functional-query/queries/QueryTest/iceberg-v3-negative.test File testdata/workloads/functional-query/queries/QueryTest/iceberg-v3-negative.test: http://gerrit.cloudera.org:8080/#/c/24088/5/testdata/workloads/functional-query/queries/QueryTest/iceberg-v3-negative.test@47 PS5, Line 47: SELECT * FROM iceberg_v3_default_value; : ---- RESULTS : 1,-1 : 2,2 : ---- TYPES : INT,INT I think we should just remove these tests here. We already cover them in other .test files. Here we should only collect the "negative tests", i.e. things that are not supported or raise errors. http://gerrit.cloudera.org:8080/#/c/24088/5/tests/common/file_utils.py File tests/common/file_utils.py: http://gerrit.cloudera.org:8080/#/c/24088/5/tests/common/file_utils.py@43 PS5, Line 43: file_format == "orc" or file_format == "parquet" or file_format == "avro" nit: file_format in ["orc", "parquet", "avro"] -- To view, visit http://gerrit.cloudera.org:8080/24088 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9f1be994a336b30b17b17819091417d777a39be9 Gerrit-Change-Number: 24088 Gerrit-PatchSet: 5 Gerrit-Owner: Arnab Karmakar <[email protected]> Gerrit-Reviewer: Arnab Karmakar <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Thu, 16 Apr 2026 12:24:11 +0000 Gerrit-HasComments: Yes
