amogh-jahagirdar commented on code in PR #12634:
URL: https://github.com/apache/iceberg/pull/12634#discussion_r2286965917
##########
parquet/src/main/java/org/apache/iceberg/parquet/PruneColumns.java:
##########
@@ -90,11 +90,11 @@ public Type struct(StructType expected, GroupType struct,
List<Type> fields) {
Type originalField = struct.getType(i);
Type field = fields.get(i);
Integer fieldId = getId(originalField);
- if (fieldId != null && selectedIds.contains(fieldId)) {
- filteredFields.add(originalField);
Review Comment:
My thoughts on why this is correct:
Prior to this PR, the check being performed first prevented column pruning
on deeply nested fields because `selectedIds` contains the IDs of structs along
with any further nested fields being projected. This check would pass first and
we would just add the whole `originalField` which includes fields that may not
be needed to read to satisfy a read.
In the change, by checking `field` is not null first we're checking the
pruned fields of the struct since this visitor works in a preorder fashion
where a structure is visited and then those contents are passed here in that
List<Type>. If a pruned field exists then we add that field (and not the
original field since that would not be pruned).
OK I think I'm convinced that this change is correct but I'd like @rdblue
take on this.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]