Re: [PR] Parquet: Fix column pruning for deeply nested fields [iceberg]

via GitHub Tue, 19 Aug 2025 21:56:21 -0700


amogh-jahagirdar commented on code in PR #12634:
URL: https://github.com/apache/iceberg/pull/12634#discussion_r2286965917



##########
parquet/src/main/java/org/apache/iceberg/parquet/PruneColumns.java:
##########
@@ -90,11 +90,11 @@ public Type struct(StructType expected, GroupType struct, 
List<Type> fields) {
       Type originalField = struct.getType(i);
       Type field = fields.get(i);
       Integer fieldId = getId(originalField);
-      if (fieldId != null && selectedIds.contains(fieldId)) {
-        filteredFields.add(originalField);

Review Comment:
   My thoughts on why this is correct:
   
   Prior to this PR, the check being performed first prevented column pruning 
on deeply nested fields because `selectedIds` contains the IDs of structs along 
with any further nested fields being projected. This check would pass first and 
we would just add the whole `originalField` which includes fields that may not 
be needed to read to satisfy a read.  
   
   In the change, by checking `field` is not null first we're checking the 
pruned fields of the struct since this visitor works in a preorder fashion 
where a structure is visited and then those contents are passed here in that 
List<Type>. If a pruned field exists then we add that field (and not the 
original field since that would not be pruned). 
   
   OK I think I'm convinced that this change is correct but I'd like @rdblue 
take on this. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Parquet: Fix column pruning for deeply nested fields [iceberg]

Reply via email to