amogh-jahagirdar commented on code in PR #12634:
URL: https://github.com/apache/iceberg/pull/12634#discussion_r2017066766


##########
parquet/src/test/java/org/apache/iceberg/parquet/TestPruneColumns.java:
##########
@@ -248,7 +269,16 @@ public void testStructElementName() {
                 StructType.of(
                     NestedField.required(4, "y", DoubleType.get()),
                     NestedField.required(5, "z", DoubleType.get()))),
-            NestedField.optional(6, "struct_name_2", StructType.of()));
+            NestedField.optional(6, "struct_name_2", StructType.of()),
+            NestedField.optional(
+                10,
+                "struct_name_3",
+                StructType.of(
+                    NestedField.required(11, "x", DoubleType.get()),
+                    NestedField.optional(
+                        12,
+                        "nested_struct_1",
+                        StructType.of(NestedField.required(13, "y", 
BinaryType.get()))))));

Review Comment:
   Since this structure is a bit deeply nested, could you add/update the above 
comment to indicate which fields we expect to be projected?  



##########
parquet/src/main/java/org/apache/iceberg/parquet/PruneColumns.java:
##########
@@ -90,11 +90,11 @@ public Type struct(StructType expected, GroupType struct, 
List<Type> fields) {
       Type originalField = struct.getType(i);
       Type field = fields.get(i);
       Integer fieldId = getId(originalField);
-      if (fieldId != null && selectedIds.contains(fieldId)) {
-        filteredFields.add(originalField);

Review Comment:
   My thoughts on why this is correct:
   
   This check being performed first prevented column pruning on deeply nested 
fields because `selectedIds` contains the IDs of structs along with any further 
nested fields being projected. This check would pass first and we would just 
add the whole `originalField` which includes fields that may not be needed to 
read to satisfy a read.  
   
   In the change, by checking `field` is not null first we're checking the 
pruned fields of the struct since this visitor works in a preorder fashion 
where a structure is visited and then those contents are passed here in that 
List<Type>. If a pruned field exists then we add that field (and not the 
original field since that would not be pruned). 
   
   OK I think I'm convinced that this change is correct but I'd like @rdblue 
take on this. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to