Re: [PR] Parquet: Fix column pruning for deeply nested fields [iceberg]

via GitHub Wed, 20 Aug 2025 11:49:14 -0700


rdblue commented on code in PR #12634:
URL: https://github.com/apache/iceberg/pull/12634#discussion_r2288794783



##########
parquet/src/main/java/org/apache/iceberg/parquet/PruneColumns.java:
##########
@@ -90,11 +90,11 @@ public Type struct(StructType expected, GroupType struct, List<Type> fields) {
       Type originalField = struct.getType(i);
       Type field = fields.get(i);
       Integer fieldId = getId(originalField);
-      if (fieldId != null && selectedIds.contains(fieldId)) {
-        filteredFields.add(originalField);
-      } else if (field != null) {
-        filteredFields.add(originalField);
+      if (field != null) {
+        filteredFields.add(field);
         hasChange = true;
+      } else if (fieldId != null && selectedIds.contains(fieldId)) {

Review Comment:
   I don't think that the order should change. If the field itself is selected 
by ID then the entire field should be projected. For instance, you could have a 
case where multiple fields are referenced:
   
   ```sql
   SELECT response FROM http_log WHERE response.status = 200
   ```
   
   That references `response.status` specifically, but the entire `response` 
struct should be produced.
   
   I do agree with the secondary case: if the field was returned from 
traversing the schema, then at least one sub-field was projected and the 
projected version should be returned.
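
To make the ordering concrete, here is a rough, self-contained sketch of what I have in mind. It is not the actual `PruneColumns` code: `PruneOrderSketch` and `filterFields` are hypothetical names, fields are modeled as plain strings instead of Parquet `Type` objects, and the field IDs (1 for `response`, 2 for `response.status`, 4 for `request`, 5 for `request.path`) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

public class PruneOrderSketch {

  /**
   * Hypothetical stand-in for the loop in struct().
   *
   * @param originalFields fields as they appear in the file schema
   * @param fieldIds       ID of each field, or null when the field has no ID
   * @param prunedFields   result of traversing each field; null when nothing
   *                       beneath that field was selected
   * @param selectedIds    IDs referenced by the query
   */
  static List<String> filterFields(
      List<String> originalFields,
      List<Integer> fieldIds,
      List<String> prunedFields,
      Set<Integer> selectedIds) {
    List<String> filtered = new ArrayList<>();
    for (int i = 0; i < originalFields.size(); i++) {
      Integer fieldId = fieldIds.get(i);
      String pruned = prunedFields.get(i);
      if (fieldId != null && selectedIds.contains(fieldId)) {
        // The field itself is selected by ID: project the entire field.
        filtered.add(originalFields.get(i));
      } else if (pruned != null) {
        // Only sub-fields were selected: return the projected version.
        filtered.add(pruned);
      }
      // Otherwise nothing was selected and the field is dropped.
    }
    return filtered;
  }

  public static void main(String[] args) {
    List<String> original =
        Arrays.asList("response{status,body}", "request{path,method}");
    List<Integer> ids = Arrays.asList(1, 4);

    // SELECT response ... WHERE response.status = 200 selects IDs 1 and 2.
    // Traversal of response yields a pruned struct, but ID 1 wins.
    System.out.println(
        filterFields(original, ids, Arrays.asList("response{status}", null), Set.of(1, 2)));
    // prints [response{status,body}]

    // Only request.path (ID 5) is selected: the pruned struct is returned.
    System.out.println(
        filterFields(original, ids, Arrays.asList(null, "request{path}"), Set.of(5)));
    // prints [request{path}]
  }
}
```

With the branches in the order proposed in this PR, the first call would return only `response{status}`, which is the case I'm worried about.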



