Re: [PR] Parquet: Fix column pruning for deeply nested fields [iceberg]

via GitHub Wed, 20 Aug 2025 17:07:07 -0700


sriharshaj commented on code in PR #12634:
URL: https://github.com/apache/iceberg/pull/12634#discussion_r2289426616



##########
parquet/src/main/java/org/apache/iceberg/parquet/PruneColumns.java:
##########
@@ -90,11 +90,11 @@ public Type struct(StructType expected, GroupType struct, List<Type> fields) {
       Type originalField = struct.getType(i);
       Type field = fields.get(i);
       Integer fieldId = getId(originalField);
-      if (fieldId != null && selectedIds.contains(fieldId)) {
-        filteredFields.add(originalField);
-      } else if (field != null) {
-        filteredFields.add(originalField);
+      if (field != null) {
+        filteredFields.add(field);
         hasChange = true;
+      } else if (fieldId != null && selectedIds.contains(fieldId)) {

Review Comment:
   When we select an entire struct column, for example:
   ```
   SELECT response FROM http_log
   ```
   `response` is expanded into all of its nested fields, and every nested 
field's id is added to the selected id list.
   
   By the time we reach this code path, `field` already matches `originalField`.
   
   So in the case you mentioned, we will return the correct schema.
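
To make the argument concrete, here is a minimal, self-contained sketch of the branch order from the diff. It does not use the real `PruneColumns` or Parquet classes: `Node`, `prune`, `render`, and the nested field names under `response` (`status`, `headers`, `name`, `value`) are hypothetical stand-ins, and the sketch assumes leaf visits return `null` so that leaves are only kept through the id check in their parent.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PruneSketch {
  /** Hypothetical stand-in for a Parquet type; a node without children is a leaf. */
  static final class Node {
    final int id;
    final String name;
    final List<Node> children;

    Node(int id, String name, Node... children) {
      this(id, name, Arrays.asList(children));
    }

    Node(int id, String name, List<Node> children) {
      this.id = id;
      this.name = name;
      this.children = children;
    }

    /** Renders the subtree as name{child,child} so results are easy to compare. */
    String render() {
      if (children.isEmpty()) {
        return name;
      }
      StringBuilder sb = new StringBuilder(name).append('{');
      for (int i = 0; i < children.size(); i++) {
        if (i > 0) {
          sb.append(',');
        }
        sb.append(children.get(i).render());
      }
      return sb.append('}').toString();
    }
  }

  /** Returns the pruned struct, or null for leaves and for structs with nothing selected. */
  static Node prune(Node node, Set<Integer> selectedIds) {
    if (node.children.isEmpty()) {
      return null; // assumption: leaves are kept by the parent's id check below
    }
    List<Node> kept = new ArrayList<>();
    for (Node original : node.children) {
      Node field = prune(original, selectedIds);
      if (field != null) {
        kept.add(field); // prefer the visited (possibly pruned) child
      } else if (selectedIds.contains(original.id)) {
        kept.add(original); // nothing came back from the visit, but the id is selected
      }
    }
    return kept.isEmpty() ? null : new Node(node.id, node.name, kept);
  }

  /** Hypothetical schema: http_log{response{status,headers{name,value}},url}. */
  static Node httpLog() {
    Node headers = new Node(3, "headers", new Node(4, "name"), new Node(5, "value"));
    Node response = new Node(1, "response", new Node(2, "status"), headers);
    return new Node(0, "http_log", response, new Node(6, "url"));
  }

  public static void main(String[] args) {
    // SELECT response: the struct id and every nested id are in the selected set.
    Set<Integer> whole = new HashSet<>(Arrays.asList(1, 2, 3, 4, 5));
    System.out.println(prune(httpLog(), whole).render());
    // prints http_log{response{status,headers{name,value}}}

    // Only the deeply nested leaf is selected.
    Set<Integer> deep = new HashSet<>(Arrays.asList(5));
    System.out.println(prune(httpLog(), deep).render());
    // prints http_log{response{headers{value}}}
  }
}
```

In the first case the visited `response` comes back with every child intact, so adding `field` and adding `originalField` produce the same schema. The second case is where the two differ: only the visited child carries the pruned shape.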
   




