amogh-jahagirdar commented on code in PR #12634:
URL: https://github.com/apache/iceberg/pull/12634#discussion_r2289600342


##########
parquet/src/main/java/org/apache/iceberg/parquet/PruneColumns.java:
##########
@@ -90,11 +90,11 @@ public Type struct(StructType expected, GroupType struct, 
List<Type> fields) {
       Type originalField = struct.getType(i);
       Type field = fields.get(i);
       Integer fieldId = getId(originalField);
-      if (fieldId != null && selectedIds.contains(fieldId)) {
-        filteredFields.add(originalField);
-      } else if (field != null) {
-        filteredFields.add(originalField);
+      if (field != null) {
+        filteredFields.add(field);
         hasChange = true;
+      } else if (fieldId != null && selectedIds.contains(fieldId)) {

Review Comment:
   Right @sriharshaj, 
   
   so @rdblue given
   
   ```
   SELECT response FROM http_log WHERE response.status = 200
   ```
   
    `selectedIds` ends up including the ID from the top level struct and all of 
it's nested fields. As a result, I think we do first want to check anything 
from the pruned set of fields. If that's defined AND the whole struct is 
projected that ends up working anyways because the set of fields for the struct 
already includes everything from the preorder traversal (i.e. the pruning 
didn't prune anything since there's nothing to prune from `response`).
   
   This relies on the fact that `selectedIds` includes the fields from the top 
level struct and it's nested fields. I think this is fine because the only 
interface to PruneColumns is ParquetSchemaUtils.pruneColumns which always 
includes all these fields anyways in the implementation. 
   
   I also wrote a quick test (albeit via Spark) to prove it but this logic is 
all embedded in our parquet APIs anyways so it applies to any integration using 
those.
   
   ```
     @TestTemplate
     public void testStructProjection() {
       assumeThat(fileFormat).isEqualTo(FileFormat.PARQUET);
       sql("CREATE TABLE %s (response struct<code:int, body:string>) using 
iceberg", tableName);
       sql("insert into %s values named_struct(\"code\", 200, \"body\", 
\"success\")", tableName);
       sql("insert into %s values named_struct(\"code\", 503, \"body\", 
\"because istio\")", tableName);
       List<Object[]> results = sql("SELECT response from %s where 
response.code > 500", tableName);
   
       assertThat(results).hasSize(1);
       assertThat(results.get(0)[0]).isEqualTo(new Object[]{503, "because 
istio"});
     }
   ```
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to