amogh-jahagirdar commented on code in PR #12634:
URL: https://github.com/apache/iceberg/pull/12634#discussion_r2289600342
##########
parquet/src/main/java/org/apache/iceberg/parquet/PruneColumns.java:
##########
@@ -90,11 +90,11 @@ public Type struct(StructType expected, GroupType struct,
List<Type> fields) {
Type originalField = struct.getType(i);
Type field = fields.get(i);
Integer fieldId = getId(originalField);
- if (fieldId != null && selectedIds.contains(fieldId)) {
- filteredFields.add(originalField);
- } else if (field != null) {
- filteredFields.add(originalField);
+ if (field != null) {
+ filteredFields.add(field);
hasChange = true;
+ } else if (fieldId != null && selectedIds.contains(fieldId)) {
Review Comment:
Right @sriharshaj,
so @rdblue given
```
SELECT response FROM http_log WHERE response.status = 200
```
`selectedIds` ends up including the ID from the top level struct and all of
it's nested fields. As a result, I think we do first want to check anything
from the pruned set of fields. If that's defined AND the whole struct is
projected that ends up working anyways because the set of fields for the struct
already includes everything from the preorder traversal (i.e. the pruning
didn't prune anything since there's nothing to prune from `response`).
This relies on the fact that `selectedIds` includes the fields from the top
level struct and it's nested fields. I think this is fine because the only
interface to PruneColumns is ParquetSchemaUtils.pruneColumns which always
includes all these fields anyways in the implementation.
I also wrote a quick test (albeit via Spark) to prove it but this logic is
all embedded in our parquet APIs anyways so it applies to any integration using
those.
```
@TestTemplate
public void testStructProjection() {
assumeThat(fileFormat).isEqualTo(FileFormat.PARQUET);
sql("CREATE TABLE %s (response struct<code:int, body:string>) using
iceberg", tableName);
sql("insert into %s values named_struct(\"code\", 200, \"body\",
\"success\")", tableName);
sql("insert into %s values named_struct(\"code\", 503, \"body\",
\"because istio\")", tableName);
List<Object[]> results = sql("SELECT response from %s where
response.code > 500", tableName);
assertThat(results).hasSize(1);
assertThat(results.get(0)[0]).isEqualTo(new Object[]{503, "because
istio"});
}
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]