sriharshaj commented on code in PR #12634:
URL: https://github.com/apache/iceberg/pull/12634#discussion_r2353883644
##########
parquet/src/main/java/org/apache/iceberg/parquet/PruneColumns.java:
##########
@@ -90,11 +90,11 @@ public Type struct(StructType expected, GroupType struct,
List<Type> fields) {
Type originalField = struct.getType(i);
Type field = fields.get(i);
Integer fieldId = getId(originalField);
- if (fieldId != null && selectedIds.contains(fieldId)) {
- filteredFields.add(originalField);
- } else if (field != null) {
- filteredFields.add(originalField);
+ if (field != null) {
+ filteredFields.add(field);
hasChange = true;
+ } else if (fieldId != null && selectedIds.contains(fieldId)) {
Review Comment:
@rdblue
From my reading of the code, setting `includeStructIds` to false doesn’t
consistently add the struct field ID to `selectedIds`. As a result, `(fieldId
!= null && selectedIds.contains(fieldId))` may never evaluate to true.
Also, given the current implementation, when we only look at
`expectedSchema` in:
```
TypeUtil.getProjectedIds(expectedSchema);
```
it seems we can’t determine whether a subset of struct fields is selected or
the entire struct is selected, since there’s no table schema to diff against.
In addition, even if only one nested field of a struct is selected, it looks
like `org.apache.iceberg.Schema` can’t be built without the parent struct field
ID.
One possible way to distinguish "entire struct selected" from "subset
selected" would be to omit the struct’s nested fields from the `Schema` when
the full struct is selected.
Please let me know if I am missing anything.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]