rdblue commented on code in PR #12634:
URL: https://github.com/apache/iceberg/pull/12634#discussion_r2288794783
##########
parquet/src/main/java/org/apache/iceberg/parquet/PruneColumns.java:
##########
@@ -90,11 +90,11 @@ public Type struct(StructType expected, GroupType struct, List<Type> fields) {
Type originalField = struct.getType(i);
Type field = fields.get(i);
Integer fieldId = getId(originalField);
- if (fieldId != null && selectedIds.contains(fieldId)) {
- filteredFields.add(originalField);
- } else if (field != null) {
- filteredFields.add(originalField);
+ if (field != null) {
+ filteredFields.add(field);
hasChange = true;
+ } else if (fieldId != null && selectedIds.contains(fieldId)) {
Review Comment:
I don't think the order should change. If the field itself is selected by ID,
then the entire field should be projected. For instance, a query can reference
both a struct and one of its sub-fields:
```sql
SELECT response FROM http_log WHERE response.status = 200
```
The filter references `response.status` specifically, but the entire `response`
struct should still be produced because `response` itself is selected.

I do agree with the secondary case: if a field was returned from traversing
the schema, then at least one sub-field was projected, and the projected
version should be returned.
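
To make the ordering concrete, here is a minimal sketch of the branch order
described above. It assumes hypothetical stand-in types (`Field`, `select`)
rather than Parquet's `GroupType` or the actual `PruneColumns` code, so treat it
as an illustration of the intent, not the real implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

class PruneOrderSketch {
  // Hypothetical stand-in for a Parquet field; not the real Type/GroupType API.
  static final class Field {
    final int id;
    final String name;
    final List<Field> children;

    Field(int id, String name, List<Field> children) {
      this.id = id;
      this.name = name;
      this.children = children;
    }
  }

  // originals.get(i) is the full field; pruned.get(i) is the result of
  // traversing that field's children, or null if nothing below it was selected.
  static List<Field> select(List<Field> originals, List<Field> pruned, Set<Integer> selectedIds) {
    List<Field> filtered = new ArrayList<>();
    for (int i = 0; i < originals.size(); i++) {
      Field original = originals.get(i);
      Field projected = pruned.get(i);
      if (selectedIds.contains(original.id)) {
        // Selected by ID: keep the whole field, even if a sub-field is also referenced.
        filtered.add(original);
      } else if (projected != null) {
        // Only sub-fields were selected: keep the projected (pruned) version.
        filtered.add(projected);
      }
    }
    return filtered;
  }

  public static void main(String[] args) {
    Field status = new Field(2, "status", List.of());
    Field body = new Field(3, "body", List.of());
    Field response = new Field(1, "response", Arrays.asList(status, body));
    Field prunedResponse = new Field(1, "response", Arrays.asList(status));

    // SELECT response ... WHERE response.status = 200 selects IDs 1 and 2.
    List<Field> whole = select(List.of(response), List.of(prunedResponse), Set.of(1, 2));
    System.out.println(whole.get(0).children.size()); // prints 2

    // Selecting only response.status (ID 2) keeps the pruned struct.
    List<Field> partial = select(List.of(response), List.of(prunedResponse), Set.of(2));
    System.out.println(partial.get(0).children.size()); // prints 1
  }
}
```

With the branches in this order, the query above still returns the full
`response` struct, while a projection of only `response.status` returns the
pruned struct.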
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]