rdblue commented on code in PR #6965:
URL: https://github.com/apache/iceberg/pull/6965#discussion_r1126915294
##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/BaseReader.java:
##########
@@ -252,7 +259,7 @@ protected class SparkDeleteFilter extends DeleteFilter<InternalRow> {
     private final InternalRowWrapper asStructLike;

     SparkDeleteFilter(String filePath, List<DeleteFile> deletes, DeleteCounter counter) {
-      super(filePath, deletes, table.schema(), expectedSchema, counter);
+      super(filePath, deletes, SnapshotUtil.snapshotSchema(table, branch), expectedSchema, counter);
Review Comment:
I did a bit of exploration to find out whether we could avoid passing `branch` all the way through so many classes just to get the schema here. We could pass the table schema through instead, but that doesn't seem worth the trouble since we would still have to modify just as many classes.
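For reference, here's a minimal sketch of what a branch-aware schema lookup like `SnapshotUtil.snapshotSchema(table, branch)` amounts to. This is only an illustration against the public `Table` API (`schemaForBranch` is an invented name), not the actual `SnapshotUtil` implementation:

```java
import org.apache.iceberg.Schema;
import org.apache.iceberg.Snapshot;
import org.apache.iceberg.SnapshotRef;
import org.apache.iceberg.Table;

// Illustrative only: resolve the schema for reads against a branch by
// following the branch ref to its snapshot and that snapshot's schema ID.
static Schema schemaForBranch(Table table, String branch) {
  SnapshotRef ref = table.refs().get(branch);
  if (ref == null) {
    // branch doesn't exist yet; fall back to the current table schema
    return table.schema();
  }

  Snapshot snapshot = table.snapshot(ref.snapshotId());
  Integer schemaId = snapshot.schemaId();
  // older metadata may not record a schema ID per snapshot
  return schemaId != null ? table.schemas().get(schemaId) : table.schema();
}
```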
Another option is to add any missing columns to the read schema up front, so that it doesn't need to happen in each task. The main issue with that is that the extra columns need to be based on the delete files, so we would need to plan the scan before we could return the read schema; see the sketch below.
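To make that planning-order problem concrete, a hypothetical version of the alternative might look like this (`mergeDeleteColumns` is an invented name, not an existing helper); note that `equalityFieldIds()` is only known per delete file, i.e. after the scan has been planned:

```java
import java.util.Set;
import org.apache.iceberg.DeleteFile;
import org.apache.iceberg.FileContent;
import org.apache.iceberg.Schema;
import org.apache.iceberg.relocated.com.google.common.collect.Sets;
import org.apache.iceberg.types.TypeUtil;

// Hypothetical sketch of the rejected alternative: widen the requested read
// schema with any columns referenced by equality deletes, once, up front.
static Schema mergeDeleteColumns(
    Schema tableSchema, Schema requestedSchema, Iterable<DeleteFile> deletes) {
  Set<Integer> ids = Sets.newHashSet(TypeUtil.getProjectedIds(requestedSchema));
  for (DeleteFile delete : deletes) {
    if (delete.content() == FileContent.EQUALITY_DELETES) {
      // these field IDs come from planned delete files, which is why the
      // scan would have to be planned before the read schema is returned
      ids.addAll(delete.equalityFieldIds());
    }
  }
  return TypeUtil.select(tableSchema, ids);
}
```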
I think that what this PR already does is actually the cleanest solution, but I thought I'd mention that I looked into it in case other people were wondering about all of the changes needed to pass `branch`.