rdblue commented on code in PR #6965:
URL: https://github.com/apache/iceberg/pull/6965#discussion_r1126915294
##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/source/BaseReader.java:
##########
@@ -252,7 +259,7 @@ protected class SparkDeleteFilter extends DeleteFilter<InternalRow> {
     private final InternalRowWrapper asStructLike;

     SparkDeleteFilter(String filePath, List<DeleteFile> deletes, DeleteCounter counter) {
-      super(filePath, deletes, table.schema(), expectedSchema, counter);
+      super(filePath, deletes, SnapshotUtil.snapshotSchema(table, branch), expectedSchema, counter);
Review Comment:
I did a bit of exploration to find out whether we could avoid passing `branch` all the way through so many classes just to get the schema here. We could pass the table schema through instead, but that doesn't seem worth the trouble since we would still have to modify just as many classes.
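For reference, here's a minimal sketch of what a branch-aware schema lookup like `SnapshotUtil.snapshotSchema(table, branch)` amounts to. This is only an illustration against the public `Table` API (`schemaForBranch` is an invented name), not the actual `SnapshotUtil` implementation:

```java
import org.apache.iceberg.Schema;
import org.apache.iceberg.Snapshot;
import org.apache.iceberg.SnapshotRef;
import org.apache.iceberg.Table;

// Illustrative only: resolve the schema for reads against a branch by
// following the branch ref to its snapshot and that snapshot's schema ID.
static Schema schemaForBranch(Table table, String branch) {
  SnapshotRef ref = table.refs().get(branch);
  if (ref == null) {
    // branch doesn't exist yet; fall back to the current table schema
    return table.schema();
  }

  Snapshot snapshot = table.snapshot(ref.snapshotId());
  Integer schemaId = snapshot.schemaId();
  // older metadata may not record a schema ID per snapshot
  return schemaId != null ? table.schemas().get(schemaId) : table.schema();
}
```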
Another option is to add any missing columns to the read schema up front, so that it doesn't need to happen in each task. The main issue with that is that the extra columns need to be based on the delete files, so we would need to plan the scan before we could return the read schema; see the sketch below.
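To make that planning-order problem concrete, a hypothetical version of the alternative might look like this (`mergeDeleteColumns` is an invented name, not an existing helper); note that `equalityFieldIds()` is only known per delete file, i.e. after the scan has been planned:

```java
import java.util.Set;
import org.apache.iceberg.DeleteFile;
import org.apache.iceberg.FileContent;
import org.apache.iceberg.Schema;
import org.apache.iceberg.relocated.com.google.common.collect.Sets;
import org.apache.iceberg.types.TypeUtil;

// Hypothetical sketch of the rejected alternative: widen the requested read
// schema with any columns referenced by equality deletes, once, up front.
static Schema mergeDeleteColumns(
    Schema tableSchema, Schema requestedSchema, Iterable<DeleteFile> deletes) {
  Set<Integer> ids = Sets.newHashSet(TypeUtil.getProjectedIds(requestedSchema));
  for (DeleteFile delete : deletes) {
    if (delete.content() == FileContent.EQUALITY_DELETES) {
      // these field IDs come from planned delete files, which is why the
      // scan would have to be planned before the read schema is returned
      ids.addAll(delete.equalityFieldIds());
    }
  }
  return TypeUtil.select(tableSchema, ids);
}
```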
I think that what this PR already does is actually the cleanest solution, but I thought I'd mention that I looked into it in case other people were wondering about all of the changes needed to pass `branch`.