aokolnychyi commented on code in PR #11273:
URL: https://github.com/apache/iceberg/pull/11273#discussion_r1799857526


##########
spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkPositionDeltaWrite.java:
##########
@@ -169,7 +174,13 @@ public DeltaWriterFactory createBatchWriterFactory(PhysicalWriteInfo info) {
       // broadcast the table metadata as the writer factory will be sent to executors
       Broadcast<Table> tableBroadcast =
           sparkContext.broadcast(SerializableTableWithSize.copyOf(table));
-      return new PositionDeltaWriteFactory(tableBroadcast, command, context, writeProperties);
+      Broadcast<Map<String, DeleteFileSet>> rewritableDeletes = null;
+      if (context.deleteGranularity() == DeleteGranularity.FILE && scan != null) {
+        rewritableDeletes = sparkContext.broadcast(scan.rewritableDeletes());

Review Comment:
   We should skip the broadcast when the set of rewritable deletes is null or 
   empty. I'd also move this logic into a helper method and update the 
   comment and invocation above for consistency.
   
   ```
   @Override
   public DeltaWriterFactory createBatchWriterFactory(PhysicalWriteInfo info) {
     // broadcast large objects as the writer factory will be sent to executors
     return new PositionDeltaWriteFactory(
         sparkContext.broadcast(SerializableTableWithSize.copyOf(table)),
         broadcastRewritableDeletes(),
         ...
   }
   
   private Broadcast<Map<String, DeleteFileSet>> broadcastRewritableDeletes() {
     if (context.deleteGranularity() == DeleteGranularity.FILE && scan != null) {
       Map<String, DeleteFileSet> rewritableDeletes = scan.rewritableDeletes();
       if (rewritableDeletes != null && !rewritableDeletes.isEmpty()) {
         return sparkContext.broadcast(rewritableDeletes);
       }
     }
   
     return null;
   }
   ```
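
Because the helper above can return `null`, whatever reads the broadcast on the executor side would need to tolerate a missing value. The following is only a minimal sketch of that guard pattern under stated assumptions, not code from the PR: `FakeBroadcast`, `broadcastIfPresent`, and `valueOrEmpty` are hypothetical names, `FakeBroadcast` stands in for Spark's `Broadcast`, and `Set<String>` stands in for `DeleteFileSet` so the example runs without Spark or Iceberg on the classpath.

```java
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

public class BroadcastGuardSketch {

  // Hypothetical stand-in for Spark's Broadcast<T>; it only holds a value.
  static final class FakeBroadcast<T> {
    final T value;

    FakeBroadcast(T value) {
      this.value = value;
    }

    T value() {
      return value;
    }
  }

  // Writer side: return null instead of broadcasting when there is nothing to send.
  static FakeBroadcast<Map<String, Set<String>>> broadcastIfPresent(
      Map<String, Set<String>> deletes,
      Function<Map<String, Set<String>>, FakeBroadcast<Map<String, Set<String>>>> broadcaster) {
    if (deletes == null || deletes.isEmpty()) {
      return null;
    }
    return broadcaster.apply(deletes);
  }

  // Reader side: treat a missing broadcast as "no rewritable deletes".
  static Map<String, Set<String>> valueOrEmpty(FakeBroadcast<Map<String, Set<String>>> broadcast) {
    return broadcast != null ? broadcast.value() : Map.of();
  }

  public static void main(String[] args) {
    // Hypothetical data: one data file path mapped to one delete file path.
    Map<String, Set<String>> deletes = Map.of("data-file-a", Set.of("delete-file-1"));

    FakeBroadcast<Map<String, Set<String>>> present =
        broadcastIfPresent(deletes, FakeBroadcast::new);
    FakeBroadcast<Map<String, Set<String>>> absent =
        broadcastIfPresent(Map.of(), FakeBroadcast::new);

    System.out.println("entries when present: " + valueOrEmpty(present).size());
    System.out.println("entries when absent: " + valueOrEmpty(absent).size());
  }
}
```

Returning an empty map on the reader side keeps the null check in one place; whether the real writer factory would prefer that or an explicit null check at each use is a design choice for the PR author.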



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

