sririshindra commented on code in PR #7096:
URL: https://github.com/apache/iceberg/pull/7096#discussion_r1141210980


##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/actions/DeleteOrphanFilesSparkAction.java:
##########
@@ -123,10 +124,10 @@ public class DeleteOrphanFilesSparkAction extends 
BaseSparkAction<DeleteOrphanFi
   private ExecutorService deleteExecutorService = null;
 
   DeleteOrphanFilesSparkAction(SparkSession spark, Table table) {
-    super(spark);
-
-    this.hadoopConf = new 
SerializableConfiguration(spark.sessionState().newHadoopConf());
-    this.listingParallelism = 
spark.sessionState().conf().parallelPartitionDiscoveryParallelism();
+    super(spark.cloneSession());
+    spark().conf().set(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD().key(), -1);

Review Comment:
   I am a bit confused. Why are we hard coding this config here? Even if a user 
desires to use broadcastHashJoin for DeleteOrphanFilesSparkAction, any 
configuration they set up while initializing their SparkSession to force a 
broadcast join would be ignored, making it impossible to use.. 
   
   For instance a user might want to set 
`spark().conf().set(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD().key(), 100MB)` so 
that their code doesn't OOM out or maybe their jobs are small enough that they 
don't see the oom issue at all. Here we are simply overriding whatever user 
might have configured for their job and forcing them to use always use 
SortMergeJoin.  
   
   If the user wants to disable broadcastJoin entirely for 
DeleteOrphanFilesSparkAction, they can do so in their spark config themselves. 
I think we shouldn't take away that choice from the user. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to