xloya commented on code in PR #7096:
URL: https://github.com/apache/iceberg/pull/7096#discussion_r1143246891
##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/actions/DeleteOrphanFilesSparkAction.java:
##########
@@ -123,10 +124,10 @@ public class DeleteOrphanFilesSparkAction extends BaseSparkAction<DeleteOrphanFi
   private ExecutorService deleteExecutorService = null;

   DeleteOrphanFilesSparkAction(SparkSession spark, Table table) {
-    super(spark);
-
-    this.hadoopConf = new SerializableConfiguration(spark.sessionState().newHadoopConf());
-    this.listingParallelism = spark.sessionState().conf().parallelPartitionDiscoveryParallelism();
+    super(spark.cloneSession());
+    spark().conf().set(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD().key(), -1);
Review Comment:
@sririshindra Hardcoding this is admittedly not the most elegant solution, but compared to the risk of an OOM, I think the cost of a sort-merge join is acceptable. The root cause is that we scan the metadata table first and Spark bases its size estimation on that metadata table, so the most thorough fix would be to make Spark's estimation more accurate.
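
For context, here is a minimal sketch (not the actual PR change) of the pattern being discussed: clone the session so the override stays scoped to the action, then set `spark.sql.autoBroadcastJoinThreshold` to -1 so Spark falls back to a sort-merge join instead of broadcasting a file listing whose size it underestimated from the metadata table. The class name below is illustrative; only the Spark calls mirror the diff.

```java
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.internal.SQLConf;

// Illustrative sketch only; the class name is hypothetical.
class OrphanScanSessionSketch {

  private final SparkSession spark;

  OrphanScanSessionSketch(SparkSession spark) {
    // Clone so the config override does not leak into the caller's session.
    this.spark = spark.cloneSession();
    // -1 disables auto broadcast joins, so Spark picks a sort-merge join
    // rather than broadcasting a potentially huge file listing and OOMing.
    this.spark.conf().set(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD().key(), -1);
  }

  SparkSession session() {
    return spark;
  }
}
```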
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]