TechTinkerer42 commented on issue #9330: URL: https://github.com/apache/iceberg/issues/9330#issuecomment-1868957150
Each partition should be written by only one task. When multiple tasks write to the same partition, each of them produces its own file, which leads to small files. Here are a few options you can try:

1. Enable AQE (Adaptive Query Execution) and allow large partitions to be split by setting the following:

   spark.sql.adaptive.enabled
   spark.sql.adaptive.optimizeSkewsInRebalancePartitions.enabled
   spark.sql.adaptive.advisoryPartitionSizeInBytes

2. Set the table property write.distribution-mode to none. However, be aware that with this setting the data is not shuffled before the write, so a single task can write to many partitions. This could result in the creation of small files and potentially degrade query performance.
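As a rough sketch of how the two options could be applied from PySpark: the helper names, the table name `db.events`, and the 512 MB advisory size below are assumptions for illustration only, not values from this issue. The config keys and the `write.distribution-mode` property are the ones named above.

```python
# Minimal sketch, assuming an existing SparkSession with the Iceberg runtime
# on the classpath. Helper names and default values are hypothetical.

def aqe_rebalance_conf(advisory_size_bytes=512 * 1024 * 1024):
    """Build the AQE settings that let Spark split oversized partitions.

    The 512 MB default is an assumed starting point; tune it to the
    target file size of your table.
    """
    return {
        "spark.sql.adaptive.enabled": "true",
        "spark.sql.adaptive.optimizeSkewsInRebalancePartitions.enabled": "true",
        "spark.sql.adaptive.advisoryPartitionSizeInBytes": str(advisory_size_bytes),
    }


def distribution_mode_ddl(table, mode="none"):
    """Return the Spark SQL statement that sets write.distribution-mode."""
    if mode not in ("none", "hash", "range"):
        raise ValueError(f"unsupported distribution mode: {mode}")
    return (
        f"ALTER TABLE {table} "
        f"SET TBLPROPERTIES ('write.distribution-mode'='{mode}')"
    )


def apply_option_1(spark, advisory_size_bytes=512 * 1024 * 1024):
    """Option 1: set the AQE configs on the running session."""
    for key, value in aqe_rebalance_conf(advisory_size_bytes).items():
        spark.conf.set(key, value)


def apply_option_2(spark, table):
    """Option 2: disable the pre-write shuffle for one table."""
    spark.sql(distribution_mode_ddl(table, "none"))
```

With a live session this would be used as `apply_option_1(spark)` or `apply_option_2(spark, "db.events")`. The two options are alternatives, so it is probably best to try one at a time and compare the resulting file sizes.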