sumedhsakdeo opened a new pull request, #10336: URL: https://github.com/apache/iceberg/pull/10336
We have a scheduled job that deletes rows from an Iceberg table. The job is authored in SQL. Because the table uses the copy-on-write (CoW) technique for data deletion, the job rewrites the affected files without the deleted rows. We want to tune this job so that it produces files of roughly 512 MB on HDFS. We are unable to use a read option because the job uses Spark SQL, and setting the `read.split.target-size` table property is not desired because it impacts all readers of the table. This PR adds the ability to control the split size for a given Spark SQL job by introducing a property, `spark.sql.iceberg.split-size`, which can be set as a Spark session conf.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
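As a sketch of the intended usage (the property name `spark.sql.iceberg.split-size` comes from this PR; the exact value and the `DELETE` statement are illustrative assumptions, with 536870912 bytes = 512 MB matching the target file size described above):

```sql
-- Scope the split-size override to this session only; other readers of the
-- table are unaffected because no table property is changed.
SET spark.sql.iceberg.split-size=536870912;  -- 512 MB, session-level conf

-- Hypothetical CoW delete: the rewritten files are planned with the
-- session-level split size instead of read.split.target-size.
DELETE FROM db.events WHERE event_date < '2024-01-01';
```

The session conf takes effect only for jobs that set it, which is the point of the change: the table-level `read.split.target-size` stays untouched for everyone else.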