vshel opened a new issue, #5997: URL: https://github.com/apache/iceberg/issues/5997
### Query engine

Spark3

### Question

Hello,

I have a ~6 TB Iceberg table with ~10,000 partitions in S3, and I am using the Glue catalog. What is the correct way to run compaction on such a table?

From the documentation (https://iceberg.apache.org/docs/latest/maintenance/) I can run:

```
SparkActions
    .get()
    .rewriteDataFiles(table)
    .filter(Expressions.equal("date", "2020-08-18"))
    .option("target-file-size-bytes", Long.toString(500 * 1024 * 1024)) // 500 MB
    .execute();
```

This is going to execute on a single AWS instance. How do I scale this to many instances so that compaction runs in parallel on many partitions at once? Is there out-of-the-box support for this?

Additionally, the table is constantly updated. Am I supposed to pause all updates until compaction finishes?

Thank you.
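For a rough sense of the scale involved, here is a minimal sketch of the sizing arithmetic implied by the numbers above. It assumes the data is spread evenly across partitions and reads "~6 TB" as 6 TiB; both are assumptions, and the class and method names are hypothetical helpers, not part of Iceberg.

```java
public class CompactionSizing {
    // Average bytes per partition, assuming an even spread (an assumption, not a measured value).
    static long avgPartitionBytes(long tableBytes, long partitions) {
        return tableBytes / partitions;
    }

    // Number of files needed to hold `bytes` at the given target file size (ceiling division).
    static long filesAtTarget(long bytes, long targetFileBytes) {
        return (bytes + targetFileBytes - 1) / targetFileBytes;
    }

    public static void main(String[] args) {
        long tableBytes = 6L * 1024 * 1024 * 1024 * 1024; // "~6 TB" read as 6 TiB
        long partitions = 10_000;                          // "~10,000 partitions"
        long targetFileBytes = 500L * 1024 * 1024;         // same 500 MB target as the snippet

        long perPartition = avgPartitionBytes(tableBytes, partitions);
        long filesPerPartition = filesAtTarget(perPartition, targetFileBytes);

        System.out.println("avg bytes per partition: " + perPartition);
        System.out.println("files per partition at target: " + filesPerPartition);
        System.out.println("total files after compaction: " + filesPerPartition * partitions);
    }
}
```

Under these assumptions each partition averages about 629 MiB, so a 500 MB target leaves roughly two files per partition. Real partitions are unlikely to be this uniform, so the figures are only an order-of-magnitude guide.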