paulpaul1076 opened a new issue, #9687: URL: https://github.com/apache/iceberg/issues/9687
### Feature Request / Improvement

From what I understand, when a compaction job rewrites a lot of small files with `partial-progress.enabled=true`, several file groups can finish compacting at about the same time. Their metadata commits are then made in parallel and conflict with one another, which leads to `CommitFailedException`.

Is it possible to make these commits sequential instead of parallel, for the compaction job specifically? I don't see any point in them being parallel; it just leads to `CommitFailedException`.

This is very easy to reproduce. For example, set `partial-progress.max-commits=1000` (or another high number) and set `max-file-group-size-bytes` to 1 GB (or some other low number). You will see that there are a lot of file groups and that they all get committed in parallel, which leads to this exception.

That was a synthetic example. As for a real-world example, I had a job that had to compact 50k files. For that job I used `partial-progress.enabled=true` and it worked fine: there were a lot of file groups, and since `partial-progress.max-commits=10` (the default) is small, the commits were infrequent and for the most part didn't run in parallel. However, when I ran the same compaction job on a table with 5k files, 10 commits was very frequent for that job, because it doesn't have many files to compact, so the commits started running in parallel and conflicting.

This means these settings have to be configured separately for each job, depending on the number of files to compact. So why not just make commits sequential (inside the compaction job only)?

@RussellSpitzer told me he thought this had already been done, but then we realized that it apparently hasn't, and I didn't find any issue about it here on GitHub. Is somebody already working on this?

### Query engine

Spark
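To make the request above concrete, here is a toy Python sketch of the two behaviours. It is not Iceberg code: `ToyTable`, `CommitFailed`, `racing_commits`, and `sequential_commits` are hypothetical names, the table is just a version counter with an optimistic check, and commit retries are not modelled. It only illustrates why commits based on the same snapshot conflict, and how funnelling them through a single committer would avoid that while the rewrites themselves stay parallel.

```python
import queue
import threading


class CommitFailed(Exception):
    """Toy stand-in for Iceberg's CommitFailedException (assumption)."""


class ToyTable:
    """Toy table with optimistic concurrency: a commit succeeds only if
    the version it was based on is still the current version."""

    def __init__(self):
        self.version = 0
        self._lock = threading.Lock()

    def commit(self, base_version):
        with self._lock:
            if base_version != self.version:
                raise CommitFailed(
                    f"based on v{base_version}, table is at v{self.version}"
                )
            self.version += 1


def racing_commits(table, n_groups):
    """Model file groups that finish together: every commit is based on
    the same starting version, so only the first one can succeed."""
    base = table.version
    failures = 0
    for _ in range(n_groups):
        try:
            table.commit(base)
        except CommitFailed:
            failures += 1
    return failures


def sequential_commits(table, n_groups, n_workers=4):
    """Workers 'rewrite' groups in parallel and hand them to one
    committer thread, which commits them one at a time."""
    done = queue.Queue()
    failures = 0

    def rewrite(group_ids):
        for group_id in group_ids:
            done.put(group_id)  # pretend this group's rewrite just finished

    def committer():
        nonlocal failures
        for _ in range(n_groups):
            done.get()
            try:
                # Only this thread commits, so the base is always current.
                table.commit(table.version)
            except CommitFailed:
                failures += 1

    workers = [
        threading.Thread(target=rewrite, args=(range(i, n_groups, n_workers),))
        for i in range(n_workers)
    ]
    commit_thread = threading.Thread(target=committer)
    commit_thread.start()
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    commit_thread.join()
    return failures
```

In this toy model, `racing_commits` fails every commit except the first, while `sequential_commits` never fails, because the single committer always bases its commit on the current version. Whether this maps cleanly onto the actual compaction action would need to be checked against the Iceberg source.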