RussellSpitzer opened a new issue, #6367: URL: https://github.com/apache/iceberg/issues/6367
### Apache Iceberg version

1.1.0 (latest release)

### Query engine

Spark

### Please describe the bug 🐞

Partial progress currently works as in the following pseudo-code:

```
Rewrite Job Thread Pool
In parallel {
  rewriteFiles for a partition/fileGroup   // Data files generated here
  add result of rewrite to commit queue
}

Commit Thread {
  when enough fileGroups have been rewritten
    perform a commit   // Manifests generated at this point in time
}

Once "in parallel" has completed {
  Await termination of the single-threaded commit service (10 minutes or die)
}
```

See
https://github.com/apache/iceberg/blob/f5f79a98b5bead5b976378cc2fc45c9454ac7731/spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteDataFilesSparkAction.java#L350-L357

https://github.com/apache/iceberg/blob/f5f79a98b5bead5b976378cc2fc45c9454ac7731/core/src/main/java/org/apache/iceberg/actions/RewriteDataFilesCommitManager.java#L179-L188

And
https://github.com/apache/iceberg/blob/f5f79a98b5bead5b976378cc2fc45c9454ac7731/core/src/main/java/org/apache/iceberg/actions/RewriteDataFilesCommitManager.java#L228-L240

The original assumption here is that all commits should be finished within 10 minutes of the rewrite completing, since the commit phase should be relatively fast and the rewrite phase is long.

There are a few issues with this. Some users may be running a very large cluster for the "parallel" phase, which lets them complete the rewrites quickly, but the new files require a huge amount of new metadata, which in turn requires a large number of new manifest files.

In one of our internal examples we have a very large partial progress rewrite in 10 parts. The rewrites all start finishing around the same time, which basically just enqueues all of the commits to then occur in sequence.
The timeline looks basically like this (imagine there are only five commit groups):

```
All Rewrites Begin
1/5 of file groups rewritten
1st Commit Begins
2/5 of file groups rewritten
3/5 of file groups rewritten
4/5 of file groups rewritten
1st Commit Finishes
2nd Commit Begins
5/5 of file groups rewritten
10 Minute Timer Begins to Finish Commits
2nd Commit Finishes
3rd Commit Begins
// Timeout!
```

I think the best way to improve this, and to increase the throughput of the operation, is to move the actual writing of manifests into the parallel portion of the operation. In this case we could probably do this by building our commit groups in the Service's offer method rather than in the service thread itself; then the service thread can just check for completed commit groups.
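A minimal Java sketch of the idea of building commit groups on the offering side might look like the following. It is only an illustration under assumptions: `OfferSideBatcher`, `groupsPerCommit`, `flush`, and `pollCompleted` are hypothetical names, not part of Iceberg's `RewriteDataFilesCommitManager`, and the sketch does not write manifests; a comment marks where that work could run on the rewrite worker thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: none of these names come from Iceberg.
final class OfferSideBatcher<T> {
  private final int groupsPerCommit;
  // Batches that are fully assembled and ready for the commit thread.
  private final BlockingQueue<List<T>> completedBatches = new LinkedBlockingQueue<>();
  // Batch currently being filled; guarded by the lock on "this".
  private List<T> current = new ArrayList<>();

  OfferSideBatcher(int groupsPerCommit) {
    if (groupsPerCommit < 1) {
      throw new IllegalArgumentException("groupsPerCommit must be >= 1");
    }
    this.groupsPerCommit = groupsPerCommit;
  }

  // Called from the parallel rewrite workers when a file group finishes.
  void offer(T rewrittenGroup) {
    List<T> full = null;
    synchronized (this) {
      current.add(rewrittenGroup);
      if (current.size() >= groupsPerCommit) {
        full = current;
        current = new ArrayList<>();
      }
    }
    if (full != null) {
      // Assumption: expensive preparation (for example, writing manifests)
      // could happen here, on the worker thread and outside the lock.
      completedBatches.add(full);
    }
  }

  // Called once after all rewrites finish, to hand over a partial last batch.
  void flush() {
    List<T> rest;
    synchronized (this) {
      rest = current;
      current = new ArrayList<>();
    }
    if (!rest.isEmpty()) {
      completedBatches.add(rest);
    }
  }

  // Called by the commit thread; returns null when no batch is ready yet.
  List<T> pollCompleted() {
    return completedBatches.poll();
  }
}
```

With this shape the commit thread only drains `pollCompleted()`, so the time it spends per batch no longer includes assembling the group. Whether manifest writing can safely move to the worker threads depends on details of the commit path that this sketch does not model.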