flyrain commented on code in PR #6378:
URL: https://github.com/apache/iceberg/pull/6378#discussion_r1063724968


##########
core/src/main/java/org/apache/iceberg/actions/RewriteDataFilesCommitManager.java:
##########
@@ -222,28 +222,42 @@ public List<RewriteFileGroup> results() {
     public void close() {
       Preconditions.checkState(
           running.compareAndSet(true, false), "Cannot close already closed 
RewriteService");
-      LOG.info("Closing commit service for {}", table);
+      LOG.info("Closing commit service for {} waiting for all commits to 
finish", table);
       committerService.shutdown();
 
+      boolean timeout = false;
       try {
         // All rewrites have completed and all new files have been created, we 
are now waiting for
         // the commit
-        // pool to finish doing it's commits to Iceberg State. In the case of 
partial progress this
+        // pool to finish doing its commits to Iceberg State. In the case of 
partial progress this
         // should
         // have been occurring simultaneously with rewrites, if not there 
should be only a single
         // commit operation.
-        // In either case this should take much less than 10 minutes to 
actually complete.
-        if (!committerService.awaitTermination(10, TimeUnit.MINUTES)) {
+        if (!committerService.awaitTermination(120, TimeUnit.MINUTES)) {
           LOG.warn(
-              "Commit operation did not complete within 10 minutes of the 
files being written. This may mean "
-                  + "that changes were not successfully committed to the the 
Iceberg table.");
+              "Commit operation did not complete within 120 minutes of the all 
files "
+                  + "being rewritten. This may mean that some changes were not 
successfully committed to the "
+                  + "table.");
+          timeout = true;
         }
       } catch (InterruptedException e) {
         Thread.currentThread().interrupt();
         throw new RuntimeException(
             "Cannot complete commit for rewrite, commit service interrupted", 
e);
       }
 
+      if (!completedRewrites.isEmpty() && timeout) {
+        LOG.error("Attempting to cleanup uncommitted file groups");
+        
completedRewrites.forEach(RewriteDataFilesCommitManager.this::abortFileGroup);
+      }
+
+      Preconditions.checkArgument(
+          !timeout && completedRewrites.isEmpty(),
+          "Timeout occurred when waiting for commits to complete. "
+              + "{} file groups committed. {} file groups remain uncommitted.",

Review Comment:
   Can we mention that a solution for uncommitted file groups? like rerun the 
compaction or wait for the next cycle. I feel like we have enough context as a 
developer, but user may not, and get panic when job failed.



##########
core/src/main/java/org/apache/iceberg/actions/RewriteDataFilesCommitManager.java:
##########
@@ -225,25 +225,40 @@ public void close() {
       LOG.info("Closing commit service for {}", table);
       committerService.shutdown();
 
+      boolean timeout = false;
+      int waitTime;
       try {
         // All rewrites have completed and all new files have been created, we 
are now waiting for
         // the commit
-        // pool to finish doing it's commits to Iceberg State. In the case of 
partial progress this
+        // pool to finish doing its commits to Iceberg State. In the case of 
partial progress this
         // should
         // have been occurring simultaneously with rewrites, if not there 
should be only a single
         // commit operation.
-        // In either case this should take much less than 10 minutes to 
actually complete.
-        if (!committerService.awaitTermination(10, TimeUnit.MINUTES)) {
+        // We will wait 10 minutes plus 5 more minutes for each commit left to 
perform due to the
+        // time required for writing manifests
+        waitTime = 10 + (completedRewrites.size() / rewritesPerCommit) * 5;

Review Comment:
   Can we change it to a longer duration, like 15 mins per commit? In that way, 
the waiting time increases proportionally instead of a fixed 120-mins. The 
question we are trying to answer is that, is 120 mins long enough?



##########
core/src/main/java/org/apache/iceberg/actions/RewriteDataFilesCommitManager.java:
##########
@@ -222,28 +222,42 @@ public List<RewriteFileGroup> results() {
     public void close() {
       Preconditions.checkState(
           running.compareAndSet(true, false), "Cannot close already closed 
RewriteService");
-      LOG.info("Closing commit service for {}", table);
+      LOG.info("Closing commit service for {} waiting for all commits to 
finish", table);
       committerService.shutdown();
 
+      boolean timeout = false;
       try {
         // All rewrites have completed and all new files have been created, we 
are now waiting for
         // the commit
-        // pool to finish doing it's commits to Iceberg State. In the case of 
partial progress this
+        // pool to finish doing its commits to Iceberg State. In the case of 
partial progress this

Review Comment:
   not related to this PR, I think we can combine the line 231 and 232, as well 
as line 233 and 234.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to