BsoBird commented on code in PR #9546:
URL: https://github.com/apache/iceberg/pull/9546#discussion_r1482554243


##########
core/src/main/java/org/apache/iceberg/hadoop/HadoopTableOperations.java:
##########
@@ -289,64 +377,153 @@ Path versionHintFile() {
     return metadataPath(Util.VERSION_HINT_FILENAME);
   }
 
-  private void writeVersionHint(int versionToWrite) {
+  @VisibleForTesting
+  boolean writeVersionHint(FileSystem fs, Integer versionToWrite) throws 
Exception {
     Path versionHintFile = versionHintFile();
-    FileSystem fs = getFileSystem(versionHintFile, conf);
-
+    Path tempVersionHintFile = metadataPath(UUID.randomUUID() + 
"-version-hint.temp");
     try {
-      Path tempVersionHintFile = metadataPath(UUID.randomUUID().toString() + 
"-version-hint.temp");
       writeVersionToPath(fs, tempVersionHintFile, versionToWrite);
-      fs.delete(versionHintFile, false /* recursive delete */);
-      fs.rename(tempVersionHintFile, versionHintFile);
-    } catch (IOException e) {
-      LOG.warn("Failed to update version hint", e);
+      // We can accept that version Hint fails to write, but we can't accept 
that version Hint
+      // writes the wrong version.
+      if (fs.exists(versionHintFile) || findVersionWithOutVersionHint(fs) != 
versionToWrite) {
+        throw new AlreadyExistsException(

Review Comment:
   @RussellSpitzer 
   If the versionHint comes up, it means that another client is committing. 
Even if I don't block it, there's a good chance that fs.rename will get an 
error. This is because the v+1 version has been committed by another client. 
   
   Also, we need to check to see if a new version has been committed before we 
commit. If we don't do this, then we have a situation where client B commits 
version V+2, but has some problems writing to versionHint, and the operation is 
slow. Client A doesn't have this problem, it's committing V+1. And the 
versionHint is written successfully. Eventually, versionHint < maxVersion.
   ```
   client A ------------> COMMIT V+1 ---> COMMIT V+2 ---> TTL CLEAN V+1--> 
COMMIT V+3-----------> WRITE TEMP VERSION HINT ---------------->RENAME VERSION 
HINT
                     |                                                          
                                                |
                     |                                                          
                                                |
                     |                                                          
                                                |
                     |                                                          
                                                |
   CLIENT B --> READ VERSION -----> COMMIT V+1------>WRITE TEMP VERSION 
HINT-------------------------------------------->RENAME VERSION HINT  
   
   final: maxVersion = v+3    versionHint = v+1      
   ```
   
   
   At present, there is only one situation left that we have not resolved:
   ```
   client A --> COMMIT V+1 --------------> WRITE TEMP VERSION HINT 
---------------->RENAME VERSION HINT-------------
                                   |                                            
                         |
                                   |                                            
                         |
                                   |                                            
                         |
                                   |                                            
                         |
   CLIENT B ----------------> READ VERSION -----> COMMIT V+2------>WRITE TEMP 
VERSION HINT------>RENAME VERSION HINT (FAILED) 
   
   
   final: maxVersion = v+2     versionHint = v+1            
   ```
   
   In this case, although I can't stop it from happening at the moment, all 
commits will fail if there is such a problem. But no data will be lost. When 
the user resubmits the task, I will clean up the incorrect versionHint.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to