BsoBird commented on code in PR #9546:
URL: https://github.com/apache/iceberg/pull/9546#discussion_r1593419915


##########
core/src/main/java/org/apache/iceberg/hadoop/HadoopTableOperations.java:
##########
@@ -154,21 +155,33 @@ public void commit(TableMetadata base, TableMetadata 
metadata) {
     Path tempMetadataFile = metadataPath(UUID.randomUUID().toString() + 
fileExtension);
     TableMetadataParser.write(metadata, 
io().newOutputFile(tempMetadataFile.toString()));
 
+    // todo:What should we do if version variable overflows?
     int nextVersion = (current.first() != null ? current.first() : 0) + 1;
     Path finalMetadataFile = metadataFilePath(nextVersion, codec);
     FileSystem fs = getFileSystem(tempMetadataFile, conf);
+    boolean versionCommitSuccess = false;
+    try{
+      // this rename operation is the atomic commit operation
+      renameToFinal(fs, tempMetadataFile, finalMetadataFile, nextVersion);
 
-    // this rename operation is the atomic commit operation
-    renameToFinal(fs, tempMetadataFile, finalMetadataFile, nextVersion);
+      LOG.info("Committed a new metadata file {}", finalMetadataFile);
 
-    LOG.info("Committed a new metadata file {}", finalMetadataFile);
+      // update the best-effort version pointer
+      versionCommitSuccess = writeVersionHint(nextVersion);
 
-    // update the best-effort version pointer
-    writeVersionHint(nextVersion);
+      deleteRemovedMetadataFiles(base, metadata);
 
-    deleteRemovedMetadataFiles(base, metadata);
-
-    this.shouldRefresh = true;
+      this.shouldRefresh = true;
+    }catch (Throwable e){
+      // If the versionHint has been submitted successfully, then we don't 
need to clean up the data file if the task fails.
+      // This is achieved by throwing the CommitStateUnknownException 
exception(BaseRewriteDataFilesAction::replaceDataFiles)
+      if(versionCommitSuccess && !this.shouldRefresh){
+        this.shouldRefresh = true;
+      }
+      throw versionCommitSuccess

Review Comment:
   > Ok i looked through the code of version hint file and now I think this is 
probably right as written but only for the Java implementation. Our Current 
java implementation is actually violating the spec here
   > 
   > https://iceberg.apache.org/spec/#file-system-tables
   > 
   > ```
   > Each version of table metadata is stored in a metadata folder under the 
table’s base location using a file naming scheme that includes a version 
number, V: v<V>.metadata.json. To commit a new metadata version, V+1, the 
writer performs the following steps:
   > 
   > Read the current table metadata version V.
   > Create new table metadata based on version V.
   > Write the new table metadata to a unique file: <random-uuid>.metadata.json.
   > Rename the unique file to the well-known file for version V: 
v<V+1>.metadata.json.
   > If the rename succeeds, the commit succeeded and V+1 is the table’s 
current version
   > If the rename fails, go back to step 1.
   > ```
   > 
   > Because we are ignoring a newer file if a version hint exists but points 
to an earlier file ... I think to fix this we would have to always check two 
files when looking at the version hint. First the file specified by the version 
hint, then the file +1. IF the File+1 exists then the version hint must be 
discarded by the reader and it must look for the newest possible 
metadata.json....
   
   Maybe. we should not use versionHint file........



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to