BsoBird commented on code in PR #9546: URL: https://github.com/apache/iceberg/pull/9546#discussion_r1593419915
########## core/src/main/java/org/apache/iceberg/hadoop/HadoopTableOperations.java: ########## @@ -154,21 +155,33 @@ public void commit(TableMetadata base, TableMetadata metadata) { Path tempMetadataFile = metadataPath(UUID.randomUUID().toString() + fileExtension); TableMetadataParser.write(metadata, io().newOutputFile(tempMetadataFile.toString())); + // todo:What should we do if version variable overflows? int nextVersion = (current.first() != null ? current.first() : 0) + 1; Path finalMetadataFile = metadataFilePath(nextVersion, codec); FileSystem fs = getFileSystem(tempMetadataFile, conf); + boolean versionCommitSuccess = false; + try{ + // this rename operation is the atomic commit operation + renameToFinal(fs, tempMetadataFile, finalMetadataFile, nextVersion); - // this rename operation is the atomic commit operation - renameToFinal(fs, tempMetadataFile, finalMetadataFile, nextVersion); + LOG.info("Committed a new metadata file {}", finalMetadataFile); - LOG.info("Committed a new metadata file {}", finalMetadataFile); + // update the best-effort version pointer + versionCommitSuccess = writeVersionHint(nextVersion); - // update the best-effort version pointer - writeVersionHint(nextVersion); + deleteRemovedMetadataFiles(base, metadata); - deleteRemovedMetadataFiles(base, metadata); - - this.shouldRefresh = true; + this.shouldRefresh = true; + }catch (Throwable e){ + // If the versionHint has been submitted successfully, then we don't need to clean up the data file if the task fails. + // This is achieved by throwing the CommitStateUnknownException exception(BaseRewriteDataFilesAction::replaceDataFiles) + if(versionCommitSuccess && !this.shouldRefresh){ + this.shouldRefresh = true; + } + throw versionCommitSuccess Review Comment: > Ok i looked through the code of version hint file and now I think this is probably right as written but only for the Java implementation. Our Current java implementation is actually violating the spec here > > https://iceberg.apache.org/spec/#file-system-tables > > ``` > Each version of table metadata is stored in a metadata folder under the table’s base location using a file naming scheme that includes a version number, V: v<V>.metadata.json. To commit a new metadata version, V+1, the writer performs the following steps: > > Read the current table metadata version V. > Create new table metadata based on version V. > Write the new table metadata to a unique file: <random-uuid>.metadata.json. > Rename the unique file to the well-known file for version V: v<V+1>.metadata.json. > If the rename succeeds, the commit succeeded and V+1 is the table’s current version > If the rename fails, go back to step 1. > ``` > > Because we are ignoring a newer file if a version hint exists but points to an earlier file ... I think to fix this we would have to always check two files when looking at the version hint. First the file specified by the version hint, then the file +1. IF the File+1 exists then the version hint must be discarded by the reader and it must look for the newest possible metadata.json.... Maybe. we should not use versionHint file........ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org