Fokko commented on code in PR #9546: URL: https://github.com/apache/iceberg/pull/9546#discussion_r1650701857
##########
core/src/main/java/org/apache/iceberg/hadoop/HadoopTableOperations.java:
##########
@@ -159,18 +160,30 @@ public void commit(TableMetadata base, TableMetadata metadata) {
       int nextVersion = (current.first() != null ? current.first() : 0) + 1;
       Path finalMetadataFile = metadataFilePath(nextVersion, codec);
       FileSystem fs = getFileSystem(tempMetadataFile, conf);
-
-      // this rename operation is the atomic commit operation
-      renameToFinal(fs, tempMetadataFile, finalMetadataFile, nextVersion);
-
-      LOG.info("Committed a new metadata file {}", finalMetadataFile);
-
-      // update the best-effort version pointer
-      writeVersionHint(nextVersion);
-
-      deleteRemovedMetadataFiles(base, metadata);
-
-      this.shouldRefresh = true;
+    boolean versionCommitSuccess = false;
+    try {
+      fs.delete(versionHintFile(), false /* recursive delete*/);

Review Comment:
   > Sir, the purpose of versionHintFile is simply to speed up the reading of the latest version. it's just a index file.

   That is correct, but it is also tightly coupled to the Iceberg project, since Iceberg optimizes for object stores. Object stores are poor at listing files because listings are typically returned as paged responses, which are both slow and costly. The pointer in the `versionHintFile` avoids that listing and makes finding the latest version much faster.

   > In a file system that is basically posix compliant, we theoretically don't need lockManager to prevent multiple clients from committing concurrently, because fs.rename can do that.

   How would that work? I don't think a race condition can be avoided: it would pass 99.99% of the time, but two writers can still overwrite each other because neither knows that another process is writing to the table.
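To make the interleaving in question concrete, here is a minimal sketch using plain `java.nio` rather than Iceberg or Hadoop code. It assumes a rename that silently replaces an existing destination, as POSIX `rename(2)` does; Hadoop `FileSystem` implementations may behave differently. The class name, helper, and file names are hypothetical and only illustrate two writers that both read version 1 and both try to commit version 2.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical illustration only; not part of Iceberg or Hadoop.
final class RenameRaceSketch {

  // Hypothetical helper: stage a writer's metadata in a temp file.
  static Path writeTemp(Path dir, String writer) throws IOException {
    Path tmp = Files.createTempFile(dir, writer + "-", ".metadata.json.tmp");
    Files.write(tmp, writer.getBytes(StandardCharsets.UTF_8));
    return tmp;
  }

  // Replays one unlucky interleaving sequentially and returns the content
  // of the final metadata file for version 2.
  static String simulate() {
    try {
      Path dir = Files.createTempDirectory("table-metadata");

      // Both writers observe the same current version before either commits.
      int currentVersion = 1;
      int nextA = currentVersion + 1;
      int nextB = currentVersion + 1;

      Path tmpA = writeTemp(dir, "writer-A");
      Path tmpB = writeTemp(dir, "writer-B");
      Path finalA = dir.resolve("v" + nextA + ".metadata.json");
      Path finalB = dir.resolve("v" + nextB + ".metadata.json");

      // Assumption: rename replaces an existing destination (POSIX-style).
      // REPLACE_EXISTING models that behavior here.
      Files.move(tmpA, finalA, StandardCopyOption.REPLACE_EXISTING);
      Files.move(tmpB, finalB, StandardCopyOption.REPLACE_EXISTING);

      // Writer A believes its commit succeeded, but the file now holds B's data.
      return new String(Files.readAllBytes(finalA), StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("v2 was written by: " + simulate());
  }
}
```

In this sketch neither rename fails, so writer A's commit is lost without any error. A rename that refuses to overwrite an existing destination would reject the second writer instead, which is why the exact rename semantics of the underlying file system matter for this discussion.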