hantangwangd commented on code in PR #10983: URL: https://github.com/apache/iceberg/pull/10983#discussion_r1733265734
########## core/src/main/java/org/apache/iceberg/IncrementalFileCleanup.java: ########## @@ -327,4 +342,34 @@ private Set<String> findFilesToDelete( return filesToDelete; } + + private boolean isSafeToDelete( + ManifestEntry<?> entry, + Map<Long, Long> validSnapshotIdToSequenceNumberMap, + Map<Long, Long> expiredSnapshotIdToSequenceNumberMap) { + if (validSnapshotIdToSequenceNumberMap.containsKey(entry.snapshotId()) + || !expiredSnapshotIdToSequenceNumberMap.containsKey(entry.snapshotId())) { + return false; + } + + // The file in DELETE entry can be deleted if there are no Snapshots older than + // this one + if (validSnapshotIdToSequenceNumberMap.keySet().stream() + .noneMatch(snapshotId -> snapshotId < entry.snapshotId())) { + return true; Review Comment: > Snapshot IDs are guaranteed to be unique but not necessarily monotonically increasing (in fact by default in the reference implementation it's random. Thanks for your reminder, sorry for misleading by `TestTableOperations.newSnapshotId()`. Comparing the `snapshotId` is definitely wrong and I will fix it. > I think you'll just need to check that there's no parent snapshot for each of the snapshots in the valid snapshots map. Checking the parent snapshot may also encounter some problems, for example, we may have already expired some specified snapshots including the parent of this snapshot before. In this case, the inheritance chain between snapshots has been broken. So it seems that we shouldn't rely on it as it may lead in wrong result. For example, if we have a table with actions as follows: ``` table.newAppend().appendFile(FILE_A).commit; // snapshotID1 table.newAppend().appendFile(FILE_B).commit; // snapshotID2 table.newDelete().deleteFile(FILE_A).commit; // snapshotID3 table.newAppend().appendFile(FILE_A2).commit; // snapshotID4 ``` And we have already expired specified snapshot `snapshotID2` before. Now when we trying to expire snapshot `snapshotID3` using the policy of `IncrementalFileCleanup`, `FILE_A` would be wrongly cleaned up if we rely on checking the parent snapshot. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org