hantangwangd commented on code in PR #10983: URL: https://github.com/apache/iceberg/pull/10983#discussion_r1733280303
########## core/src/main/java/org/apache/iceberg/IncrementalFileCleanup.java: ########## @@ -327,4 +342,34 @@ private Set<String> findFilesToDelete( return filesToDelete; } + + private boolean isSafeToDelete( + ManifestEntry<?> entry, + Map<Long, Long> validSnapshotIdToSequenceNumberMap, + Map<Long, Long> expiredSnapshotIdToSequenceNumberMap) { + if (validSnapshotIdToSequenceNumberMap.containsKey(entry.snapshotId()) + || !expiredSnapshotIdToSequenceNumberMap.containsKey(entry.snapshotId())) { + return false; + } + + // The file in DELETE entry can be deleted if there are no Snapshots older than + // this one + if (validSnapshotIdToSequenceNumberMap.keySet().stream() + .noneMatch(snapshotId -> snapshotId < entry.snapshotId())) { + return true; + } + + // The file in DELETE entry can be deleted if no snapshot exists between + // the one that added it and this one + long fileSequenceNumber = entry.fileSequenceNumber() == null ? 0L : entry.fileSequenceNumber(); + long fileDeletedSnapshotSequenceNumber = + expiredSnapshotIdToSequenceNumberMap.get(entry.snapshotId()); + return fileSequenceNumber > 0 Review Comment: > Since we're using sequence numbers this means that we're also saying for v1 tables we can't determine to clean it up in the incremental strategy? Yes, if we have to rely on sequence numbers to determine the files to be cleaned, then we can't do it for v1 tables. It's not perfect but it is correct. And very agree with you that, it worth thinking a way to cover v1 tables. > I agree with this statement but doesn't this just mean we need to check the parent snapshots manifest entries for this particular file to see if it was added specifically in the parent snapshot? The same problems as described above, the inheritance chain between snapshots may have been broken before. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org