amogh-jahagirdar commented on code in PR #10983: URL: https://github.com/apache/iceberg/pull/10983#discussion_r1733319305
########## core/src/main/java/org/apache/iceberg/IncrementalFileCleanup.java: ########## @@ -327,4 +342,34 @@ private Set<String> findFilesToDelete( return filesToDelete; } + + private boolean isSafeToDelete( + ManifestEntry<?> entry, + Map<Long, Long> validSnapshotIdToSequenceNumberMap, + Map<Long, Long> expiredSnapshotIdToSequenceNumberMap) { + if (validSnapshotIdToSequenceNumberMap.containsKey(entry.snapshotId()) + || !expiredSnapshotIdToSequenceNumberMap.containsKey(entry.snapshotId())) { + return false; + } + + // The file in DELETE entry can be deleted if there are no Snapshots older than + // this one + if (validSnapshotIdToSequenceNumberMap.keySet().stream() + .noneMatch(snapshotId -> snapshotId < entry.snapshotId())) { + return true; Review Comment: Ah good point, yes the inheritance chain isn't guaranteed since the parent also could've been expired. OK I'll need to think about this a bit more deeply then. On the surface just seems like if someone is expiring a specific snapshot where there's a deleted entry we may want to do the reachability analysis; maybe with the file sequence number we can do a more intelligent incremental cleanup for V2 tables. Or potentially since we have the invariant here that there's only one ref at this point, we can just validate that there are no ancestors at all. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org