hantangwangd commented on code in PR #10983:
URL: https://github.com/apache/iceberg/pull/10983#discussion_r1733280303


##########
core/src/main/java/org/apache/iceberg/IncrementalFileCleanup.java:
##########
@@ -327,4 +342,34 @@ private Set<String> findFilesToDelete(
 
     return filesToDelete;
   }
+
+  private boolean isSafeToDelete(
+      ManifestEntry<?> entry,
+      Map<Long, Long> validSnapshotIdToSequenceNumberMap,
+      Map<Long, Long> expiredSnapshotIdToSequenceNumberMap) {
+    if (validSnapshotIdToSequenceNumberMap.containsKey(entry.snapshotId())
+        || 
!expiredSnapshotIdToSequenceNumberMap.containsKey(entry.snapshotId())) {
+      return false;
+    }
+
+    // The file in DELETE entry can be deleted if there are no Snapshots older 
than
+    // this one
+    if (validSnapshotIdToSequenceNumberMap.keySet().stream()
+        .noneMatch(snapshotId -> snapshotId < entry.snapshotId())) {
+      return true;
+    }
+
+    // The file in DELETE entry can be deleted if no snapshot exists between
+    // the one that added it and this one
+    long fileSequenceNumber = entry.fileSequenceNumber() == null ? 0L : 
entry.fileSequenceNumber();
+    long fileDeletedSnapshotSequenceNumber =
+        expiredSnapshotIdToSequenceNumberMap.get(entry.snapshotId());
+    return fileSequenceNumber > 0

Review Comment:
   > Since we're using sequence numbers this means that we're also saying for 
v1 tables we can't determine to clean it up in the incremental strategy?
   
   Yes, if we have to rely on sequence numbers to determine the files to be 
cleaned, then we can't do it for v1 tables. It's not perfect but it is correct. 
And very agree with you that, it worth thinking a way to cover v1 tables.
   
   > I agree with this statement but doesn't this just mean we need to check 
the parent snapshots manifest entries for this particular file to see if it was 
added specifically in the parent snapshot? 
   
   The same problems as described above, the inheritance chain between 
snapshots may have been broken before.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to