hantangwangd commented on code in PR #10983:
URL: https://github.com/apache/iceberg/pull/10983#discussion_r1733265734


##########
core/src/main/java/org/apache/iceberg/IncrementalFileCleanup.java:
##########
@@ -327,4 +342,34 @@ private Set<String> findFilesToDelete(
 
     return filesToDelete;
   }
+
+  private boolean isSafeToDelete(
+      ManifestEntry<?> entry,
+      Map<Long, Long> validSnapshotIdToSequenceNumberMap,
+      Map<Long, Long> expiredSnapshotIdToSequenceNumberMap) {
+    if (validSnapshotIdToSequenceNumberMap.containsKey(entry.snapshotId())
+        || 
!expiredSnapshotIdToSequenceNumberMap.containsKey(entry.snapshotId())) {
+      return false;
+    }
+
+    // The file in DELETE entry can be deleted if there are no Snapshots older 
than
+    // this one
+    if (validSnapshotIdToSequenceNumberMap.keySet().stream()
+        .noneMatch(snapshotId -> snapshotId < entry.snapshotId())) {
+      return true;

Review Comment:
   > Snapshot IDs are guaranteed to be unique but not necessarily monotonically 
increasing (in fact by default in the reference implementation it's random.
   
   Thanks for your reminder, sorry for misleading by 
`TestTableOperations.newSnapshotId()`. Comparing the `snapshotId` is definitely 
wrong and I will fix it.
   
   > I think you'll just need to check that there's no parent snapshot for each 
of the snapshots in the valid snapshots map.
   
   Checking the parent snapshot may also encounter some problems, for example, 
we may have already expired some specified snapshots including the parent of 
this snapshot before. In this case, the inheritance chain between snapshots has 
been broken. So it seems that we shouldn't rely on it as it may lead in wrong 
result.
   
   For example, if we have a table with actions as follows:
   
   ```
   table.newAppend().appendFile(FILE_A).commit; // snapshotID1
   table.newAppend().appendFile(FILE_B).commit; // snapshotID2
   table.newDelete().deleteFile(FILE_A).commit; // snapshotID3
   table.newAppend().appendFile(FILE_A2).commit;        // snapshotID4
   ```
   
   And we have already expired specified snapshot `snapshotID2` before.
   
   Now when we trying to expire snapshot `snapshotID3` using the policy of 
`IncrementalFileCleanup`, `FILE_A` would be wrongly cleaned up if we rely on 
checking the parent snapshot.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to