amogh-jahagirdar commented on code in PR #13614:
URL: https://github.com/apache/iceberg/pull/13614#discussion_r2223247804


##########
core/src/main/java/org/apache/iceberg/IncrementalFileCleanup.java:
##########
@@ -81,12 +75,11 @@ public void cleanFiles(TableMetadata beforeExpiration, 
TableMetadata afterExpira
       return;
     }
 
-    SnapshotRef branchToCleanup = 
Iterables.getOnlyElement(beforeExpiration.refs().values(), null);
-    if (branchToCleanup == null) {
+    Snapshot latest = beforeExpiration.currentSnapshot();
+    if (latest == null) {

Review Comment:
   I think I had originally implemented this so that incremental cleanup can be 
performed on singular non-main branches (i.e. main is empty and there's some 
`test` branch which got cleaned up) but I do think that's a pretty narrow case 
to optimize for. 
   
   I think going back to just always taking the latest on main on the metadata 
before expiration and deriving ancestry from that for incremental cleanup is 
safest and simplest.



##########
core/src/test/java/org/apache/iceberg/TestRemoveSnapshots.java:
##########
@@ -236,7 +236,7 @@ public void testExpireOlderThanWithRollback() {
 
     Set<String> deletedFiles = Sets.newHashSet();
 
-    
removeSnapshots(table).expireOlderThan(tAfterCommits).deleteWith(deletedFiles::add).commit();
+    
table.expireSnapshots().expireOlderThan(tAfterCommits).deleteWith(deletedFiles::add).commit();

Review Comment:
   @rdblue @sqd We actually already have some test cases around the whole 
rollback/staged case, and with the new check for main ancestry we'd end up 
failing since `removeSnapshots(table)` parameterizes around both incremental 
and not incremental.
   
   I generally agree around biasing to the reachable analysis even if it 
technically could be incremental since reachability is always a clearly correct 
way of cleaning files but just want to call out that users can end up failing 
if they end up explicitly specifying incremental and also wanted to call out 
why I modified these existing tests.
   
   I also don't think this is a huge deal since the whole `withIncremental` API 
is not on the interface but requires a user to cast to `RemoveSnapshots` 
implementation (we added this API initially for testing against both cases and 
making sure the new reachable analysis was also correct)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to