amogh-jahagirdar commented on code in PR #13614: URL: https://github.com/apache/iceberg/pull/13614#discussion_r2223247804
########## core/src/main/java/org/apache/iceberg/IncrementalFileCleanup.java: ########## @@ -81,12 +75,11 @@ public void cleanFiles(TableMetadata beforeExpiration, TableMetadata afterExpira return; } - SnapshotRef branchToCleanup = Iterables.getOnlyElement(beforeExpiration.refs().values(), null); - if (branchToCleanup == null) { + Snapshot latest = beforeExpiration.currentSnapshot(); + if (latest == null) { Review Comment: I think I had originally implemented this so that incremental cleanup can be performed on singular non-main branches (i.e. main is empty and there's some `test` branch which got cleaned up) but I do think that's a pretty narrow case to optimize for. I think going back to just always taking the latest on main on the metadata before expiration and deriving ancestry from that for incremental cleanup is safest and simplest. ########## core/src/test/java/org/apache/iceberg/TestRemoveSnapshots.java: ########## @@ -236,7 +236,7 @@ public void testExpireOlderThanWithRollback() { Set<String> deletedFiles = Sets.newHashSet(); - removeSnapshots(table).expireOlderThan(tAfterCommits).deleteWith(deletedFiles::add).commit(); + table.expireSnapshots().expireOlderThan(tAfterCommits).deleteWith(deletedFiles::add).commit(); Review Comment: @rdblue @sqd We actually already have some test cases around the whole rollback/staged case, and with the new check for main ancestry we'd end up failing since `removeSnapshots(table)` parameterizes around both incremental and not incremental. I generally agree around biasing to the reachable analysis even if it technically could be incremental since reachability is always a clearly correct way of cleaning files but just want to call out that users can end up failing if they end up explicitly specifying incremental and also wanted to call out why I modified these existing tests. I also don't think this is a huge deal since the whole `withIncremental` API is not on the interface but requires a user to cast to `RemoveSnapshots` implementation (we added this API initially for testing against both cases and making sure the new reachable analysis was also correct) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org