amogh-jahagirdar commented on code in PR #13614:
URL: https://github.com/apache/iceberg/pull/13614#discussion_r2219196348


##########
core/src/main/java/org/apache/iceberg/IncrementalFileCleanup.java:
##########
@@ -50,7 +50,7 @@ class IncrementalFileCleanup extends FileCleanupStrategy {

   @Override
   @SuppressWarnings({"checkstyle:CyclomaticComplexity", "MethodLength"})
   public void cleanFiles(TableMetadata beforeExpiration, TableMetadata afterExpiration) {

Review Comment:
   I think there's a bigger question of moving away from incremental cleanup and just always doing the reachability analysis in the core cleanup. This is what's done in Spark and other integrations (which is probably why this issue didn't surface for quite some time).

   If I recall correctly, the original intent for keeping incremental cleanup was that, at the time, branching/tagging was a pretty new introduction and I didn't want to regress folks' expiration cleanup in the "average" case. However, even beyond branching/tagging, there are other constructs on table metadata (like the ability to cherry-pick) that can make incremental cleanup difficult to do safely, so maybe in 2.0 we should consider just removing it and always doing a reachability analysis?

   cc @rdblue @RussellSpitzer @stevenzwu @aokolnychyi
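For illustration only, below is a minimal sketch of the reachability-based approach the comment describes, under stated assumptions: compute the set of data file paths reachable from the snapshots retained after expiration, then delete anything that was reachable before expiration but is no longer reachable. The class and method names (`ReachabilityCleanupSketch`, `reachableDataFiles`) are hypothetical, this is not the actual core cleanup implementation, and delete files, manifests, manifest lists, statistics files, and metadata files are intentionally omitted.

```java
// Hypothetical sketch of reachability-based cleanup; NOT Iceberg's actual
// core implementation. Only data files are considered here.
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashSet;
import java.util.Set;
import org.apache.iceberg.DataFile;
import org.apache.iceberg.ManifestFile;
import org.apache.iceberg.ManifestFiles;
import org.apache.iceberg.Snapshot;
import org.apache.iceberg.TableMetadata;
import org.apache.iceberg.io.CloseableIterable;
import org.apache.iceberg.io.FileIO;

class ReachabilityCleanupSketch {

  static void cleanFiles(TableMetadata beforeExpiration, TableMetadata afterExpiration, FileIO io) {
    // Everything still reachable from any retained snapshot (on any branch or
    // tag) must be kept, so reachability is computed over all snapshots.
    Set<String> stillReachable = reachableDataFiles(afterExpiration, io);

    // Files reachable before expiration but not afterwards are safe to delete,
    // regardless of branching, tagging, or cherry-picked snapshots.
    for (String path : reachableDataFiles(beforeExpiration, io)) {
      if (!stillReachable.contains(path)) {
        io.deleteFile(path);
      }
    }
  }

  // Collects the paths of all data files referenced by any snapshot in the
  // given table metadata.
  private static Set<String> reachableDataFiles(TableMetadata metadata, FileIO io) {
    Set<String> paths = new HashSet<>();
    for (Snapshot snapshot : metadata.snapshots()) {
      for (ManifestFile manifest : snapshot.dataManifests(io)) {
        try (CloseableIterable<DataFile> files = ManifestFiles.read(manifest, io)) {
          for (DataFile file : files) {
            paths.add(file.path().toString());
          }
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      }
    }
    return paths;
  }
}
```

In a real cleanup the same before/after set difference would also be applied to delete files, manifests, manifest lists, and statistics files, but the essential property is the one the comment relies on: the set difference stays correct no matter how snapshots relate to each other, which is what makes it safer than incremental cleanup in the presence of branches, tags, and cherry-picks.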
########## core/src/main/java/org/apache/iceberg/IncrementalFileCleanup.java: ########## @@ -50,7 +50,7 @@ class IncrementalFileCleanup extends FileCleanupStrategy { @Override @SuppressWarnings({"checkstyle:CyclomaticComplexity", "MethodLength"}) public void cleanFiles(TableMetadata beforeExpiration, TableMetadata afterExpiration) { Review Comment: I think there's a bigger question of moving away from incremental cleanup and just always doing the reachability analysis in the core cleanup. This is what's done in spark and other integrations (which is probably why this issue didn't surfaced for quite some time). If I recall correctly, the original intent for keeping incremental cleanup was that at the time branching/tagging was a pretty new introduction and I didn't want to regress folks expiration cleanup in the "average" case. However, even beyond branching/tagging there are other constructs on table metadata (like the ability to cherry pick) that can make incremental cleanup difficult to do safely, that maybe in 2.0 we should consider just removing it and always doing a reachability analysis? cc @rdblue @RussellSpitzer @stevenzwu @aokolnychyi -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org