rdblue commented on code in PR #13614: URL: https://github.com/apache/iceberg/pull/13614#discussion_r2220580612
########## core/src/main/java/org/apache/iceberg/IncrementalFileCleanup.java: ########## @@ -50,7 +50,7 @@ class IncrementalFileCleanup extends FileCleanupStrategy { @Override @SuppressWarnings({"checkstyle:CyclomaticComplexity", "MethodLength"}) public void cleanFiles(TableMetadata beforeExpiration, TableMetadata afterExpiration) { - if (afterExpiration.refs().size() > 1) { + if (beforeExpiration.refs().size() > 1 || afterExpiration.refs().size() > 1) { Review Comment: Could this leak files if there is a rollback or a staged (but not referenced) commit? For instance, if you have snapshots like this: ``` A --- B --- C --- D (main) `--- X (unreferenced) ``` If snapshot X should be cleaned up, Iceberg should not use the incremental strategy. Also, if X is not removed but B and C are, then incremental also can't be used because C may have removed files from X. Instead of checking the number of refs, I think we need to check the history of the main branch. If there is any snapshot outside of the main branch before or after expiration, incremental cannot be used. There are cases where a ref is in the main branch's history that are safe, but I would probably keep the implementation simpler and avoid checking for things like branch overlap. What I would do is: 1. If there is more than one ref after expiration, use the reference set strategy (optionally, we could check that all refs are in the main branch's history and allow it) 2. If there are any snapshots to be removed after expiration that were not the main branch's ancestors, use the reference set strategy -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org