gaborkaszab commented on issue #12819: URL: https://github.com/apache/iceberg/issues/12819#issuecomment-2841726449
FYI, since I was also involved in the investigation of this issue, I was able to answer Peter's question in the meantime. Having given it some more thought, I'm a bit hesitant to expose the underlying file cleanup strategies to users. Ideally, these are internal to the snapshot expiration functionality, which should pick the most suitable one on its own. But if we treat them as internal, I think it is problematic that we might get different results depending on which strategy is chosen. I have just done [some experiments](https://github.com/gaborkaszab/iceberg/commit/0962db629ddcc622cb11e5716acd004964282c75) with the incremental strategy so that it removes the data files marked as deleted not only in the expired snapshots but also in the directly subsequent snapshot. There may be some rough edges I still have to figure out, but it seems a possible approach to bring the two strategies in sync for this problematic use case.
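To make the discrepancy concrete, here is a minimal toy model in Java. It is a sketch under stated assumptions, not Iceberg's API or the code in the linked commit: the `Snapshot` record, `reachableCleanup` and `incrementalCleanup` are hypothetical names, history is assumed to be linear, and a file is assumed never to be re-added once deleted. In this model a file marked as deleted in snapshot S_k was still live in its parent, so once the parent is expired the file is unreferenced, even though its delete marker belongs to the retained snapshot S_k. Looking only at the expired snapshots misses such files; also looking at the first retained snapshot makes the incremental result match the reachability-based one.

```java
import java.util.*;

public class ExpiryCleanupSketch {

    /** Hypothetical toy snapshot: files added and files marked DELETED relative to its parent. */
    record Snapshot(Set<String> added, Set<String> deleted) {}

    /** Replays the linear history from the oldest snapshot and returns the live files of each snapshot. */
    static List<Set<String>> liveFiles(List<Snapshot> history) {
        List<Set<String>> result = new ArrayList<>();
        Set<String> live = new HashSet<>();
        for (Snapshot s : history) {
            live.removeAll(s.deleted());
            live.addAll(s.added());
            result.add(new HashSet<>(live));
        }
        return result;
    }

    /** Reachability-style cleanup: files live in some expired snapshot but in no retained one. */
    static Set<String> reachableCleanup(List<Snapshot> history, int firstRetained) {
        List<Set<String>> live = liveFiles(history);
        Set<String> expired = new HashSet<>();
        Set<String> retained = new HashSet<>();
        for (int i = 0; i < history.size(); i++) {
            if (i < firstRetained) {
                expired.addAll(live.get(i));
            } else {
                retained.addAll(live.get(i));
            }
        }
        expired.removeAll(retained);
        return expired;
    }

    /**
     * Incremental-style cleanup: files marked DELETED in the expired snapshots and,
     * if includeNext is set, also in the first retained snapshot (the idea from the comment).
     */
    static Set<String> incrementalCleanup(List<Snapshot> history, int firstRetained, boolean includeNext) {
        Set<String> toDelete = new HashSet<>();
        for (int i = 0; i < firstRetained; i++) {
            toDelete.addAll(history.get(i).deleted());
        }
        if (includeNext && firstRetained < history.size()) {
            toDelete.addAll(history.get(firstRetained).deleted());
        }
        return toDelete;
    }

    public static void main(String[] args) {
        // S0 adds a and b; S1 deletes a and adds c; S2 deletes b.
        List<Snapshot> history = List.of(
                new Snapshot(Set.of("a", "b"), Set.of()),
                new Snapshot(Set.of("c"), Set.of("a")),
                new Snapshot(Set.of(), Set.of("b")));
        int firstRetained = 2; // expire S0 and S1, keep S2

        System.out.println("reachable:          " + new TreeSet<>(reachableCleanup(history, firstRetained)));          // [a, b]
        System.out.println("incremental:        " + new TreeSet<>(incrementalCleanup(history, firstRetained, false))); // [a]
        System.out.println("incremental + next: " + new TreeSet<>(incrementalCleanup(history, firstRetained, true)));  // [a, b]
    }
}
```

In the example, file `b` is deleted by the retained snapshot S2 but was live only in the expired S1, so only the variant that also inspects the first retained snapshot cleans it up.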