amogh-jahagirdar opened a new pull request, #13614: URL: https://github.com/apache/iceberg/pull/13614
Fixes #13568

Currently, in the core implementation of file cleanup during snapshot expiration, it is possible for _all_ non-main references to be removed by the expiration. The choice between incremental cleanup and reachable cleanup is based solely on whether the table metadata after expiration has a single reference. This is not correct: full reachability analysis is required if the table metadata before expiration had other refs. Additionally, incremental cleanup currently takes the first ref from the metadata before expiration, assuming there is only one. That ref can be some non-main branch or tag, in which case the ancestry calculation for incremental cleanup is incorrect.

The following changes are made in this PR:

1. Incremental cleanup is performed only if the table metadata both before and after has only one ref (the main ref). Here "after" means the table metadata after the commit, which can include any intermediate changes made after the expiration was committed.
2. Additional validation is added to incremental cleanup, and `Iterables.getFirst` there is changed to `getOnlyElement`, so that there is an explicit failure if that invariant is somehow violated.
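The selection rule described above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: the class `CleanupStrategySketch`, the `Strategy` enum, and the use of plain ref-name sets in place of table metadata are all hypothetical, and the local `getOnlyElement` helper is only a stand-in that mimics the behavior of Guava's `Iterables.getOnlyElement`.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Set;

// Hypothetical sketch of the strategy selection; names are illustrative only.
final class CleanupStrategySketch {
  enum Strategy { INCREMENTAL, REACHABLE }

  static final String MAIN_BRANCH = "main";

  // True only when the ref set consists of exactly the main branch.
  static boolean hasOnlyMainRef(Set<String> refNames) {
    return refNames.size() == 1 && refNames.contains(MAIN_BRANCH);
  }

  // Incremental cleanup is chosen only if BOTH the metadata before expiration
  // and the metadata after the commit have main as their single ref.
  static Strategy choose(Set<String> refsBefore, Set<String> refsAfter) {
    return hasOnlyMainRef(refsBefore) && hasOnlyMainRef(refsAfter)
        ? Strategy.INCREMENTAL
        : Strategy.REACHABLE;
  }

  // Stand-in for Guava's Iterables.getOnlyElement: fails loudly instead of
  // silently picking an arbitrary first element.
  static <T> T getOnlyElement(Iterable<T> items) {
    Iterator<T> it = items.iterator();
    if (!it.hasNext()) {
      throw new NoSuchElementException("expected exactly one element, found none");
    }
    T first = it.next();
    if (it.hasNext()) {
      throw new IllegalArgumentException("expected exactly one element, found more");
    }
    return first;
  }

  public static void main(String[] args) {
    // A tag existed before expiration, so incremental cleanup is not safe
    // even though only main remains afterwards.
    System.out.println(choose(Set.of("main", "audit-tag"), Set.of("main")));
    System.out.println(choose(Set.of("main"), Set.of("main")));
  }
}
```

The second case in `main` is the only one where incremental cleanup applies. In the first case the sketch falls back to reachability-based cleanup, which is the scenario the old after-only check got wrong.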