amogh-jahagirdar opened a new pull request, #13614:
URL: https://github.com/apache/iceberg/pull/13614

   Fixes #13568 
   
   Currently, in the core implementation of file cleanup for expire 
snapshots, the choice between incremental cleanup and reachable cleanup is 
based solely on whether the table metadata after expiration has a single 
reference. This is incorrect: a snapshot expiration can remove _all_ non-main 
references, and if the table metadata before expiration had other refs, full 
reachability analysis is required. 
   
   Additionally, incremental cleanup currently takes the first ref from the 
metadata before expiration, assuming there is only one. That ref can be a 
non-main branch or a tag, in which case the ancestry calculation for 
incremental cleanup is incorrect.
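
A minimal sketch of why taking the first ref is fragile. This is not the Iceberg code: `OnlyRefSketch`, `first`, and `onlyElement` are hypothetical helpers, and the ref names and snapshot IDs are made up. `onlyElement` mirrors the strictness of Guava's `Iterables.getOnlyElement`, which throws when the iterable does not contain exactly one element.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

class OnlyRefSketch {
  // Lenient lookup: returns whichever element iterates first,
  // silently ignoring any additional elements.
  static <T> T first(Iterable<T> items) {
    Iterator<T> it = items.iterator();
    return it.hasNext() ? it.next() : null;
  }

  // Strict lookup: fails loudly unless there is exactly one element.
  static <T> T onlyElement(Iterable<T> items) {
    Iterator<T> it = items.iterator();
    if (!it.hasNext()) {
      throw new NoSuchElementException("expected exactly one element, found none");
    }
    T value = it.next();
    if (it.hasNext()) {
      throw new IllegalArgumentException("expected exactly one element, found more");
    }
    return value;
  }
}
```

With refs iterating as `audit-tag` then `main`, `first` would return `audit-tag` and the ancestry would be computed from the wrong ref, while `onlyElement` would reject the input outright.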
   
   The following changes are made in this PR:
   
   1. Incremental cleanup is performed only if the table metadata both before 
and after expiration has exactly one ref (the main ref). "After" is the table 
metadata after the commit, so it can include any intermediate changes made 
after the expiration was committed.
   2. Add validation in incremental cleanup and change `Iterables.getFirst` in 
incremental cleanup to `getOnlyElement`, so there's an explicit failure if 
that invariant is ever broken.
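
The selection rule in item 1 can be sketched as follows. This is an illustration under assumptions, not the code in this PR: `CleanupStrategySketch`, `Strategy`, `hasOnlyMainRef`, and `select` are hypothetical names, and refs are modeled as plain sets of ref names instead of table metadata.

```java
import java.util.Set;

class CleanupStrategySketch {
  static final String MAIN_BRANCH = "main";

  enum Strategy { INCREMENTAL, REACHABLE }

  // True only when the sole ref present is the main branch.
  static boolean hasOnlyMainRef(Set<String> refNames) {
    return refNames.size() == 1 && refNames.contains(MAIN_BRANCH);
  }

  // Incremental cleanup is safe only if main was the single ref both
  // before and after expiration; otherwise fall back to reachability.
  static Strategy select(Set<String> refsBefore, Set<String> refsAfter) {
    return hasOnlyMainRef(refsBefore) && hasOnlyMainRef(refsAfter)
        ? Strategy.INCREMENTAL
        : Strategy.REACHABLE;
  }
}
```

For example, if a tag existed before expiration and was removed by it, the "after" metadata has a single ref, but `select` still returns `REACHABLE` because the "before" metadata had two.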
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

