gaborkaszab commented on issue #12819:
URL: https://github.com/apache/iceberg/issues/12819#issuecomment-2841726449

   FYI, since I was also involved in the investigation of this issue, I was 
able to answer Peter's question in the meantime.
   Now that I've given this some more thought, I'm a bit hesitant to expose the 
underlying file cleanup strategies to users. Ideally these are internal to the 
snapshot expiry functionality, which picks the most suitable one. But if we 
consider them internal, I think it is problematic that we might get different 
results depending on which strategy is chosen.
   
   I have just done [some 
experiments](https://github.com/gaborkaszab/iceberg/commit/0962db629ddcc622cb11e5716acd004964282c75)
 with the incremental strategy so that it also removes the data files marked as 
deleted not just in the expired snapshots but also in the directly subsequent 
one. There may be some rough edges that I still have to figure out, but it 
seems a possible approach to bring the two strategies in sync with respect to 
this problematic use case.
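
   To make the idea concrete, here is a toy sketch. It is not the code from the 
linked commit and it does not use any Iceberg API: `ExpirySketch`, `Snap` and 
`filesToRemove` are hypothetical names, and it assumes a linear history where 
the expired snapshots are the oldest ones, so a file marked as deleted in the 
first retained snapshot is no longer referenced by anything that is kept.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ExpirySketch {
    // Hypothetical toy model: a snapshot id plus the data files that
    // this snapshot marks as deleted.
    static final class Snap {
        final long id;
        final Set<String> deletedFiles;

        Snap(long id, Set<String> deletedFiles) {
            this.id = id;
            this.deletedFiles = deletedFiles;
        }
    }

    // history is ordered oldest to newest. When includeNext is false, only
    // files marked as deleted in expired snapshots are collected. When it is
    // true, files marked as deleted in the snapshot directly following an
    // expired one are collected as well.
    static Set<String> filesToRemove(
            List<Snap> history, Set<Long> expiredIds, boolean includeNext) {
        Set<String> result = new LinkedHashSet<>();
        boolean previousExpired = false;
        for (Snap snap : history) {
            boolean expired = expiredIds.contains(snap.id);
            if (expired || (includeNext && previousExpired)) {
                result.addAll(snap.deletedFiles);
            }
            previousExpired = expired;
        }
        return result;
    }
}
```

   With snapshots 1 and 2 expired and snapshot 3 retained, a file marked as 
deleted in snapshot 3 is only collected when `includeNext` is true, which is 
the case the experiment targets.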


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

