omkenge commented on issue #1200:
URL: 
https://github.com/apache/iceberg-python/issues/1200#issuecomment-2491973911

   `Orphan File Deletion in Iceberg Tables`
   Here's a step-by-step breakdown of the logic behind the process:
   1. List All Files in Storage
   2. Extract Referenced Files from Table Metadata
   3. Identify Orphan Files
   Comparing the list of all files in storage with the list of files 
referenced by the Iceberg table identifies the orphan files: files that 
exist in storage but are not part of the current table metadata.
   The comparison is a set difference: the set of referenced files is 
subtracted from the set of all files in storage.
   4. Delete Orphan Files
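
A minimal sketch of the four steps, to make the proposal concrete. It assumes a local filesystem and takes the referenced paths as a plain collection of strings; it does not call any PyIceberg API, and how step 2 reads the paths out of the table metadata is left open. The names `list_all_files`, `find_orphan_files` and `delete_orphan_files`, and the `min_age_seconds` and `dry_run` parameters, are hypothetical and only for illustration.

```python
import os
import time
from typing import Iterable, List, Set


def list_all_files(root: str) -> Set[str]:
    """Step 1: recursively list every file under the table location."""
    found: Set[str] = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.add(os.path.join(dirpath, name))
    return found


def find_orphan_files(
    all_files: Iterable[str], referenced_files: Iterable[str]
) -> Set[str]:
    """Step 3: files in storage minus files referenced by the table.

    Step 2 (collecting referenced_files from table metadata) is assumed to
    happen elsewhere. Both sides must use the same path format, otherwise
    referenced files would be misreported as orphans.
    """
    return set(all_files) - set(referenced_files)


def delete_orphan_files(
    orphans: Iterable[str], min_age_seconds: float = 0.0, dry_run: bool = True
) -> List[str]:
    """Step 4: delete orphans, optionally skipping recently written files.

    min_age_seconds is an assumed safety margin so that files written by a
    commit still in progress are not removed. With dry_run=True nothing is
    deleted; the selected paths are only returned.
    """
    cutoff = time.time() - min_age_seconds
    selected = sorted(
        path
        for path in orphans
        if min_age_seconds <= 0 or os.path.getmtime(path) <= cutoff
    )
    if not dry_run:
        for path in selected:
            os.remove(path)
    return selected
```

Defaulting to `dry_run=True` is a deliberate choice in this sketch: the caller can inspect the returned paths first and only then rerun with `dry_run=False`.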
   
   What is your opinion on this?
   @kevinjqliu @Fokko @sungwy 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

