omkenge commented on issue #1200: URL: https://github.com/apache/iceberg-python/issues/1200#issuecomment-2491973911
`Orphan File Deletion in Iceberg Tables`

Here's a step-by-step breakdown of the logic behind the process:

1. List All Files in Storage
2. Extract Referenced Files from Table Metadata
3. Identify Orphan Files
   By comparing the list of all files in storage with the list of files referenced by the Iceberg table, the script identifies orphan files. These are files that exist in storage but are not part of the current table metadata. The comparison is a set difference: the set of referenced files is subtracted from the set of all files in storage.
4. Delete Orphan Files

What is your opinion on this? @kevinjqliu @Fokko @sungwy
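
For reference, here is a minimal sketch of the four steps described in this comment. It is not PyIceberg code. It assumes the table location is a local directory (a stand-in for an object store) and that the caller has already collected the referenced paths from the table metadata (step 2). The names `list_all_files`, `find_orphan_files`, and `delete_orphan_files` are hypothetical helpers made up for illustration.

```python
import os
from typing import Iterable, Set


def list_all_files(root: str) -> Set[str]:
    """Step 1: collect every file path under the (local) table location."""
    found: Set[str] = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.add(os.path.abspath(os.path.join(dirpath, name)))
    return found


def find_orphan_files(all_files: Iterable[str], referenced_files: Iterable[str]) -> Set[str]:
    """Step 3: files present in storage but absent from the referenced set."""
    return set(all_files) - set(referenced_files)


def delete_orphan_files(root: str, referenced_files: Iterable[str], dry_run: bool = True) -> Set[str]:
    """Steps 1, 3 and 4 combined; step 2 (referenced_files) comes from the caller.

    Assumption: referenced_files are local paths taken from the table metadata.
    With dry_run=True (the default) nothing is deleted; orphans are only reported.
    """
    # Normalize so both sides of the set difference use the same path form.
    referenced = {os.path.abspath(p) for p in referenced_files}
    orphans = find_orphan_files(list_all_files(root), referenced)
    if not dry_run:
        for path in orphans:
            os.remove(path)  # Step 4: delete each orphan file
    return orphans
```

Defaulting to `dry_run=True` is a design choice in this sketch, so the orphan set can be reviewed before anything is removed. The set difference is only correct if both sides use the same path format, which is why the paths are normalized first.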