Re: [I] Delete orphan files [iceberg-python]

via GitHub Sun, 24 Nov 2024 14:15:22 -0800


ndrluis commented on issue #1200:
URL: 
https://github.com/apache/iceberg-python/issues/1200#issuecomment-2496257742


   @omkenge I believe you will need to wait for #1285 to be merged. In the 
meantime, I will work on partition statistics over the next few weeks. 
Before that work lands, I believe all files will already be tracked in the 
metadata (this needs to be double-checked). With that in place, you will be 
able to determine which files can be removed.
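To make the idea concrete, here is a minimal Python sketch of that final check. It assumes two inputs are already available: the set of paths referenced by the table metadata, and a listing of the table location with last-modified times. `find_orphan_files` and its parameters are hypothetical names, not part of the PyIceberg API.

```python
from datetime import datetime, timedelta, timezone


def find_orphan_files(
    listed: dict[str, datetime],
    referenced: set[str],
    older_than: datetime,
) -> list[str]:
    """Hypothetical helper: paths found on storage but not referenced by metadata.

    `listed` maps each path found under the table location to its last-modified
    time; `referenced` is every path reachable from the table metadata (assumed
    to be collected elsewhere). Files modified at or after `older_than` are
    skipped so files written by in-progress commits are not treated as orphans.
    """
    return sorted(
        path
        for path, modified in listed.items()
        if path not in referenced and modified < older_than
    )


# Example: b.parquet is old and unreferenced; c.parquet is too recent to delete.
now = datetime(2024, 11, 24, tzinfo=timezone.utc)
listed = {
    "s3://bucket/tbl/data/a.parquet": now - timedelta(days=10),
    "s3://bucket/tbl/data/b.parquet": now - timedelta(days=10),
    "s3://bucket/tbl/data/c.parquet": now - timedelta(hours=1),
}
referenced = {"s3://bucket/tbl/data/a.parquet"}
print(find_orphan_files(listed, referenced, older_than=now - timedelta(days=3)))
# ['s3://bucket/tbl/data/b.parquet']
```

Comparing raw strings assumes both sides spell paths the same way; a real implementation would likely need to normalize them (for example `s3://` vs `s3a://`) before taking the difference.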
   
   Another consideration is the filesystem that will be responsible for 
scanning the directory. FileIO is not the right tool for this (PyIceberg's 
FileIO does not provide directory listing), so we will need something else. 
OpenDAL might be a good candidate. For reference, the [Java implementation 
uses the Hadoop 
filesystem](https://github.com/apache/iceberg/blob/main/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/actions/DeleteOrphanFilesSparkAction.java#L356).
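As a rough illustration of the listing step, the sketch below walks a *local* table directory with Python's standard `os.walk` and records each file's modification time. It is only a stand-in: object stores such as S3 would need a filesystem library that supports listing (OpenDAL, as suggested above, is one option), and `list_table_files` is a hypothetical name. The result maps each path to its last-modified time, which is what an age-based orphan check needs.

```python
import os
import tempfile
from datetime import datetime, timezone


def list_table_files(table_location: str) -> dict[str, datetime]:
    """Hypothetical helper: recursively list every file under a local table location.

    Stand-in for a storage-level listing; returns path -> last-modified time (UTC).
    A missing directory yields an empty result because os.walk ignores errors
    by default.
    """
    files: dict[str, datetime] = {}
    for dirpath, _dirnames, filenames in os.walk(table_location):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mtime = os.stat(path).st_mtime
            files[path] = datetime.fromtimestamp(mtime, tz=timezone.utc)
    return files


if __name__ == "__main__":
    # Build a tiny fake table layout in a temporary directory and list it.
    with tempfile.TemporaryDirectory() as location:
        os.makedirs(os.path.join(location, "data"))
        for parts in (("data", "a.parquet"), ("metadata.json",)):
            with open(os.path.join(location, *parts), "w") as f:
                f.write("x")
        for path in sorted(list_table_files(location)):
            print(os.path.relpath(path, location))
```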


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


