Re: [I] Delete orphan files [iceberg-python]

2025-03-03 Thread via GitHub
kevinjqliu commented on issue #1200: URL: https://github.com/apache/iceberg-python/issues/1200#issuecomment-2695859107

> But I do not know how to create an S3 file system or how to support other file storage with the same logic. Could you please help me with this?

Take a look at `load_file_io` and

Re: [I] Delete orphan files [iceberg-python]

2025-02-21 Thread via GitHub
omkenge commented on issue #1200: URL: https://github.com/apache/iceberg-python/issues/1200#issuecomment-2674484409 @kevinjqliu I tried this and it works for me, but I do not know how to create an S3 file system or how to support other file storage with the same logic. Could you please help me with this?

Re: [I] Delete orphan files [iceberg-python]

2025-02-06 Thread via GitHub
kevinjqliu commented on issue #1200: URL: https://github.com/apache/iceberg-python/issues/1200#issuecomment-2641289104

> Extract Metadata-Tracked Files

We might want to use all_files and all_metadata_files. `files` only gets the data files for the current snapshot.

-- This is an au
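To illustrate why the current snapshot alone is not enough: a file that only an older snapshot references would be misclassified as an orphan. What follows is a rough sketch over a hypothetical in-memory model (a dict mapping snapshot ID to data file paths). It does not call PyIceberg, and `referenced_files` and all paths are made up for illustration.

```python
def referenced_files(files_by_snapshot):
    """Union of the files referenced by every snapshot, not just the latest."""
    referenced = set()
    for paths in files_by_snapshot.values():
        referenced.update(paths)
    return referenced


# Hypothetical table history: snapshot 2 is the current one.
files_by_snapshot = {
    1: {"data/a.parquet", "data/b.parquet"},
    2: {"data/b.parquet", "data/c.parquet"},
}

# Hypothetical storage listing.
listed = {
    "data/a.parquet",
    "data/b.parquet",
    "data/c.parquet",
    "data/stray.parquet",
}

# Comparing against the current snapshot only also flags a.parquet,
# which snapshot 1 still needs.
current_only = listed - files_by_snapshot[2]

# Comparing against every snapshot leaves only the genuinely unreferenced file.
all_snapshots = listed - referenced_files(files_by_snapshot)
```

In this model only `data/stray.parquet` survives the second comparison, while the first one would also have flagged a file that time travel to snapshot 1 depends on.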

Re: [I] Delete orphan files [iceberg-python]

2025-02-06 Thread via GitHub
omkenge commented on issue #1200: URL: https://github.com/apache/iceberg-python/issues/1200#issuecomment-2640451331 Hello @Fokko, here is a small implementation.

1. List Data Files in S3

We use PyArrow’s S3FileSystem to retrieve file paths from the given table location:

Re: [I] Delete orphan files [iceberg-python]

2025-02-03 Thread via GitHub
Fokko commented on issue #1200: URL: https://github.com/apache/iceberg-python/issues/1200#issuecomment-2630467871 I think we want to avoid depending directly on OpenDAL, since that's another dependency. FileIO officially doesn't support listing of directories because listing of a directory

Re: [I] Delete orphan files [iceberg-python]

2025-01-31 Thread via GitHub
ndrluis commented on issue #1200: URL: https://github.com/apache/iceberg-python/issues/1200#issuecomment-2627809869 Hi @omkenge, I don’t have direct experience with OpenDAL, but my suggestion is based on how [iceberg-rust is currently using it](https://github.com/search?q=repo%3Aapache%2Fi

Re: [I] Delete orphan files [iceberg-python]

2025-01-28 Thread via GitHub
omkenge commented on issue #1200: URL: https://github.com/apache/iceberg-python/issues/1200#issuecomment-2619789414 Hello @ndrluis, could you please help me with OpenDAL and how we can use and integrate it? It would be very helpful for me. Another thing: I just extract the data file from sn

Re: [I] Delete orphan files [iceberg-python]

2025-01-20 Thread via GitHub
ndrluis commented on issue #1200: URL: https://github.com/apache/iceberg-python/issues/1200#issuecomment-2602218117 Hello @omkenge, you can start development, but please note that we need the partition statistics. I'll start working on this feature this week. The merge for the orphan files

Re: [I] Delete orphan files [iceberg-python]

2025-01-19 Thread via GitHub
omkenge commented on issue #1200: URL: https://github.com/apache/iceberg-python/issues/1200#issuecomment-2601440980 Hello @ndrluis, I think #1285 is now merged. Can I start working on this issue?

Re: [I] Delete orphan files [iceberg-python]

2024-11-24 Thread via GitHub
ndrluis commented on issue #1200: URL: https://github.com/apache/iceberg-python/issues/1200#issuecomment-2496257742 @omkenge I believe you will need to wait for the merge of #1285. In the meantime, I will work on the partition statistics over the next few weeks. Before that, I believe we w

Re: [I] Delete orphan files [iceberg-python]

2024-11-23 Thread via GitHub
kevinjqliu commented on issue #1200: URL: https://github.com/apache/iceberg-python/issues/1200#issuecomment-2495641285 That looks generally correct to me. There are a few caveats though. This assumes that the entire iceberg table (metadata and data files) is in a single location and that n

Re: [I] Delete orphan files [iceberg-python]

2024-11-21 Thread via GitHub
omkenge commented on issue #1200: URL: https://github.com/apache/iceberg-python/issues/1200#issuecomment-2491973911 `Orphan File Deletion in Iceberg Tables` Here's a step-by-step breakdown of the logic behind the process:

1. List All Files in Storage
2. Extract Referenced Files from
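The first two steps listed above amount to a set difference: anything the storage listing contains that the table metadata does not reference is an orphan candidate. Here is a minimal sketch, assuming both sides are already available as plain path strings. `find_orphan_candidates` and the example paths are hypothetical and not part of PyIceberg.

```python
def find_orphan_candidates(listed_paths, referenced_paths):
    """Return paths present in storage but not referenced by table metadata."""
    referenced = set(referenced_paths)
    # Deduplicate the listing and sort the result so the output is deterministic.
    return sorted(p for p in set(listed_paths) if p not in referenced)


# Hypothetical storage listing under the table location.
listed = [
    "s3://bucket/tbl/data/a.parquet",
    "s3://bucket/tbl/data/b.parquet",
    "s3://bucket/tbl/metadata/v1.metadata.json",
]

# Hypothetical set of files that the table metadata still references.
referenced = [
    "s3://bucket/tbl/data/a.parquet",
    "s3://bucket/tbl/metadata/v1.metadata.json",
]

orphans = find_orphan_candidates(listed, referenced)
print(orphans)
```

The sketch only reports candidates and deletes nothing. It also relies on the assumption raised later in the thread that the whole table (metadata and data files) lives under a single location.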

Re: [I] Delete orphan files [iceberg-python]

2024-10-29 Thread via GitHub
sungwy commented on issue #1200: URL: https://github.com/apache/iceberg-python/issues/1200#issuecomment-2445095723 Hey sure thing! I'll assign it to you @omkenge

Re: [I] Delete orphan files [iceberg-python]

2024-10-29 Thread via GitHub
omkenge commented on issue #1200: URL: https://github.com/apache/iceberg-python/issues/1200#issuecomment-2444971567 Hi @sungwy, I would like to work on this. Can I?