Fokko commented on issue #1200:
URL: https://github.com/apache/iceberg-python/issues/1200#issuecomment-2630467871

   I think we want to avoid depending directly on OpenDAL, since that's another 
dependency. FileIO officially doesn't support listing directories, because 
listing performs poorly on object stores: the result is a paged response that 
can span a large number of pages.
   
   A catalog might provide a more powerful way of cleaning up orphan files by 
leveraging [S3 Inventory 
lists](https://docs.aws.amazon.com/AmazonS3/latest/userguide/storage-inventory.html),
 but I don't think that's a good implementation for the client itself. The Java 
implementation relies on the underlying filesystem, and I think we can do 
something similar in PyIceberg by using the [Arrow FileSystem to list 
the files](https://arrow.apache.org/docs/python/filesystems.html#listing-files).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

