Fokko commented on issue #1200: URL: https://github.com/apache/iceberg-python/issues/1200#issuecomment-2630467871
I think we want to avoid depending directly on OpenDAL, since that's another dependency. FileIO officially doesn't support listing directories, because listing performs poorly on object stores: the response is paged and can potentially run to a lot of pages. A catalog might provide a more powerful way of cleaning up orphan files by leveraging [S3 Inventory lists](https://docs.aws.amazon.com/AmazonS3/latest/userguide/storage-inventory.html), but I don't think that's a good implementation for the client itself. The Java implementation relies on the underlying filesystem, and I think we can do something similar in PyIceberg by using the [Arrow FileSystem to list the files](https://arrow.apache.org/docs/python/filesystems.html#listing-files).