JGynther commented on issue #452: URL: https://github.com/apache/iceberg-python/issues/452#issuecomment-1975812563
Finally had a chance to poke at this. To me it seems there is no easy way to implement this. When creating and scanning a StaticTable, the actual location of a particular file is read from the metadata at least a few times: once during the initial reading of the metadata, and again when manifest lists are turned into ManifestEntry objects for the data scan. It is not enough to just replace the locations while or after reading the initial metadata.

Maybe a reasonable place to implement this would be in the [actual file io](https://github.com/apache/iceberg-python/blob/3c225a75d3c8c1c3e5598dc1e02c6f8669e4e8d0/pyiceberg/io/pyarrow.py#L342), with parameters similar to those already accepted for other things. This would not work out of the box either, as the PyArrow S3FileSystem does not support replacing the bucket name. It could work by creating a light wrapper around the S3FileSystem that replaces the bucket name for incoming files based on a mapping like: `("examplebucketname1", "replacedname-s3alias")`. Of course, the question then is whether this should instead be a request on the PyArrow side.

Another option would be decoupling the filename/key from the location by respecting e.g. the metadata location parameter, but this would require changing a lot and is probably not a good approach.
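To make the wrapper idea a bit more concrete, here is a minimal sketch of the bucket-replacement step, not a tested PyIceberg change. The names `BUCKET_ALIASES`, `replace_bucket` and `BucketAliasFileSystem` are hypothetical, the mapping is written as a dict rather than a tuple, and the wrapper only assumes the wrapped object exposes `open_input_file(path)` the way PyArrow filesystems do.

```python
from typing import Mapping
from urllib.parse import urlparse, urlunparse

# Hypothetical mapping: bucket name recorded in the metadata -> name to use instead.
BUCKET_ALIASES = {"examplebucketname1": "replacedname-s3alias"}


def replace_bucket(location: str, aliases: Mapping[str, str]) -> str:
    """Return `location` with its bucket swapped if that bucket is in `aliases`."""
    parsed = urlparse(location)
    if parsed.scheme in ("s3", "s3a", "s3n"):
        # URI form: s3://bucket/key, where the bucket is the netloc.
        if parsed.netloc in aliases:
            return urlunparse(parsed._replace(netloc=aliases[parsed.netloc]))
        return location
    # Scheme-less form "bucket/key", which is how PyArrow's S3 filesystem
    # addresses objects.
    bucket, sep, key = location.partition("/")
    if bucket in aliases:
        return aliases[bucket] + sep + key
    return location


class BucketAliasFileSystem:
    """Hypothetical light wrapper: rewrite the path, then delegate."""

    def __init__(self, fs, aliases: Mapping[str, str]):
        self._fs = fs  # assumed to behave like a pyarrow.fs.FileSystem
        self._aliases = aliases

    def open_input_file(self, path: str):
        # Every read goes through the same rewrite, so it does not matter
        # at which stage of the scan the location was taken from metadata.
        return self._fs.open_input_file(replace_bucket(path, self._aliases))
```

A real wrapper would need to cover the other filesystem methods the file io relies on (writes, file info, deletes) in the same way; only the read path is shown here to keep the sketch short.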