JGynther commented on issue #452:
URL: https://github.com/apache/iceberg-python/issues/452#issuecomment-1975812563

Finally had a chance to poke at this.
   
To me it seems that there is no easy way to implement this. When creating and
scanning a StaticTable, the actual location of a particular file is read from
the metadata at least a few times: once when the initial metadata is read, and
again when the manifest lists are turned into ManifestEntry objects for the data
scan. It is not enough to just replace the locations while or after reading the
initial metadata.
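A toy model of why rewriting only the initial metadata is not enough. The classes below are made up for illustration and are not pyiceberg's actual types. In Iceberg, the table metadata records the full location of each snapshot's manifest list, the manifest list records manifest locations, and each manifest records full data file locations, so a rewrite at the top level leaves the nested locations pointing at the original bucket.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ManifestFile:
    # Toy stand-in: location of the manifest, as recorded in the manifest list.
    manifest_path: str
    # Data file locations, as recorded in the manifest's entries.
    data_file_paths: tuple[str, ...] = ()


@dataclass(frozen=True)
class Snapshot:
    # Toy stand-in: manifest list location, as recorded in the table metadata.
    manifest_list: str
    # Contents of the manifest list (normally read from manifest_list).
    manifests: tuple[ManifestFile, ...] = ()


def locations_opened_by_scan(snapshot: Snapshot) -> list[str]:
    """Every location a scan of this toy snapshot would have to open."""
    paths = [snapshot.manifest_list]
    for manifest in snapshot.manifests:
        paths.append(manifest.manifest_path)
        paths.extend(manifest.data_file_paths)
    return paths


def rewrite_top_level_only(snapshot: Snapshot, old_bucket: str, new_bucket: str) -> Snapshot:
    """Rewrite only the location stored in the top-level metadata."""
    old_prefix, new_prefix = f"s3://{old_bucket}/", f"s3://{new_bucket}/"
    location = snapshot.manifest_list
    if location.startswith(old_prefix):
        location = new_prefix + location[len(old_prefix):]
    return replace(snapshot, manifest_list=location)


snap = Snapshot(
    manifest_list="s3://examplebucketname1/db/tbl/metadata/snap-1.avro",
    manifests=(
        ManifestFile(
            manifest_path="s3://examplebucketname1/db/tbl/metadata/m0.avro",
            data_file_paths=("s3://examplebucketname1/db/tbl/data/00000.parquet",),
        ),
    ),
)
patched = rewrite_top_level_only(snap, "examplebucketname1", "replacedname-s3alias")
stale = [p for p in locations_opened_by_scan(patched) if "examplebucketname1" in p]
print(len(stale))  # → 2 (the manifest and the data file still use the old bucket)
```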
   
Maybe a reasonable place to implement this would be in the
[actual file IO](https://github.com/apache/iceberg-python/blob/3c225a75d3c8c1c3e5598dc1e02c6f8669e4e8d0/pyiceberg/io/pyarrow.py#L342),
with parameters similar to those it already accepts for other settings. This
would not work out of the box either, as PyArrow's S3FileSystem does not support
replacing the bucket name.
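As a rough sketch of what such a parameter could look like: the property key `s3.bucket-mapping`, its `source=target` format, and the helper functions are all made up for illustration and are not part of pyiceberg. The idea is that the FileIO would rewrite each location itself (e.g. at the start of `new_input`/`new_output`) before handing it to PyArrow, since S3FileSystem has no bucket-replacement option of its own.

```python
from __future__ import annotations

from urllib.parse import urlparse, urlunparse

# Hypothetical property name, for illustration only; pyiceberg does not define it.
BUCKET_MAPPING_PROPERTY = "s3.bucket-mapping"


def parse_bucket_mapping(value: str) -> dict[str, str]:
    """Parse "source=target,source2=target2" into {"source": "target", ...}."""
    mapping: dict[str, str] = {}
    for pair in value.split(","):
        source, sep, target = pair.strip().partition("=")
        if sep and source and target:
            mapping[source.strip()] = target.strip()
    return mapping


def remap_bucket(location: str, mapping: dict[str, str]) -> str:
    """Swap the bucket of an s3:// location if it appears in the mapping."""
    parsed = urlparse(location)
    if parsed.scheme not in ("s3", "s3a", "s3n") or parsed.netloc not in mapping:
        return location
    return urlunparse(parsed._replace(netloc=mapping[parsed.netloc]))


# Usage: the properties dict stands in for what a FileIO receives.
properties = {BUCKET_MAPPING_PROPERTY: "examplebucketname1=replacedname-s3alias"}
mapping = parse_bucket_mapping(properties.get(BUCKET_MAPPING_PROPERTY, ""))
print(remap_bucket("s3://examplebucketname1/db/tbl/data/00000.parquet", mapping))
# → s3://replacedname-s3alias/db/tbl/data/00000.parquet
```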
   
It could work by creating a light wrapper around the S3FileSystem that replaces
the bucket name of incoming file paths based on a mapping like
`("examplebucketname1", "replacedname-s3alias")`. Of course, the question then
is whether this should instead be a feature request on the PyArrow side.
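A minimal sketch of such a wrapper, assuming paths reach the filesystem as `bucket/key` (the form PyArrow's S3FileSystem methods take, without the `s3://` scheme). `BucketRemappingFileSystem` and `RecordingFileSystem` are hypothetical names; the recorder only stands in for a real S3FileSystem so the example runs without S3 access.

```python
from __future__ import annotations

from typing import Any, Mapping


class BucketRemappingFileSystem:
    """Hypothetical wrapper that swaps the bucket in "bucket/key" paths.

    `inner` is meant to be something like pyarrow.fs.S3FileSystem. Only the
    methods below are forwarded (with rewritten paths); nothing else is exposed.
    """

    def __init__(self, inner: Any, mapping: Mapping[str, str]) -> None:
        self._inner = inner
        self._mapping = dict(mapping)

    def _remap(self, path: str) -> str:
        # Split "bucket/key" into bucket and key; replace the bucket if mapped.
        bucket, sep, key = path.partition("/")
        return self._mapping.get(bucket, bucket) + sep + key

    def open_input_file(self, path: str) -> Any:
        return self._inner.open_input_file(self._remap(path))

    def open_input_stream(self, path: str, **kwargs: Any) -> Any:
        return self._inner.open_input_stream(self._remap(path), **kwargs)

    def open_output_stream(self, path: str, **kwargs: Any) -> Any:
        return self._inner.open_output_stream(self._remap(path), **kwargs)

    def get_file_info(self, paths: str | list[str]) -> Any:
        # Handles a single path or a list of paths; a FileSelector is not handled.
        if isinstance(paths, str):
            return self._inner.get_file_info(self._remap(paths))
        return self._inner.get_file_info([self._remap(p) for p in paths])


class RecordingFileSystem:
    """Stand-in for S3FileSystem that just records the paths it receives."""

    def __init__(self) -> None:
        self.seen: list[str] = []

    def open_input_file(self, path: str) -> str:
        self.seen.append(path)
        return path

    def get_file_info(self, paths: Any) -> Any:
        self.seen.extend([paths] if isinstance(paths, str) else paths)
        return paths


recorder = RecordingFileSystem()
fs = BucketRemappingFileSystem(recorder, dict([("examplebucketname1", "replacedname-s3alias")]))
fs.open_input_file("examplebucketname1/db/tbl/data/00000.parquet")
fs.get_file_info(["examplebucketname1/db/tbl/metadata/v1.metadata.json", "otherbucket/x"])
print(recorder.seen)
```

One caveat with this design: a duck-typed wrapper is not a `pyarrow.fs.FileSystem`, so it would only help where pyiceberg calls these methods itself. Anywhere the filesystem object is handed to PyArrow directly, it would likely need to be a real filesystem instead, e.g. `pyarrow.fs.PyFileSystem` around a `FileSystemHandler` that performs the same rewrite.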
   
Another option would be to decouple the filename/key from the location by
respecting, e.g., the metadata location parameter, but this would require a lot
of changes and is probably not a good approach.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

