Re: [I] Support S3 Access Points with Access Point to Bucket mapping [iceberg-python]

via GitHub Sun, 03 Mar 2024 22:22:51 -0800


JGynther commented on issue #452:
URL: https://github.com/apache/iceberg-python/issues/452#issuecomment-1975812563

Finally had a chance to poke at this.

To me it seems that there is no easy way to implement this. When creating
and scanning a StaticTable, the actual location of a particular file is read
from the metadata at least a few times: once when the metadata is first read,
and again when manifest lists are turned into ManifestEntry objects for the
data scan. So it is not enough to just replace the locations while or after
reading the initial metadata.

Maybe a reasonable place to implement this would be in the [actual file
io](https://github.com/apache/iceberg-python/blob/3c225a75d3c8c1c3e5598dc1e02c6f8669e4e8d0/pyiceberg/io/pyarrow.py#L342)
with parameters similar to those already accepted for other things. This would
not work out of the box either, as the PyArrow S3FileSystem does not support
replacing the bucket name.

It could work by creating a light wrapper around the S3FileSystem that
replaces the bucket name of incoming file paths based on a mapping like:
`("examplebucketname1", "replacedname-s3alias")`. Of course, the question then
is whether this should instead be a feature request on the PyArrow side.
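
A rough sketch of what such a wrapper might look like, only to illustrate the
idea. The names `remap_bucket`, `BucketRemappingFileSystem` and
`BUCKET_MAPPING` are made up for this example and are not part of PyIceberg or
PyArrow. It assumes paths arrive either as `s3://bucket/key` or in the
`bucket/key` form, and it only covers two of the filesystem methods:

```python
# Hypothetical mapping from the bucket name recorded in the table metadata
# to the access point alias that should actually be used for requests.
BUCKET_MAPPING = {"examplebucketname1": "replacedname-s3alias"}


def remap_bucket(location: str, mapping: dict) -> str:
    """Swap the bucket part of a path, leaving the scheme and key untouched."""
    # Split off an optional "scheme://" prefix, then the bucket from the key.
    scheme, sep, rest = location.rpartition("://")
    bucket, slash, key = rest.partition("/")
    # Buckets that are not in the mapping pass through unchanged.
    return f"{scheme}{sep}{mapping.get(bucket, bucket)}{slash}{key}"


class BucketRemappingFileSystem:
    """Thin duck-typed wrapper: rewrite the path, then delegate to `inner`.

    `inner` is assumed to expose `open_input_file` and `open_output_stream`,
    as pyarrow.fs.S3FileSystem does. Other methods would need the same
    treatment, and this class is not a real pyarrow FileSystem subclass.
    """

    def __init__(self, inner, mapping: dict):
        self._inner = inner
        self._mapping = mapping

    def open_input_file(self, path: str):
        return self._inner.open_input_file(remap_bucket(path, self._mapping))

    def open_output_stream(self, path: str):
        return self._inner.open_output_stream(remap_bucket(path, self._mapping))


# Intended use (not executed here, needs pyarrow and AWS credentials):
#   fs = BucketRemappingFileSystem(pyarrow.fs.S3FileSystem(), BUCKET_MAPPING)
```

Because the rewrite happens on every call, it would also cover the paths read
later from manifest lists, not only the initial metadata file.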

Another option would be to decouple the filename/key from the location by
respecting e.g. the metadata location parameter, but this would require
changing a lot and is probably not a good approach.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

