Re: [I] Support S3 Access Points with Access Point to Bucket mapping [iceberg-python]

via GitHub Wed, 06 Mar 2024 04:44:29 -0800


JGynther commented on issue #452:
URL: https://github.com/apache/iceberg-python/issues/452#issuecomment-1980790119


   Testing a very simple wrapper like:
   
   ```Python
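   from pyarrow.fs import S3FileSystem
   
   
   class WrappedS3FileSystem(S3FileSystem):
       """S3FileSystem that rewrites bucket names to access point aliases."""
   
       def __init__(self, bucket_override, **kwargs):
           super().__init__(**kwargs)
           # Pairs of (actual bucket name, access point alias).
           self.override = bucket_override
   
       def open_input_file(self, path):
           # Swap the first occurrence of each bucket name for its alias.
           for bucket, alias in self.override:
               path = path.replace(bucket, alias, 1)
   
           return super().open_input_file(path)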
   ```
   Configured like so:
   ```Python
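   from pyiceberg.table import StaticTable
   
   table = StaticTable.from_metadata(
       "s3://accesspoint-number-s3alias/path/to/table",
       {
           # Property used in this test: (actual bucket name, access point alias) pairs.
           "s3.bucket_override": [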
               (
                   "actualbucketnamehere",
                   "accesspoint-number-s3alias",
               )
           ],
       },
   )
   ```
   This allows `StaticTable.scan` to create the `DataScan` object properly. 
Querying the data through any method that relies on `to_arrow` still fails, 
however, because that path uses the PyArrow Dataset Scanner instead of 
`S3FileSystem`. One could work around this manually, starting from 
`DataScan.plan_files`.
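
A rough sketch of what that manual handling might look like, assuming each task returned by `DataScan.plan_files` exposes the data file location as `task.file.file_path`. The helper names (`to_access_point_path`, `planned_paths`, `read_planned_files`) are made up for illustration and are not part of PyIceberg. The sketch reads whole Parquet files and ignores delete files, row filters and column projection, so it is only a starting point:

```python
from urllib.parse import urlparse


def to_access_point_path(location, overrides):
    """Turn 's3://bucket/key' into 'bucket/key', swapping in the alias.

    `overrides` holds (actual bucket name, access point alias) pairs.
    Unlike the str.replace in the wrapper, only the bucket part is compared.
    """
    parsed = urlparse(location)
    bucket = parsed.netloc
    for actual, alias in overrides:
        if bucket == actual:
            bucket = alias
            break
    # PyArrow's S3FileSystem expects paths without the 's3://' scheme.
    return f"{bucket}{parsed.path}"


def planned_paths(scan, overrides):
    """Collect rewritten data file paths from the scan's planned tasks."""
    return [
        to_access_point_path(task.file.file_path, overrides)
        for task in scan.plan_files()
    ]


def read_planned_files(scan, overrides, filesystem):
    """Read every planned data file and concatenate the results.

    Assumes Parquet data files with identical schemas and at least one
    planned file. Delete files and filters are NOT applied.
    """
    # Imported here so the path helpers above work without PyArrow installed.
    import pyarrow as pa
    import pyarrow.parquet as pq

    tables = [
        pq.read_table(path, filesystem=filesystem)
        for path in planned_paths(scan, overrides)
    ]
    return pa.concat_tables(tables)
```

With the table from the example above, this could be called as `read_planned_files(table.scan(), [("actualbucketnamehere", "accesspoint-number-s3alias")], S3FileSystem())`.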


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

