kevinjqliu opened a new issue, #1279:
URL: https://github.com/apache/iceberg-python/issues/1279

   ### Apache Iceberg version
   
   None
   
   ### Please describe the bug 🐞
   
   ### Problem
   I want to read files from multiple S3 regions. For example, my metadata 
files are in `us-west-2` but my data files are in `us-east-1`. This is 
currently not possible.
   
   ### Context
   Reading a file in `pyarrow` requires a `location` and a file system 
implementation, `fs`. For example, `location="s3://blah/foo.parquet"` and 
`fs=S3FileSystem`.
   
https://github.com/apache/iceberg-python/blob/0cebec48833f75eeca02b1a965112615b1cbc1c8/pyiceberg/io/pyarrow.py#L404-L419
   
   The `fs` is used to access the files in S3 and is initialized with the 
given `S3_REGION` according to the [S3 
configuration](https://py.iceberg.apache.org/configuration/#s3).
   
https://github.com/apache/iceberg-python/blob/0cebec48833f75eeca02b1a965112615b1cbc1c8/pyiceberg/io/pyarrow.py#L347-L365
   
   Because the region is fixed when the `fs` is initialized, only one S3 region can be used.
   
   ### Possible Solution
   Create multiple instances of `S3FileSystem`, one for each region, and fetch 
the corresponding instance based on `location`. 
[`pyarrow.fs.resolve_s3_region(bucket)`](https://arrow.apache.org/docs/python/generated/pyarrow.fs.resolve_s3_region.html#pyarrow.fs.resolve_s3_region)
 can determine the correct region.
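
   A minimal sketch of this idea, not the actual `pyiceberg` implementation. 
The class `RegionAwareS3FileSystems` and its method `for_location` are 
hypothetical names chosen for illustration. The sketch assumes that region 
is the only setting that differs between instances; credentials, endpoints, 
and other S3 properties would also need to be passed through in practice. 
The resolver and the filesystem factory are injectable so that the caching 
logic can be exercised without network access.

```python
from typing import Any, Callable, Dict
from urllib.parse import urlparse


def _default_resolve_region(bucket: str) -> str:
    # pyarrow.fs.resolve_s3_region looks up the bucket's region (network call).
    from pyarrow.fs import resolve_s3_region

    return resolve_s3_region(bucket)


def _default_make_fs(region: str) -> Any:
    # Assumption: only the region differs between instances. Credentials and
    # other S3 properties are omitted from this sketch.
    from pyarrow.fs import S3FileSystem

    return S3FileSystem(region=region)


class RegionAwareS3FileSystems:
    """Hypothetical helper that keeps one filesystem instance per S3 region."""

    def __init__(
        self,
        resolve_region: Callable[[str], str] = _default_resolve_region,
        make_fs: Callable[[str], Any] = _default_make_fs,
    ) -> None:
        self._resolve_region = resolve_region
        self._make_fs = make_fs
        self._region_by_bucket: Dict[str, str] = {}
        self._fs_by_region: Dict[str, Any] = {}

    def for_location(self, location: str) -> Any:
        """Return the filesystem for the region that hosts `location`."""
        parsed = urlparse(location)
        if parsed.scheme not in ("s3", "s3a", "s3n"):
            raise ValueError(f"Not an S3 location: {location}")
        bucket = parsed.netloc

        # Resolve each bucket's region only once.
        if bucket not in self._region_by_bucket:
            self._region_by_bucket[bucket] = self._resolve_region(bucket)
        region = self._region_by_bucket[bucket]

        # Create each region's filesystem only once.
        if region not in self._fs_by_region:
            self._fs_by_region[region] = self._make_fs(region)
        return self._fs_by_region[region]
```

   With something like this, the read path could call 
`for_location(location)` instead of reusing a single `fs`, so metadata files 
in `us-west-2` and data files in `us-east-1` would each get a matching 
`S3FileSystem`.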
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

