kevinjqliu commented on PR #1453:
URL: https://github.com/apache/iceberg-python/pull/1453#issuecomment-2557749844

   > Is the change I made in accordance with this option? What I've done, essentially, is use the netloc to determine the bucket region. Only when, for some reason, the region cannot be determined do we fall back to the properties configuration.
   
   I don't think `netloc` can be used to determine the region. The S3 URI scheme doesn't carry the region in `netloc` (it is just the bucket name); only an S3 URL does.
   For example, here's how `fs_by_scheme` is typically used:
   
https://github.com/apache/iceberg-python/blob/dbcf65b4892779efca7362e069edecff7f2bf69f/pyiceberg/io/pyarrow.py#L434-L436
   
   and running an example S3 URI:
   ```
   from pyiceberg.io.pyarrow import PyArrowFileIO
   scheme, netloc, path = PyArrowFileIO.parse_location("s3://a/b/c/1.txt")
   # returns ('s3', 'a', 'a/b/c/1.txt')
   ```
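   
   To see the distinction, compare that with a virtual-hosted-style S3 URL, where the host does embed the region. A minimal sketch with the standard library (the bucket name and region below are made up):
   
   ```
   from urllib.parse import urlparse
   
   # S3 URI: netloc is just the bucket name, with no region information
   urlparse("s3://a/b/c/1.txt").netloc
   # 'a'
   
   # Virtual-hosted-style S3 URL: the region appears inside the host
   urlparse("https://my-bucket.s3.us-west-2.amazonaws.com/b/c/1.txt").netloc
   # 'my-bucket.s3.us-west-2.amazonaws.com'
   ```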
   
   In order to support multiple regions, we might need to call `resolve_s3_region` first and pass the `region` to `fs_by_scheme`. Looked at from `S3FileSystem`'s perspective, we need a new `S3FileSystem` object per region, which ties into how the `FileSystem` is cached. Roughly something like the sketch below.
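   
   A minimal sketch of that idea (`resolve_s3_region` is pyarrow's; the `_s3_fs_by_region` helper and the caching shape are just illustrative, not the real `fs_by_scheme` signature):
   
   ```
   from functools import lru_cache
   
   from pyarrow.fs import S3FileSystem, resolve_s3_region
   
   from pyiceberg.io.pyarrow import PyArrowFileIO
   
   @lru_cache
   def _s3_fs_by_region(region: str) -> S3FileSystem:
       # Cache one S3FileSystem instance per region
       return S3FileSystem(region=region)
   
   def s3_fs_for_location(location: str) -> S3FileSystem:
       scheme, netloc, path = PyArrowFileIO.parse_location(location)
       # netloc is the bucket name; ask S3 which region the bucket lives in
       region = resolve_s3_region(netloc)
       return _s3_fs_by_region(region)
   ```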
   
   
   BTW, a good test scenario would be a table whose metadata files are stored in one bucket while its data files are stored in another. We might be able to construct this test case by modifying the `minio` settings to create buckets in different regions; I haven't tested this yet. Something like the test below.
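   
   Very roughly, and assuming we can get `minio` to report different regions for the two buckets (the `catalog` fixture, table name, and bucket layout are made up for illustration):
   
   ```
   from pyiceberg.catalog import Catalog
   
   def test_table_with_cross_region_buckets(catalog: Catalog) -> None:
       # Assumes a pre-created table whose metadata lives in one regional
       # bucket and whose data files live in another
       table = catalog.load_table("default.cross_region_table")
   
       # A full scan has to open both the metadata and the data files, so it
       # exercises filesystem resolution for both regions
       assert table.scan().to_arrow().num_rows > 0
   ```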
   
   

