kevinjqliu commented on PR #1453: URL: https://github.com/apache/iceberg-python/pull/1453#issuecomment-2557749844
> Is the change I made in accordance with this option? What I've done essentially is using the netloc to determine the bucket region. Only in case when, for some reason, the region cannot be determined then we fall back to the properties configuration.

I don't think `netloc` can be used to determine the region; the S3 URI scheme doesn't encode a region, only an S3 URL does. For example, here's how `fs_by_scheme` is typically used: https://github.com/apache/iceberg-python/blob/dbcf65b4892779efca7362e069edecff7f2bf69f/pyiceberg/io/pyarrow.py#L434-L436

And running an example S3 URI:

```python
from pyiceberg.io.pyarrow import PyArrowFileIO

scheme, netloc, path = PyArrowFileIO.parse_location("s3://a/b/c/1.txt")
# returns ('s3', 'a', 'a/b/c/1.txt')
```

To support multiple regions, we might need to call `resolve_s3_region` first and pass the resulting `region` to `fs_by_scheme`. Looking at it from `S3FileSystem`'s perspective, we need a new `S3FileSystem` object per region, which ties into how the `FileSystem` is cached.

BTW, a good test scenario is a table whose metadata files are stored in one bucket while its data files are stored in another. We might be able to construct this test case by modifying the `minio` settings to create buckets in different regions; I haven't tested this yet.
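A minimal sketch of that per-region caching idea. All names here (`make_fs_cache`, `resolve_region`, `fs_factory`) are hypothetical and not part of this PR; in practice `resolve_region` could be `pyarrow.fs.resolve_s3_region` and `fs_factory` would construct a `pyarrow.fs.S3FileSystem(region=...)`:

```python
from functools import lru_cache
from typing import Any, Callable

def make_fs_cache(
    resolve_region: Callable[[str], str],  # e.g. pyarrow.fs.resolve_s3_region
    fs_factory: Callable[[str], Any],      # e.g. lambda r: S3FileSystem(region=r)
) -> Callable[[str], Any]:
    """Return a bucket -> filesystem lookup that keeps one filesystem per region."""

    @lru_cache(maxsize=None)
    def fs_for_region(region: str) -> Any:
        # Cached per region, so buckets in the same region share one object.
        return fs_factory(region)

    def fs_for_bucket(bucket: str) -> Any:
        return fs_for_region(resolve_region(bucket))

    return fs_for_bucket
```

The key point is that the cache key is the resolved region, not the scheme, so metadata and data buckets in different regions each get their own `S3FileSystem` while same-region buckets reuse one.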