kevinjqliu commented on PR #1453: URL: https://github.com/apache/iceberg-python/pull/1453#issuecomment-2557381909
@jiakai-li Thanks for working on this! And happy holidays :)

> I noticed the PyArrowFileIO._initialize_fs function doesn't take netloc parameter into account when initialize the S3FileSystem

Looking through the usages of `_initialize_fs`, it doesn't look like `netloc` is used at all.

> it always uses the region found in properties

I think that's one of the problems we need to tackle. The current S3 configuration requires a specific "region" to be set, which assumes that all data and metadata files live in that one region. But what if I have some files in one region and some in another?

I think a potential solution might be to omit the "region" property and allow the S3FileSystem to determine the proper region using `resolve_s3_region`. This is recommended in the [S3FileSystem docs](https://arrow.apache.org/docs/python/generated/pyarrow.fs.S3FileSystem.html) for `region`.

Another potential issue is the way we cache the fs: it assumes that there's only one fs per scheme. With the region approach above, we break this assumption.
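To make the two points above concrete, here is a rough sketch of what a region-aware cache could look like. It is not the current `PyArrowFileIO` code: `RegionAwareFsCache`, `region_resolver`, and `fs_factory` are hypothetical names used only for illustration. It assumes the bucket is the `netloc` of the location, that `"s3.region"` is the configured property key, and that a failed `resolve_s3_region` lookup surfaces as `OSError`.

```python
from typing import Any, Callable, Dict, Optional, Tuple
from urllib.parse import urlparse


def _default_region_resolver(bucket: str) -> str:
    # Imported lazily so the sketch can be exercised without pyarrow or network access.
    from pyarrow.fs import resolve_s3_region

    return resolve_s3_region(bucket)


def _default_fs_factory(region: Optional[str]) -> Any:
    from pyarrow.fs import S3FileSystem

    # region=None lets the AWS SDK fall back to its own region heuristics.
    return S3FileSystem(region=region)


class RegionAwareFsCache:
    """Hypothetical helper: one cached filesystem per (scheme, bucket), not per scheme."""

    def __init__(
        self,
        properties: Dict[str, str],
        region_resolver: Callable[[str], str] = _default_region_resolver,
        fs_factory: Callable[[Optional[str]], Any] = _default_fs_factory,
    ) -> None:
        self._properties = properties
        self._resolve = region_resolver
        self._make_fs = fs_factory
        self._cache: Dict[Tuple[str, str], Any] = {}

    def get(self, location: str) -> Any:
        uri = urlparse(location)
        # Key on the bucket as well, so buckets in different regions get their own fs.
        key = (uri.scheme, uri.netloc)
        if key not in self._cache:
            self._cache[key] = self._make_fs(self._region_for(uri.netloc))
        return self._cache[key]

    def _region_for(self, bucket: str) -> Optional[str]:
        try:
            return self._resolve(bucket)
        except OSError:
            # Assumption: fall back to the configured region if the lookup fails.
            return self._properties.get("s3.region")
```

With something like this, `cache.get("s3://bucket-a/...")` and `cache.get("s3://bucket-b/...")` would return separate filesystems, each built with its own bucket's region. Whether the configured region should override or only back up the resolved one is still an open design question.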