jiakai-li commented on code in PR #1453: URL: https://github.com/apache/iceberg-python/pull/1453#discussion_r1899180251
##########
pyiceberg/io/pyarrow.py:
##########
@@ -362,6 +362,12 @@ def _initialize_fs(self, scheme: str, netloc: Optional[str] = None) -> FileSyste
             "region": get_first_property_value(self.properties, S3_REGION, AWS_REGION),
         }

+        # Override the default s3.region if netloc(bucket) resolves to a different region
+        try:
+            client_kwargs["region"] = resolve_s3_region(netloc)

Review Comment:
   Thank you Fokko. My understanding is that the problem occurs when the provided `region` doesn't match the region of the bucket that holds the data file, which makes the file read fail in pyarrow. By overriding the provided region with the resolved bucket region (and falling back to the provided region otherwise), we make sure the region where the data file is actually stored takes precedence.

   (This function is cached when it is called through `fs_by_scheme`, so it runs only for a new bucket that has not been resolved before, which saves calls to S3.)
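
The behaviour described in the review comment (the resolved bucket region takes precedence, the provided region is the fallback, and each bucket is resolved only once) can be sketched roughly as below. This is a minimal illustration, not the code in the PR: `pick_region`, `make_cached_picker`, and the injected `resolve` callable are hypothetical names, and catching `OSError` for a failed lookup is an assumption.

```python
from functools import lru_cache
from typing import Callable, Optional


def pick_region(
    bucket: Optional[str],
    provided_region: Optional[str],
    resolve: Callable[[str], str],
) -> Optional[str]:
    """Prefer the bucket's resolved region; fall back to the provided one."""
    if not bucket:
        # No bucket to look up, so the configured region is all we have.
        return provided_region
    try:
        return resolve(bucket)
    except OSError:
        # Assumption: a failed lookup surfaces as OSError; keep the configured value.
        return provided_region


def make_cached_picker(
    provided_region: Optional[str],
    resolve: Callable[[str], str],
) -> Callable[[Optional[str]], Optional[str]]:
    """Return a picker that resolves each bucket at most once."""

    @lru_cache(maxsize=None)
    def cached(bucket: Optional[str]) -> Optional[str]:
        return pick_region(bucket, provided_region, resolve)

    return cached
```

With pyarrow installed, `pyarrow.fs.resolve_s3_region` could presumably be passed as `resolve`; it performs a network lookup, which is why caching the result per bucket is worthwhile.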