danhphan commented on issue #1279:
URL: https://github.com/apache/iceberg-python/issues/1279#issuecomment-2466676408
Thanks @kevinjqliu, I'm reading through the codebase. Could you give me an example of the unit tests you'd expect for this feature, if possible?

For instance, suppose we create the following `s3_fileio` with `"s3.region": "us-east-1"` in `session_properties`. We then create an `input_file` in the `warehouse` S3 bucket, which is actually located in the `eu-central-1` region. What should the expected result be?

```python
import uuid

import pyarrow.fs

from pyiceberg.io.pyarrow import PyArrowFileIO
from pyiceberg.typedef import Properties

# UNIFIED_AWS_SESSION_PROPERTIES is a constant from the pyiceberg test suite.
session_properties: Properties = {
    "s3.endpoint": "http://localhost:9000",
    "s3.access-key-id": "admin",
    "s3.secret-access-key": "password",
    "s3.region": "us-east-1",
    "s3.session-token": "s3.session-token",
    **UNIFIED_AWS_SESSION_PROPERTIES,
}
s3_fileio = PyArrowFileIO(properties=session_properties)
print(s3_fileio.properties["s3.region"])  # --> us-east-1

filename = str(uuid.uuid4())

input_file = s3_fileio.new_input(location=f"s3://warehouse/{filename}")
print(pyarrow.fs.resolve_s3_region("warehouse"))  # --> eu-central-1

output_file = s3_fileio.new_output(location=f"s3://foo/{filename}")
print(pyarrow.fs.resolve_s3_region("foo"))  # --> us-east-1
```

One idea: in `def _initialize_fs(self, scheme: str, netloc: Optional[str] = None) -> FileSystem` from your comment above, we could set the `"region"` value in `client_kwargs` based on `netloc` (i.e. the S3 bucket). I'm not sure whether that's the right direction, though.
Something like `"region": pyarrow.fs.resolve_s3_region(netloc),` in place of the current `region` entry:

```python
def _initialize_fs(self, scheme: str, netloc: Optional[str] = None) -> FileSystem:
    if scheme in {"s3", "s3a", "s3n"}:
        from pyarrow.fs import S3FileSystem

        client_kwargs: Dict[str, Any] = {
            "endpoint_override": self.properties.get(S3_ENDPOINT),
            "access_key": get_first_property_value(self.properties, S3_ACCESS_KEY_ID, AWS_ACCESS_KEY_ID),
            "secret_key": get_first_property_value(self.properties, S3_SECRET_ACCESS_KEY, AWS_SECRET_ACCESS_KEY),
            "session_token": get_first_property_value(self.properties, S3_SESSION_TOKEN, AWS_SESSION_TOKEN),
            # Currently read from the configured properties; the idea is to
            # use pyarrow.fs.resolve_s3_region(netloc) here instead.
            "region": get_first_property_value(self.properties, S3_REGION, AWS_REGION),
        }
```

Thank you.
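The following is a minimal sketch of the bucket-region idea, together with the kind of unit test the question asks about. It is not pyiceberg code. `region_for_bucket` and the test names are hypothetical, and only the `s3.region` key is checked (the real code also consults `AWS_REGION`). The resolver is injected so the tests never touch the network, and the default wraps `pyarrow.fs.resolve_s3_region`. Falling back to the configured region when resolution fails is an assumption, as is the assumption that pyarrow reports such failures as `OSError`.

```python
from typing import Callable, Dict, Optional

# Property key taken from the example above; real code would reuse
# pyiceberg's S3_REGION / AWS_REGION constants.
S3_REGION = "s3.region"


def _resolve_with_pyarrow(bucket: str) -> str:
    # Imported lazily: tests inject a fake resolver, so pyarrow (and network
    # access) is only needed when this default is actually used.
    from pyarrow.fs import resolve_s3_region

    return resolve_s3_region(bucket)


def region_for_bucket(
    properties: Dict[str, str],
    bucket: Optional[str],
    resolver: Callable[[str], str] = _resolve_with_pyarrow,
) -> Optional[str]:
    """Hypothetical helper: prefer the bucket's own region over the configured one."""
    configured = properties.get(S3_REGION)
    if not bucket:
        # No bucket (e.g. netloc=None): nothing to resolve.
        return configured
    try:
        return resolver(bucket)
    except OSError:
        # Assumption: resolution failures surface as OSError; keep the
        # configured region rather than failing outright.
        return configured


def test_region_follows_bucket_not_configured_region() -> None:
    # Fake bucket -> region map standing in for pyarrow.fs.resolve_s3_region,
    # mirroring the warehouse/foo example in the comment above.
    known = {"warehouse": "eu-central-1", "foo": "us-east-1"}
    props = {S3_REGION: "us-east-1"}

    assert region_for_bucket(props, "warehouse", known.__getitem__) == "eu-central-1"
    assert region_for_bucket(props, "foo", known.__getitem__) == "us-east-1"
    assert region_for_bucket(props, None, known.__getitem__) == "us-east-1"


def test_falls_back_when_resolution_fails() -> None:
    def failing(bucket: str) -> str:
        raise OSError(f"cannot resolve region for {bucket}")

    props = {S3_REGION: "us-east-1"}
    assert region_for_bucket(props, "warehouse", failing) == "us-east-1"
```

Injecting the resolver keeps the test deterministic. Calling `resolve_s3_region` against real AWS from a test that targets a local endpoint (as in the `localhost:9000` example) would depend on whichever public bucket happens to share the name. With this approach, one FileIO could end up talking to buckets in different regions, so the filesystem would presumably need to be cached per bucket rather than per scheme.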