geruh opened a new issue, #570: URL: https://github.com/apache/iceberg-python/issues/570
### Apache Iceberg version

main (development)

### Please describe the bug

When the GlueCatalog is initialized with a specific AWS profile, catalog operations work as expected. However, we've hit an issue when working with S3 through PyArrow's S3FileSystem. Users can specify a profile for creating the boto connection, but that preference doesn't carry over to the S3FileSystem. Instead of using the specified AWS profile, PyIceberg checks the catalog properties for S3 settings such as `s3.access-key-id` and `s3.region`. If those aren't passed in, PyArrow's S3FileSystem falls back to its own strategy for inferring credentials, such as:

1. the `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `AWS_SESSION_TOKEN` environment variables;
2. the default profile credentials in `~/.aws/credentials` and `~/.aws/config`.

This workflow leads to inconsistencies. For example, while Glue operations use the user-specified profile, S3 operations can end up using a different set of credentials, or even a different region, taken from the environment variables or the AWS config files. This is what happens in issue #515, where the intended region (us-west-2) unexpectedly switches to another (us-east-1), causing an HTTP 301 error.

To reproduce:

1. Set up `~/.aws/config` so that the default profile has a different (incorrect) region from the profile you want to use:

```
[default]
region = us-east-1

[profile test]
region = us-west-2
```

2. Initialize the GlueCatalog with the region you want to use:

```python
import pyiceberg.catalog

catalog_name = "glue"  # placeholder catalog name

catalog = pyiceberg.catalog.load_catalog(
    catalog_name,
    **{"type": "glue", "profile_name": "test", "region_name": "us-west-2"},
)
```

3. Load a table:

```python
catalog.load_table("default.test")
```

This fails with:

```
File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
OSError: When reading information for key 'test/metadata/00000-c0fc4e45-d79d-41a1-ba92-a4122c09171c.metadata.json' in bucket 'test_bucket': AWS Error UNKNOWN (HTTP status 301) during HeadObject operation: No response body.
```

On one hand, we could argue that the profile setting should apply only at the catalog level, and that for filesystems users must set the S3 properties mentioned above, such as `s3.region`. On the other hand, it seems reasonable for the AWS profile to apply uniformly to both the catalog and the filesystem, which would simplify configuration management for users. I'm leaning towards this approach. However, we currently use PyArrow's S3FileSystem, which doesn't natively support AWS profiles, so we'd need to bridge that gap manually.

cc: @HonahX @Fokko @kevinjqliu
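One way to bridge that gap manually is to resolve the profile's region and static keys and translate them into the `s3.*` properties that the S3 FileIO already reads. The stdlib-only sketch below does that by parsing the shared AWS files; `profile_to_s3_properties`, `SAMPLE_CONFIG`, and `SAMPLE_CREDENTIALS` are hypothetical names with made-up values. It is deliberately simplified (no SSO, assume-role, `credential_process`, or environment-variable precedence) and assumes the PyIceberg property names `s3.region`, `s3.access-key-id`, `s3.secret-access-key`, and `s3.session-token`. It also reflects the AWS file convention that named profiles in `~/.aws/config` use `[profile NAME]` headers while `~/.aws/credentials` uses bare `[NAME]`.

```python
import configparser
from typing import Dict

# Mapping from shared-credentials keys to PyIceberg's S3 FileIO properties.
_CREDENTIAL_KEYS = {
    "aws_access_key_id": "s3.access-key-id",
    "aws_secret_access_key": "s3.secret-access-key",
    "aws_session_token": "s3.session-token",
}


def profile_to_s3_properties(config_text: str, credentials_text: str, profile: str) -> Dict[str, str]:
    """Hypothetical helper: translate one AWS profile into s3.* properties.

    Simplified on purpose: reads only static keys and the region, and ignores
    SSO, assume-role, credential_process and environment-variable overrides.
    """
    # interpolation=None so that '%' in secrets is not treated specially.
    config = configparser.ConfigParser(interpolation=None)
    config.read_string(config_text)
    credentials = configparser.ConfigParser(interpolation=None)
    credentials.read_string(credentials_text)

    # ~/.aws/config names non-default profiles "[profile NAME]", while
    # ~/.aws/credentials uses the bare "[NAME]".
    config_section = "default" if profile == "default" else f"profile {profile}"

    props: Dict[str, str] = {}
    # No fallback to [default]: a named profile without a region stays unset.
    if config.has_option(config_section, "region"):
        props["s3.region"] = config.get(config_section, "region")
    for aws_key, iceberg_key in _CREDENTIAL_KEYS.items():
        if credentials.has_option(profile, aws_key):
            props[iceberg_key] = credentials.get(profile, aws_key)
    return props


# Sample files mirroring the setup in this issue (values are made up).
SAMPLE_CONFIG = """
[default]
region = us-east-1

[profile test]
region = us-west-2
"""

SAMPLE_CREDENTIALS = """
[test]
aws_access_key_id = EXAMPLEKEY
aws_secret_access_key = EXAMPLESECRET
"""

if __name__ == "__main__":
    # The region comes from the "test" profile, not from [default].
    print(profile_to_s3_properties(SAMPLE_CONFIG, SAMPLE_CREDENTIALS, "test"))
```

The returned dict could then be merged into the catalog properties, for example `{"type": "glue", "profile_name": "test", "region_name": "us-west-2", **props}`, so that Glue and the S3 FileIO agree on the region and credentials. For full credential-chain support, resolving the profile through boto3's `Session(profile_name=...)` (its `get_credentials()` and `region_name`) is likely a better building block than hand-parsing these files.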