thomas-pfeiffer opened a new issue, #2841:
URL: https://github.com/apache/iceberg-python/issues/2841

   ### Feature Request / Improvement
   
   **Feature: Missing AWS Profile Support in PyIceberg / `PyIceberg` should 
support AWS profiles**
   
   **Description:**
   AWS profiles are a convenient way to work with multiple AWS configs / 
credentials in parallel. Ideally, `PyIceberg` should therefore support AWS 
profiles as well, which it currently does not.
   
   **Current state (as of writing - PyIceberg v0.10.0):**
   - The Glue part of the GlueCatalog can be configured to use a profile by 
specifying the Glue client explicitly in the GlueCatalog:
   ```py
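   from boto3 import Session
   from pyiceberg.catalog.glue import GlueCatalog
   ...
   # Build the Glue client from a profile-bound session and hand it to the catalog.
   glue_client = Session(profile_name="your_aws_profile").client("glue")
   catalog = GlueCatalog(name="your_glue_catalog", client=glue_client, ...)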
   ```
   - For `fsspec` backends, AWS profile support is generally available in 
`s3fs` (see 
https://s3fs.readthedocs.io/en/latest/api.html#s3fs.core.S3FileSystem and 
https://github.com/fsspec/s3fs/blob/aceea3a4985f667979e4d8a5a5b8eeddaf23b7be/s3fs/core.py#L229), 
but `PyIceberg` does not make use of it (see 
https://github.com/apache/iceberg-python/blob/e07296ea33a96efd3a100da7bf34c088d3ce8001/pyiceberg/io/fsspec.py#L199). 
To change that, we would need to set the `session` parameter of the 
`S3FileSystem` explicitly:
   ```py
   from s3fs import S3FileSystem
   from aiobotocore.session import AioSession
   ...
   fs = S3FileSystem(session=AioSession(profile="your_aws_profile"),...)
   ```
   - For the `PyArrow` backend, AWS profile support is not yet available, but 
there is an enhancement ticket for it (see 
https://github.com/apache/arrow/issues/47880). Once `PyArrow` supports AWS 
profiles, this can be implemented in `PyIceberg` as well, I assume.
   
   
   **Workaround for this feature gap:**
   ```py
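   from boto3 import Session
   from pyiceberg.catalog.glue import GlueCatalog

   # Resolve the profile's credentials and pass them to the catalog as static values.
   session = Session(profile_name="your_aws_profile")
   credentials = session.get_credentials()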
   if credentials is None:
       raise ValueError("Could not retrieve credentials for profile")
   catalog = GlueCatalog(
       name="your_glue_catalog",
       **{ 
           "client.access-key-id": credentials.access_key,
           "client.secret-access-key": credentials.secret_key,
           "client.session-token": credentials.token,
           ...
       },
   )
   ```
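
A minimal sketch of the same workaround as a reusable helper, under a few 
assumptions: `credentials_to_properties` and `properties_for_profile` are 
hypothetical names (not part of `PyIceberg`), `StaticCredentials` is only a 
stand-in for the frozen credentials object returned by `botocore`, and the 
session token is left out when the profile has none (e.g. long-lived access 
keys). `boto3` is imported lazily so the mapping itself can be tried without 
AWS access.

```python
from typing import Dict, NamedTuple, Optional


class StaticCredentials(NamedTuple):
    """Stand-in with the same attribute names as botocore's frozen credentials."""

    access_key: str
    secret_key: str
    token: Optional[str]


def credentials_to_properties(credentials) -> Dict[str, str]:
    """Map resolved AWS credentials to the unified `client.*` properties."""
    properties = {
        "client.access-key-id": credentials.access_key,
        "client.secret-access-key": credentials.secret_key,
    }
    # Profiles with long-lived keys have no session token; skip the key then.
    if credentials.token is not None:
        properties["client.session-token"] = credentials.token
    return properties


def properties_for_profile(profile_name: str) -> Dict[str, str]:
    """Hypothetical helper: resolve a profile via boto3 and build the properties."""
    import boto3  # imported lazily; only needed when a real profile is resolved

    credentials = boto3.Session(profile_name=profile_name).get_credentials()
    if credentials is None:
        raise ValueError(f"Could not retrieve credentials for profile {profile_name!r}")
    # Freeze the credentials so key, secret and token belong to one consistent snapshot.
    return credentials_to_properties(credentials.get_frozen_credentials())
```

The result could then be passed on as 
`GlueCatalog(name="your_glue_catalog", **properties_for_profile("your_aws_profile"))`. 
Credentials resolved this way are a snapshot and are not refreshed when they 
expire, which is one more reason why native profile support would be 
preferable.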
   
   **To-Be / Expected Behavior:**
   1. `PyIceberg` should have new `client.profile-name` and `s3.profile-name` 
configuration parameters (next to the existing `glue.profile-name`).
   2. The new `client.profile-name` should also set `glue.profile-name` (same 
behaviour as for all the other unified AWS credentials).
   3. For now, AWS profile support should be implemented for the `fsspec` 
backend only, so `client.profile-name` and `s3.profile-name` should only be 
supported when using the `fsspec` backend 
(`"py-io-impl": "pyiceberg.io.fsspec.FsspecFileIO"`).
   4. Once `PyArrow` supports AWS profile names (see 
https://github.com/apache/arrow/issues/47880), AWS profile support should be 
implemented for the `PyArrow` backend as well, so that `client.profile-name` 
and `s3.profile-name` are fully supported.
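
Points 1 and 2 could be resolved roughly as follows. This is only a sketch of 
the lookup order, not `PyIceberg` code: `resolve_profile_name` is a 
hypothetical helper, `client.profile-name` and `s3.profile-name` are the keys 
proposed above and do not exist yet, and I assume that a service-specific key 
takes precedence over the unified `client.*` key.

```python
from typing import Mapping, Optional

CLIENT_PROFILE_NAME = "client.profile-name"  # proposed in this issue
S3_PROFILE_NAME = "s3.profile-name"          # proposed in this issue
GLUE_PROFILE_NAME = "glue.profile-name"      # already exists


def resolve_profile_name(properties: Mapping[str, str], service_key: str) -> Optional[str]:
    """Return the service-specific profile if set, else the unified client profile."""
    for key in (service_key, CLIENT_PROFILE_NAME):
        value = properties.get(key)
        if value is not None:
            return value
    return None


properties = {"client.profile-name": "dev", "s3.profile-name": "prod"}
glue_profile = resolve_profile_name(properties, GLUE_PROFILE_NAME)  # falls back to "dev"
s3_profile = resolve_profile_name(properties, S3_PROFILE_NAME)      # uses "prod"
```

With this order, setting only `client.profile-name` would be enough for both 
Glue and S3, while a service-specific key could still override it.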
   
   Remark: I found this feature gap with the GlueCatalog; the RestCatalog 
might be equally affected, but I am not sure.
   Issues possibly related to this one: #570, #1207, #2657


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

