thomas-pfeiffer opened a new issue, #2841:
URL: https://github.com/apache/iceberg-python/issues/2841
### Feature Request / Improvement
**Feature: Missing AWS Profile Support in PyIceberg / `PyIceberg` should
support AWS profiles**
**Description:**
AWS profiles are a convenient way to work with multiple AWS configs /
credentials in parallel. Ideally, `PyIceberg` should therefore support AWS
profiles as well, which it currently does not.
**Current state (as of writing, PyIceberg v0.10.0):**
- The Glue part of the `GlueCatalog` can be configured to use a profile by
specifying the Glue client explicitly when constructing the catalog:
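```py
from boto3 import Session
from pyiceberg.catalog.glue import GlueCatalog
...
# Build the Glue client from a profile-bound boto3 session and hand it to the catalog.
catalog = GlueCatalog(
    name="your_glue_catalog",
    client=Session(profile_name="your_aws_profile").client("glue"),
    ...
)
```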
- For `fsspec` backends, AWS profile support is generally available (see
https://s3fs.readthedocs.io/en/latest/api.html#s3fs.core.S3FileSystem and
https://github.com/fsspec/s3fs/blob/aceea3a4985f667979e4d8a5a5b8eeddaf23b7be/s3fs/core.py#L229),
but it's not implemented in `PyIceberg` (see
https://github.com/apache/iceberg-python/blob/e07296ea33a96efd3a100da7bf34c088d3ce8001/pyiceberg/io/fsspec.py#L199).
To change that we would need to set the `session` parameter of the
`S3FileSystem` explicitly:
```py
from s3fs import S3FileSystem
from aiobotocore.session import AioSession
...
fs = S3FileSystem(session=AioSession(profile="your_aws_profile"),...)
```
- For the `PyArrow` backend, AWS profile support is not yet available, but
there is an enhancement ticket for it (see
https://github.com/apache/arrow/issues/47880). Once AWS profiles are supported
in `PyArrow`, support can be implemented in `PyIceberg` as well, I assume.
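For the `fsspec` case, the wiring could look roughly like the sketch below. It is not PyIceberg code: the `s3.profile-name` key is the property proposed in this issue, and `s3fs_kwargs_from_properties` and `_default_session_factory` are hypothetical helper names. The only library call it relies on is `AioSession(profile=...)`, as in the snippet above.

```python
from typing import Any, Callable, Dict, Mapping

# Hypothetical property key, as proposed in this issue (not yet in PyIceberg).
S3_PROFILE_NAME = "s3.profile-name"


def _default_session_factory(profile: str) -> Any:
    # Imported lazily so the module loads even without aiobotocore installed.
    from aiobotocore.session import AioSession

    return AioSession(profile=profile)


def s3fs_kwargs_from_properties(
    properties: Mapping[str, str],
    session_factory: Callable[[str], Any] = _default_session_factory,
) -> Dict[str, Any]:
    """Build extra S3FileSystem keyword arguments from catalog properties.

    Only adds a `session` entry when a profile name is configured, so the
    default credential resolution stays untouched otherwise.
    """
    kwargs: Dict[str, Any] = {}
    profile = properties.get(S3_PROFILE_NAME)
    if profile:
        kwargs["session"] = session_factory(profile)
    return kwargs
```

The result would then be merged into the existing arguments, e.g. `S3FileSystem(**s3fs_kwargs_from_properties(properties), ...)`. The injectable `session_factory` is only there to keep the helper testable without AWS configuration.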
**Workaround for this feature gap:**
```py
from boto3 import Session
from pyiceberg.catalog.glue import GlueCatalog

session = Session(profile_name="your_aws_profile")
credentials = session.get_credentials()
if credentials is None:
    raise ValueError("Could not retrieve credentials for profile")
# Pass the profile's credentials explicitly via the unified client.* properties.
catalog = GlueCatalog(
    name="your_glue_catalog",
    **{
        "client.access-key-id": credentials.access_key,
        "client.secret-access-key": credentials.secret_key,
        "client.session-token": credentials.token,
        ...
    },
)
```
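A slightly more defensive variant of this workaround is sketched below. `credentials_to_properties` is a hypothetical helper name. It assumes the object returned by `Session.get_credentials()` offers botocore's `get_frozen_credentials()`, which returns a consistent snapshot of access key, secret key and token. Profiles with long-lived keys have no session token, so the sketch omits that property instead of passing `None`.

```python
from typing import Any, Dict


def credentials_to_properties(credentials: Any) -> Dict[str, str]:
    """Turn botocore credentials into PyIceberg's unified client.* properties."""
    # Snapshot all three values at once; refreshable credentials could
    # otherwise rotate between the individual attribute reads.
    frozen = credentials.get_frozen_credentials()
    properties = {
        "client.access-key-id": frozen.access_key,
        "client.secret-access-key": frozen.secret_key,
    }
    # Static (non-temporary) credentials carry no session token.
    if frozen.token:
        properties["client.session-token"] = frozen.token
    return properties
```

Usage would be `GlueCatalog(name="your_glue_catalog", **credentials_to_properties(credentials))`. Note that this still passes a one-time snapshot: temporary credentials will not be refreshed once they expire, which is one more reason native profile support would be preferable.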
**To-Be / Expected Behavior:**
1. `PyIceberg` should have new `client.profile-name` and `s3.profile-name`
configuration parameters (next to the existing `glue.profile-name`).
2. New `client.profile-name` should also set `glue.profile-name` (same
behaviour as for all the other unified AWS credentials).
3. For now, AWS profile support should be implemented for the `fsspec` backend
only, i.e. `client.profile-name` and `s3.profile-name` should only be supported
when using the `fsspec` backend (`"py-io-impl": "pyiceberg.io.fsspec.FsspecFileIO"`).
4. Once `PyArrow` supports AWS profile names (see
https://github.com/apache/arrow/issues/47880), AWS profile support should be
implemented for the `PyArrow` backend as well, so that `client.profile-name` and
`s3.profile-name` are fully supported.
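The fallback described in items 1 and 2 could be sketched as follows. This is an illustration, not existing PyIceberg code: `client.profile-name` and `s3.profile-name` are the keys proposed here, `resolve_profile_name` is a hypothetical helper, and the precedence (service-specific key first, then the unified `client.*` key) is an assumption modelled on how the other unified AWS credentials are described.

```python
from typing import Mapping, Optional

CLIENT_PROFILE_NAME = "client.profile-name"  # proposed in this issue
S3_PROFILE_NAME = "s3.profile-name"          # proposed in this issue
GLUE_PROFILE_NAME = "glue.profile-name"      # already exists


def resolve_profile_name(
    properties: Mapping[str, str], service_key: str
) -> Optional[str]:
    """Return the service-specific profile if set, else the unified one.

    Assumed precedence: `glue.profile-name` / `s3.profile-name` win over
    `client.profile-name`; None means "use the default credential chain".
    """
    return properties.get(service_key) or properties.get(CLIENT_PROFILE_NAME)


if __name__ == "__main__":
    props = {"client.profile-name": "dev", "s3.profile-name": "data-lake"}
    print(resolve_profile_name(props, GLUE_PROFILE_NAME))  # falls back to client.*
    print(resolve_profile_name(props, S3_PROFILE_NAME))    # service-specific key
```

With this shape, setting only `client.profile-name` configures both Glue and S3, while either service can still be pointed at a different profile.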
Remark: I found this feature gap with the GlueCatalog; the RestCatalog may be
equally affected, but I am not sure.
Issues possibly related to this issue: #570, #1207, #2657