impproductions commented on issue #515:
URL: https://github.com/apache/iceberg-python/issues/515#issuecomment-2208697240
We had the same problem within our Airflow deployment. The easy fix for us
would have been to set the default AWS credentials through environment
variables:
```bash
AWS_ACCESS_KEY_ID=<aws access key>
AWS_DEFAULT_REGION=<aws region>
AWS_SECRET_ACCESS_KEY=<aws secret key>
```
This, however, wasn't feasible because of deployment issues.
Long story short, we ended up with this solution:
```python
from pyiceberg.catalog import load_catalog
from pyiceberg.catalog.glue import GlueCatalog

glue_catalog_conf = {
    "s3.region": "<aws region>",
    "s3.access-key-id": "<aws access key>",
    "s3.secret-access-key": "<aws secret key>",
    "region_name": "<aws region>",
    "aws_access_key_id": "<aws access key>",
    "aws_secret_access_key": "<aws secret key>",
}
catalog: GlueCatalog = load_catalog(
    "some_name",
    type="glue",
    **glue_catalog_conf,
)
```
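To avoid repeating the same three values in two spellings, the duplication
could be hidden behind a tiny helper. A minimal sketch, assuming the property
names shown above; `build_glue_catalog_conf` is our own hypothetical name,
not a pyiceberg API:
```python
from pyiceberg.catalog import load_catalog

def build_glue_catalog_conf(region: str, access_key: str, secret_key: str) -> dict:
    # Hypothetical helper: emit the same credentials in both spellings,
    # the "s3.*" keys (read by the FileIO layer) and the boto3-style
    # keys (read by the Glue client's boto3.Session).
    return {
        "s3.region": region,
        "s3.access-key-id": access_key,
        "s3.secret-access-key": secret_key,
        "region_name": region,
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
    }

catalog = load_catalog(
    "some_name",
    type="glue",
    **build_glue_catalog_conf("<aws region>", "<aws access key>", "<aws secret key>"),
)
```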
If you come from a Google search, please take everything that follows with a
grain of salt, because we have no previous experience with either pyiceberg or
Airflow. Anyway.
We came to this conclusion (that we needed to pass *both* formats) because
it seems that the properties consumed by the boto3 session initializer are in
one format (the second set in the snippet above):
```python
class GlueCatalog(Catalog):
    def __init__(self, name: str, **properties: Any):
        super().__init__(name, **properties)

        session = boto3.Session(
            profile_name=properties.get("profile_name"),
            region_name=properties.get("region_name"),
            botocore_session=properties.get("botocore_session"),
            aws_access_key_id=properties.get("aws_access_key_id"),
            aws_secret_access_key=properties.get("aws_secret_access_key"),
            aws_session_token=properties.get("aws_session_token"),
        )
        self.glue: GlueClient = session.client("glue")
```
And the same set of properties is passed to pyiceberg's `load_file_io`
function, which, to the extent of our very limited understanding, expects the
other format (the `s3.*` keys):
```python
io = load_file_io(properties=self.properties, location=metadata_location)
file = io.new_input(metadata_location)
metadata = FromInputFile.table_metadata(file)
return Table(
    identifier=(self.name, database_name, table_name),
    metadata=metadata,
    metadata_location=metadata_location,
    io=self._load_file_io(metadata.properties, metadata_location),
    catalog=self,
)
```
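For what it's worth, that code path can be exercised directly. A minimal
sketch, assuming pyiceberg 0.6.1's public `load_file_io` and a made-up bucket
and key; to the best of our understanding, only the `s3.*` keys reach the
FileIO:
```python
from pyiceberg.io import load_file_io

# Only the "s3.*" keys appear to be honoured here; the boto3-style keys
# ("aws_access_key_id", ...) seem to be ignored by the FileIO layer, as
# far as we can tell from the pyiceberg 0.6.1 source.
io = load_file_io(
    properties={
        "s3.region": "<aws region>",
        "s3.access-key-id": "<aws access key>",
        "s3.secret-access-key": "<aws secret key>",
    },
    location="s3://some-bucket/metadata/00000.metadata.json",
)
input_file = io.new_input("s3://some-bucket/metadata/00000.metadata.json")
```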
We might be completely off base here, of course; what ultimately convinced us
to adopt the above solution is simply that it works, while passing either set
of credentials without the other did not work for us.
We're using:
```
aiobotocore==2.13.1
boto3==1.34.51
botocore==1.34.131
mypy-boto3-glue==1.34.136
[...]
pyiceberg==0.6.1
```
We're still unclear on whether this is indeed a bug or we're just using the
APIs improperly; any help would be appreciated.
Have a nice day!