JannicCutura opened a new issue, #2991: URL: https://github.com/apache/iceberg-python/issues/2991
### Feature Request / Improvement

When accessing Iceberg tables stored in S3 buckets owned by a different AWS account, some organizations require that cross-account access go through [S3 Access Points](https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-points.html) rather than allowing direct bucket access. This is a common security policy that provides better access control and auditability for shared data. Currently, PyIceberg has no way to configure access point mappings, making it impossible to access cross-account tables in environments where access points are required.

**Proposed solution**

Add configuration to map S3 bucket names to access point aliases. When PyIceberg encounters an S3 path, it rewrites the bucket name to the configured access point alias before making the request.

**Proposed configuration format:**

```python
catalog = load_catalog(
    "glue",
    **{
        "type": "glue",
        # Map bucket to access point alias
        "s3.access-point.my-bucket-name": "my-access-point-alias-s3alias",
    },
)
```

This follows a similar pattern to Apache Spark's `fs.s3a.bucket.<bucket>.accesspoint.arn` configuration.

**Why an access point alias instead of an ARN?**

PyArrow's S3FileSystem expects paths in `bucket/key` format. Access point aliases (ending in `-s3alias`) behave exactly like bucket names, while ARNs do not. The alias approach is simpler and works with PyArrow without additional changes.

**Scope of changes:**

- `pyiceberg/io/__init__.py`: add the config constant `S3_ACCESS_POINT_PREFIX`
- `pyiceberg/io/pyarrow.py`: add bucket-to-alias resolution in `new_input()`, `new_output()`, and `delete()`
- `pyiceberg/io/fsspec.py`: apply the same pattern in the fsspec implementation
- Unit tests for the resolution logic
- Documentation update

**I have a working implementation** and would be happy to submit a PR.
It has been tested with:

- Reading Iceberg tables via the Glue catalog from cross-account S3
- Writing new Iceberg tables to cross-account S3

**References:**

- [AWS S3 Access Points documentation](https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-points.html)
- [Spark S3A Access Point configuration](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#Configuring_S3_Access_Points)
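The proposed resolution step could be sketched roughly as follows. This is a minimal illustration, not the actual PR code: the helper name `resolve_s3_location` is hypothetical, and only the `s3.access-point.<bucket>` property format from the proposal above is assumed.

```python
from urllib.parse import urlparse

# Hypothetical constant mirroring the proposed "s3.access-point.<bucket>"
# configuration keys; the real PR may name or place it differently.
S3_ACCESS_POINT_PREFIX = "s3.access-point."


def resolve_s3_location(location: str, properties: dict) -> str:
    """Rewrite the bucket in an s3:// URI to its configured access point alias.

    If no alias is configured for the bucket, or the URI is not an S3 URI,
    the location is returned unchanged.
    """
    parsed = urlparse(location)
    if parsed.scheme not in ("s3", "s3a", "s3n"):
        return location
    alias = properties.get(S3_ACCESS_POINT_PREFIX + parsed.netloc)
    if alias is None:
        return location
    # ParseResult is a namedtuple, so _replace swaps the bucket cleanly.
    return parsed._replace(netloc=alias).geturl()


props = {"s3.access-point.my-bucket-name": "my-access-point-alias-s3alias"}
print(resolve_s3_location("s3://my-bucket-name/warehouse/db/tbl/data.parquet", props))
# -> s3://my-access-point-alias-s3alias/warehouse/db/tbl/data.parquet
```

Because the alias behaves exactly like a bucket name, the rewritten URI can then be handed to PyArrow's `S3FileSystem` or fsspec without any further changes, which is the motivation for preferring aliases over ARNs.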
