JannicCutura opened a new issue, #2991:
URL: https://github.com/apache/iceberg-python/issues/2991

   ### Feature Request / Improvement
   
   When accessing Iceberg tables stored in S3 buckets owned by a different AWS 
account, some organizations enforce that cross-account access must go through 
[S3 Access 
Points](https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-points.html)
 rather than allowing direct bucket access. This is a common security policy 
that provides better access control and auditability for shared data.
   
   Currently, PyIceberg has no way to configure access point mappings, making 
it impossible to access cross-account tables in environments where access 
points are required.
   
   **Proposed solution**
   
   Add configuration to map S3 bucket names to access point aliases. When 
PyIceberg encounters an S3 path, it rewrites the bucket name to the configured 
access point alias before making the request.
   
   **Proposed configuration format:**
   
   ```python
   catalog = load_catalog(
       "glue",
       **{
           "type": "glue",
           # Map bucket to access point alias
           "s3.access-point.my-bucket-name": "my-access-point-alias-s3alias",
       }
   )
   ```
   
   This follows a similar pattern to Apache Spark's 
`fs.s3a.bucket.<bucket>.accesspoint.arn` configuration.
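   For comparison, the S3A analogue binds an access point per bucket via an ARN (the account ID and access point name below are illustrative):

```
# spark-defaults.conf (illustrative values)
spark.hadoop.fs.s3a.bucket.my-bucket-name.accesspoint.arn  arn:aws:s3:us-east-1:123456789012:accesspoint/my-access-point
```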
   
   **Why access point alias instead of ARN?**
   
   PyArrow's S3FileSystem expects paths in `bucket/key` format. Access point 
aliases (ending in `-s3alias`) can be used anywhere a bucket name is accepted, 
whereas ARNs contain `:` and `/` characters that break the `bucket/key` format. 
The alias approach is simpler and works with PyArrow without additional 
changes.
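   
   Because an alias is a drop-in replacement for a bucket name, the rewrite reduces to a string substitution on the bucket component of the location. A minimal sketch (the mapping and function names here are hypothetical, not part of the proposed API):

```python
from urllib.parse import urlparse

# Hypothetical mapping, built from the proposed "s3.access-point.<bucket>" properties.
ACCESS_POINT_ALIASES = {"my-bucket-name": "my-access-point-alias-s3alias"}

def rewrite_s3_location(location: str) -> str:
    """Swap the bucket in an s3:// location for its access point alias, if configured."""
    parsed = urlparse(location)
    alias = ACCESS_POINT_ALIASES.get(parsed.netloc, parsed.netloc)
    return f"{parsed.scheme}://{alias}{parsed.path}"

print(rewrite_s3_location("s3://my-bucket-name/warehouse/db/tbl/data/00000.parquet"))
# s3://my-access-point-alias-s3alias/warehouse/db/tbl/data/00000.parquet
```

   Buckets without a configured mapping pass through unchanged, so the rewrite is a no-op for same-account tables.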
   
   **Scope of changes:**
   
   - `pyiceberg/io/__init__.py` - Add config constant `S3_ACCESS_POINT_PREFIX`
   - `pyiceberg/io/pyarrow.py` - Add bucket-to-alias resolution in 
`new_input()`, `new_output()`, `delete()`
   - `pyiceberg/io/fsspec.py` - Same pattern for fsspec implementation
   - Unit tests for the resolution logic
   - Documentation update
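   
   As a sketch of the configuration plumbing (the constant name comes from the proposal above; the helper name is hypothetical):

```python
# Proposed constant for pyiceberg/io/__init__.py
S3_ACCESS_POINT_PREFIX = "s3.access-point."

def parse_access_point_mappings(properties: dict) -> dict:
    """Collect bucket -> access point alias pairs from catalog properties."""
    return {
        key[len(S3_ACCESS_POINT_PREFIX):]: value
        for key, value in properties.items()
        if key.startswith(S3_ACCESS_POINT_PREFIX)
    }

properties = {
    "type": "glue",
    "s3.access-point.my-bucket-name": "my-access-point-alias-s3alias",
}
print(parse_access_point_mappings(properties))
# {'my-bucket-name': 'my-access-point-alias-s3alias'}
```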
   
   **I have a working implementation** and would be happy to submit a PR. It 
has been tested with:
   - Reading Iceberg tables via Glue catalog from cross-account S3
   - Writing new Iceberg tables to cross-account S3
   
   **References:**
   - [AWS S3 Access Points 
documentation](https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-points.html)
   - [Spark S3A Access Point 
configuration](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html#Configuring_S3_Access_Points)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

