c-thiel opened a new issue, #10486: URL: https://github.com/apache/iceberg/issues/10486
### Feature Request / Improvement

Currently the `S3SignRequest` used for remote signing in the Iceberg REST Catalog does not explicitly include the object the client wants to access (table or view UUID).

https://github.com/apache/iceberg/blob/282fa73b2c2b51b1d519a91e8bb05b2d2bdf3cf4/aws/src/main/resources/s3-signer-open-api.yaml#L104-L110

The first task of the signer endpoint is to check whether the client is authorized to perform the operation. For that, I see two different approaches:

## 1. Exchange Token

When calling catalog endpoints such as `loadTable`, a token exchange is performed by the catalog. The new token contains the location of the table and looks something like this:

```json
{
  "sub": "...",
  "privileges": [
    "SELECT",
    "MANAGE_GRANTS",
    "DROP",
    "UPDATE"
  ],
  "parentTokenId": "...",
  "iss": "tabular.io",
  "type": "TABLE",
  ...
  "location": "s3://my-bucket/<my-namespace-id>/<my-table-id>"
}
```

Given this token, the signer just needs to check the token's validity and then compare `method` and `uri` from the sign request against the token. This is the way tabular.io implements it, and it's not nice for multiple reasons:

- It requires very frequent token exchanges: one every time a `loadTable` / `commitTable` / ... endpoint is called.
- It only works in scenarios where token exchange is possible, and thus rules out other authentication mechanisms.
- It forces tight coupling between authentication and authorization by storing authorization-related information in the token.

## 2. Validate by Table ID

Alternatively, we can use the table / view UUID and then handle authorization in any other way we require. One example could be to store authorizations in a Google [Zanzibar](https://research.google/pubs/zanzibar-googles-consistent-global-authorization-system/)-based solution such as [openfga](http://openfga.dev). To leave the choice of authorization mechanism open to a signer / catalog implementation, the minimum requirement is to know the id of the object that should be accessed.
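To illustrate why knowing the object id is enough, here is a minimal in-memory sketch of a Zanzibar-style check keyed by table UUID. It is not the OpenFGA API. The names `RelationTuple`, `TupleStore`, `REQUIRED_RELATION` and `is_authorized`, the `reader` / `writer` relations, and the mapping from HTTP method to relation are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationTuple:
    """One (user, relation, object) fact, in the spirit of Zanzibar."""
    user: str      # e.g. "user:alice"
    relation: str  # e.g. "reader" or "writer" (hypothetical relations)
    obj: str       # e.g. "table:<uuid>" or "view:<uuid>"


# Hypothetical mapping from the HTTP method of a sign request
# to the relation the caller must hold on the object.
REQUIRED_RELATION = {
    "GET": "reader",
    "HEAD": "reader",
    "PUT": "writer",
    "POST": "writer",
    "DELETE": "writer",
}


class TupleStore:
    """Toy stand-in for an external authorization service."""

    def __init__(self, tuples):
        self._tuples = set(tuples)

    def check(self, user: str, relation: str, obj: str) -> bool:
        if RelationTuple(user, relation, obj) in self._tuples:
            return True
        # Assumption: a writer may also read.
        if relation == "reader":
            return RelationTuple(user, "writer", obj) in self._tuples
        return False


def is_authorized(store: TupleStore, user: str, method: str,
                  object_type: str, object_id: str) -> bool:
    """Decide based only on who asks, what they do, and the object id."""
    relation = REQUIRED_RELATION.get(method.upper())
    if relation is None:
        return False  # unknown methods are rejected
    return store.check(f"user:{user}", relation, f"{object_type}:{object_id}")


store = TupleStore([
    RelationTuple("user:alice", "writer", "table:1111"),
    RelationTuple("user:bob", "reader", "table:1111"),
])
print(is_authorized(store, "bob", "GET", "table", "1111"))  # bob may read
print(is_authorized(store, "bob", "PUT", "table", "1111"))  # bob may not write
```

Nothing in this check depends on the S3 location or on claims stored in the token, which is the decoupling this approach is after.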
The workflow would be:

- Check AuthN in the token
- Check AuthZ based on the provided table UUID
- Retrieve the table location and match it against the `uri` that is requested to be signed

While it is already possible to implement this today by using the `uri` to look up the table UUID, the "reverse" lookup `uri` -> `table-uuid` is not trivial, especially because there are so many ways to specify an S3 URL (path-style vs. bucket-style, custom endpoints, optionally specified region, ...).

## Proposal Variant 1

Adding two required fields, `object-id` and `object-type`, to the `S3SignRequest` would make it very explicit which object the client is trying to access. `object-type` could be an enum with the values "table" and "view", while `object-id` contains the UUID that is part of the metadata.

## Proposal Variant 2

The underlying problem is that we are missing a standard way to get metadata, such as the `table-id`, from the catalog endpoint to the signer endpoint. A less explicit way of handling this could be to introduce a `config` attribute `s3.signer.properties` that the server can specify in its `config` field on `loadTable` / ... responses. The client should then forward these to the `properties` field of the `S3SignRequest`.

In the `spec`, proposal 2 would just mean adding a comment right around here:
https://github.com/apache/iceberg/blob/282fa73b2c2b51b1d519a91e8bb05b2d2bdf3cf4/aws/src/main/resources/s3-signer-open-api.yaml#L120

"properties should be populated from the `s3.signer.properties` configuration of the catalog. They can be overwritten by each catalog endpoint, such as `loadTable` or `commitTable` in the returned `config` Map."

Both Variant 1 and Variant 2 require changes to all clients.

Any thoughts or additional insights welcome!

### Query engine

None

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
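As a rough sketch of the workflow under "2. Validate by Table ID" combined with Proposal Variant 1: once the request carries `object-id` and `object-type`, the signer can look up the table location by id and only needs to normalize the requested `uri` for a prefix comparison. The helper names, the `locations` and `grants` lookups, and the restriction to AWS-style hostnames are assumptions for illustration. Custom endpoints are not handled, and AuthN is assumed to have happened already.

```python
from urllib.parse import unquote, urlparse


def s3_bucket_and_key(uri: str) -> tuple[str, str]:
    """Extract (bucket, key) from an s3:// URI or an AWS-style HTTPS URL."""
    parsed = urlparse(uri)
    host = parsed.hostname or ""
    path = parsed.path.lstrip("/")
    if parsed.scheme in ("s3", "s3a"):
        return host, path
    if not host.endswith(".amazonaws.com"):
        raise ValueError(f"unsupported endpoint: {host}")
    path = unquote(path)  # HTTP request paths may be percent-encoded
    labels = host.split(".")
    if labels[0] == "s3" or labels[0].startswith("s3-"):
        # Path-style: https://s3.<region>.amazonaws.com/<bucket>/<key>
        bucket, _, key = path.partition("/")
        return bucket, key
    if "s3" in labels:
        # Virtual-hosted-style: https://<bucket>.s3.<region>.amazonaws.com/<key>
        return ".".join(labels[: labels.index("s3")]), path
    raise ValueError(f"not an S3 host: {host}")


def uri_within_location(request_uri: str, table_location: str) -> bool:
    """True if the requested object lies under the table location prefix."""
    req_bucket, req_key = s3_bucket_and_key(request_uri)
    loc_bucket, loc_prefix = s3_bucket_and_key(table_location)
    loc_prefix = loc_prefix.rstrip("/") + "/"  # avoid matching sibling prefixes
    return req_bucket == loc_bucket and req_key.startswith(loc_prefix)


def check_sign_request(request: dict, user: str,
                       locations: dict, grants: set) -> bool:
    """Steps 2 and 3 of the workflow; step 1 (AuthN) is assumed done.

    `locations` maps (object-type, object-id) to a table location and
    `grants` holds (user, object-type, object-id) triples. Both are
    hypothetical stand-ins for the catalog and the authorization system.
    """
    obj = (request["object-type"], request["object-id"])  # proposed fields
    if (user, *obj) not in grants:
        return False
    location = locations.get(obj)
    return location is not None and uri_within_location(request["uri"], location)


locations = {("table", "1111"): "s3://my-bucket/ns/tbl"}
grants = {("alice", "table", "1111")}
request = {
    "method": "GET",
    "region": "us-east-1",
    "uri": "https://my-bucket.s3.us-east-1.amazonaws.com/ns/tbl/data/f.parquet",
    "object-type": "table",
    "object-id": "1111",
}
print(check_sign_request(request, "alice", locations, grants))
```

The URL parsing here only has to go in the forward direction (`uri` to bucket and key) for a single, already-known table, which is simpler than the reverse lookup from an arbitrary `uri` to a table UUID.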