c-thiel opened a new issue, #10486:
URL: https://github.com/apache/iceberg/issues/10486

   ### Feature Request / Improvement
   
   Currently, the `S3SignRequest` used for remote signing in the Iceberg REST 
Catalog does not explicitly include the object the client wants to access 
(table or view UUID).
   
https://github.com/apache/iceberg/blob/282fa73b2c2b51b1d519a91e8bb05b2d2bdf3cf4/aws/src/main/resources/s3-signer-open-api.yaml#L104-L110
   
   The first task of the signer endpoint is to check if the client is 
authorized to perform the operation. For that, I see two different approaches:
   
   ## 1. Exchange Token
   When calling catalog endpoints such as `loadTable`, a token exchange is 
performed by the catalog. The new token contains the location of the table and 
looks something like this:
   ```json
   {
     "sub": "...",
     "privileges": [
       "SELECT",
       "MANAGE_GRANTS",
       "DROP",
       "UPDATE"
     ],
     "parentTokenId": "...",
     "iss": "tabular.io",
     "type": "TABLE",
     ...
     "location": "s3://my-bucket/<my-namespace-id>/<my-table-id>"
   }
   ```
   
   Given this token, the signer only needs to check the token's validity 
and then compare `method` and `uri` from the sign request against the token's 
claims.
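
   A minimal sketch of that comparison, under several assumptions that are not 
part of the issue: the token signature has already been verified and the claims 
decoded into a dict, an `exp` expiry claim is present, the `uri` has already 
been normalized to `s3://bucket/key` form, and the mapping from HTTP method to 
privilege (`REQUIRED_PRIVILEGE`) is purely hypothetical.

```python
import time
from typing import Optional

# Hypothetical mapping from HTTP method to the privilege it needs;
# the issue does not define one.
REQUIRED_PRIVILEGE = {
    "GET": "SELECT",
    "HEAD": "SELECT",
    "PUT": "UPDATE",
    "POST": "UPDATE",
    "DELETE": "UPDATE",
}


def token_allows(claims: dict, method: str, s3_uri: str,
                 now: Optional[float] = None) -> bool:
    """Return True if the decoded token claims cover the sign request."""
    now = time.time() if now is None else now
    # "exp" is the standard JWT expiry claim; assumed to be present here.
    if claims.get("exp", 0) <= now:
        return False
    needed = REQUIRED_PRIVILEGE.get(method.upper())
    if needed is None or needed not in claims.get("privileges", []):
        return False
    # Require a path-segment boundary so ".../tbl" does not match ".../tbl2".
    prefix = claims["location"].rstrip("/") + "/"
    return s3_uri.startswith(prefix)
```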
   
   This is the way tabular.io implements it, and it's not ideal for multiple 
reasons:
   - It requires very frequent token exchanges: every time a `loadTable` / 
`commitTable` / ... endpoint is called
   - It only works in scenarios where token exchange is possible, which rules 
out other authentication mechanisms
   - It forces tight coupling between authentication and authorization by 
storing authorization-related information in the token
   
   ## 2. Validate by Table ID
   Alternatively, we can use the table / view UUID and then handle 
authorization in whatever way we require. One example would be to store 
authorizations in a Google 
[Zanzibar](https://research.google/pubs/zanzibar-googles-consistent-global-authorization-system/)-based 
solution such as [OpenFGA](http://openfga.dev).
   
   To let a signer / catalog implementation choose its authorization mechanism 
freely, the minimum requirement is to know the ID of the object that 
should be accessed. 
   The workflow would be:
   - Check AuthN in Token
   - Check AuthZ based on provided TableUuid
   - Retrieve Table Location and match against specified `uri` that is 
requested to be signed
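
   The three steps above can be sketched roughly as follows. Everything here is 
illustrative: `SignRequest`, `authorize`, and the two callbacks are hypothetical 
names, the `uri` is assumed to be normalized to `s3://bucket/key` form already, 
and the read/write split by HTTP method is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SignRequest:
    method: str
    s3_uri: str       # assumed already normalized to s3://bucket/key form
    table_uuid: str   # hypothetical: the object id this issue proposes to add


def authorize(
    req: SignRequest,
    principal: Optional[str],
    is_allowed: Callable[[str, str, str], bool],
    table_location: Callable[[str], Optional[str]],
) -> bool:
    # 1. AuthN: the caller has already verified the token and extracted the
    #    principal; None means verification failed.
    if principal is None:
        return False
    # 2. AuthZ: ask any backend (e.g. a Zanzibar-style store) about the UUID.
    action = "read" if req.method.upper() in ("GET", "HEAD") else "write"
    if not is_allowed(principal, action, req.table_uuid):
        return False
    # 3. The URI to sign must live under the table's location.
    location = table_location(req.table_uuid)
    if location is None:
        return False
    return req.s3_uri.startswith(location.rstrip("/") + "/")
```

   Because the authorization check is just a callback keyed on the UUID, the 
signer stays independent of how permissions are stored.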
   
   While it is already possible to implement this today by using the `uri` to 
look up the table UUID, the "reverse" lookup `uri` -> `table-uuid` is not 
trivial, especially because there are so many ways to specify an S3 URL 
(path-style vs. bucket-style, custom endpoints, optionally specified region, ...).
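
   To illustrate why the reverse lookup is awkward, here is a rough sketch that 
splits a URL into bucket and key. It only recognizes a few common AWS host 
patterns (the regular expression is an approximation, not an exhaustive list) 
and gives up on custom endpoints, where the host alone does not reveal whether 
the bucket is in the host name or in the path. `split_s3_url` is a hypothetical 
helper name.

```python
import re
from typing import Tuple
from urllib.parse import urlparse

# Approximate pattern for AWS hosts: optional "<bucket>." prefix, then "s3",
# then an optional region, then "amazonaws.com".
_AWS_HOST = re.compile(
    r"^(?:(?P<bucket>.+)\.)?s3(?:[.-](?P<region>[a-z0-9-]+))?\.amazonaws\.com$"
)


def split_s3_url(url: str) -> Tuple[str, str]:
    """Return (bucket, key) for the URL styles this sketch understands."""
    parsed = urlparse(url)
    if parsed.scheme in ("s3", "s3a"):
        return parsed.netloc, parsed.path.lstrip("/")
    match = _AWS_HOST.match(parsed.hostname or "")
    if match is None:
        # Custom endpoint: bucket could be in the host or in the path.
        raise ValueError(f"cannot tell bucket from key for host {parsed.hostname!r}")
    if match.group("bucket"):
        # Bucket-style: the bucket is part of the host name.
        return match.group("bucket"), parsed.path.lstrip("/")
    # Path-style: the bucket is the first path segment.
    bucket, _, key = parsed.path.lstrip("/").partition("/")
    return bucket, key
```

   Even with this normalization, the signer still has to map the resulting 
bucket and key back to a table, which is the lookup an explicit object id would 
avoid.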
   
   ## Proposal Variant 1
   Adding two required fields, `object-id` and `object-type`, to the 
`S3SignRequest` would make it explicit which object the client is trying to 
access. `object-type` could be an enum with "table" and "view" values, while 
`object-id` contains the UUID that is part of the metadata.
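
   As a rough sketch, a client could build such a request body like this. The 
`object-type` and `object-id` fields are the proposed additions and do not 
exist in the spec today; the other field names are meant to mirror the existing 
`S3SignRequest` and should be checked against the linked spec. 
`build_sign_request` is a hypothetical helper.

```python
import uuid

OBJECT_TYPES = {"table", "view"}  # proposed enum values


def build_sign_request(region: str, uri: str, method: str, headers: dict,
                       object_type: str, object_id: str) -> dict:
    """Build a sign request body including the proposed object fields."""
    if object_type not in OBJECT_TYPES:
        raise ValueError(f"unknown object-type: {object_type!r}")
    uuid.UUID(object_id)  # raises ValueError if object_id is not a UUID
    return {
        "region": region,
        "uri": uri,
        "method": method,
        "headers": headers,
        # Proposed fields (Variant 1), not part of the current spec:
        "object-type": object_type,
        "object-id": object_id,
    }
```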
   
   ## Proposal Variant 2
   The underlying problem is that we are missing a standard way to pass 
metadata, such as the `table-id`, from the catalog endpoint to the signer 
endpoint. A less explicit way of handling this could be to introduce a 
`config` attribute `s3.signer.properties` that the server can specify in its 
`config` field on `loadTable` / ... responses. The client should then forward 
these to the `properties` field of the `S3SignRequest`.
   
   In the `spec`, Variant 2 would only require adding a comment right 
around here:
   
https://github.com/apache/iceberg/blob/282fa73b2c2b51b1d519a91e8bb05b2d2bdf3cf4/aws/src/main/resources/s3-signer-open-api.yaml#L120
   "properties should be populated from the `s3.signer.properties` 
configuration of the catalog. They can be overwritten by each catalog endpoint, 
such as `loadTable` or `commitTable` in the returned `config` Map."
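
   On the client side, Variant 2 could look roughly like the sketch below. The 
issue does not say how a map is encoded under `s3.signer.properties`; this 
sketch assumes one flat `config` key per entry with the prefix 
`s3.signer.properties.`, which is purely an assumption, as is the helper name 
`signer_properties`.

```python
# Assumed encoding: "s3.signer.properties.<name>" -> value in the config map.
SIGNER_PROPERTIES_PREFIX = "s3.signer.properties."


def signer_properties(catalog_config: dict, endpoint_config: dict) -> dict:
    """Collect the properties a client would forward in S3SignRequest.properties."""
    # Values returned by an endpoint such as loadTable override catalog defaults.
    merged = {**catalog_config, **endpoint_config}
    return {
        key[len(SIGNER_PROPERTIES_PREFIX):]: value
        for key, value in merged.items()
        if key.startswith(SIGNER_PROPERTIES_PREFIX)
    }
```

   The client never interprets these values; it only copies them into the sign 
request, so the server is free to put a table UUID or anything else in them.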
   
   
   
   Both Variant 1 and Variant 2 require changes for all clients.
   Any thoughts or additional insights are welcome!
   
   
   ### Query engine
   
   None


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

