syun64 opened a new issue, #8869: URL: https://github.com/apache/iceberg/issues/8869
### Feature Request / Improvement

The design of OAuth2 enables [Separation of Roles](https://www.oauth.com/oauth2-servers/differences-between-oauth-1-2/separation-of-roles/) between the API server and an authorization server. The two roles can be on physically separate servers, and even on different domain names. In this design, the authorization server needs to know the app's **client_id** and **client_secret**, but the API server only ever needs to accept access tokens.

The current implementation of the REST Catalog (in Spark and in PyIceberg) assumes that the resource server is also the authorization server. Unfortunately, this requires the REST Catalog Server to also take on the responsibility of credentials management.

Adding support for an optional authorization server URL parameter in the catalog configs would move the authorization grant based on user/machine client authentication to a dedicated authorization server. The REST Catalog Server would then only be responsible for parsing and validating the access_token that the querying engine (Spark/PyIceberg) has already retrieved.

The existing REST Catalog OpenAPI spec already supports Separation of Roles, so it may need no change to support this enhancement. It accepts the Bearer Authorization provided in the header of requests sent to its non-auth endpoints, and uses that to grant access to users.

Currently, Spark and PyIceberg obtain the access_token for these requests in one of two ways:

1. It is fetched from the REST Catalog Server's v1/oauth/tokens endpoint using the client credentials that are provided in the config.
2. A static access_token that was fetched through prior means is provided in the config instead.

Approach (2) partially enables Separation of Roles, because it allows us to fetch the access token from an external authorization server. It is an incomplete solution, however, because access tokens may have short lifespans. When a token expires, the engine is required to use the existing client credentials (or the expired access token?) to re-request a valid access_token through [token exchange](https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/rest/auth/OAuth2Util.java#L467). If the REST Catalog Server isn't an authorization server, this is not possible.

I would like to enhance the existing [RESTSessionCatalog](https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java#L135), [HTTPClient](https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/rest/HTTPClient.java#L81) and [OAuth2Util](https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/rest/auth/OAuth2Util.java#L190) implementations to optionally take an _authorization_server_url_ config parameter. When set, auth token requests would be directed to that separate endpoint instead of the resource server's v1/oauth/tokens endpoint, achieving Separation of Roles.

If that sounds good, I'd be happy to put up a PR that implements the above proposal, and also do the same on [iceberg-python](https://github.com/apache/iceberg-python).

[Previous discussion on PyIceberg](https://github.com/apache/iceberg/pull/8400) on a similar issue.

### Query engine

Spark
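
To illustrate the proposal in this issue, here is a minimal Python sketch of how a client might choose the token endpoint. It is not the Iceberg or PyIceberg implementation: the helper names (`token_endpoint`, `build_token_request`, `bearer_headers`), the example hostnames, and the assumption that the catalog URI is the base for `v1/oauth/tokens` are all hypothetical. It only builds the OAuth2 client-credentials request and does not send it.

```python
from typing import Dict, Optional
from urllib.parse import urlencode
from urllib.request import Request


def token_endpoint(catalog_uri: str, authorization_server_url: Optional[str] = None) -> str:
    """Pick where token requests go (hypothetical helper).

    If a separate authorization server URL is configured, use it;
    otherwise fall back to the resource server's v1/oauth/tokens endpoint.
    """
    if authorization_server_url:
        return authorization_server_url
    return catalog_uri.rstrip("/") + "/v1/oauth/tokens"


def build_token_request(
    catalog_uri: str,
    client_id: str,
    client_secret: str,
    authorization_server_url: Optional[str] = None,
) -> Request:
    """Build (but do not send) a client-credentials token request."""
    body = urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
        }
    ).encode("utf-8")
    return Request(
        token_endpoint(catalog_uri, authorization_server_url),
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def bearer_headers(access_token: str) -> Dict[str, str]:
    """Headers the engine would attach to non-auth catalog requests."""
    return {"Authorization": f"Bearer {access_token}"}
```

With this shape, the catalog server never sees the client secret when `authorization_server_url` is set. The same function can also be called again when a token expires, so refresh does not depend on the catalog server acting as an authorization server.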