rdblue commented on PR #10256: URL: https://github.com/apache/iceberg/pull/10256#issuecomment-2113392645
@adutra, thanks for noticing and pointing out the inconsistencies. I do think it's important to address those. In particular, I did not realize that the table session that was introduced recently used the `newSession` codepath that uses `credential`. (I also understand your reading of the `config` endpoint, but I don't think that implies that configs like the OAuth2 URI should be handled that way.) We need to address these cases, but I think that this PR goes the wrong direction: instead of allowing auth properties from the config and table routes, we actually need to be more strict. A table session should never use a credential from the REST service. I think it would help if we had a common understanding of the difference between a token and a credential. A credential is a pre-established ID and secret pair that is [used for authentication](https://datatracker.ietf.org/doc/html/rfc6749#section-2.3). A credential is controlled and owned by the client and is typically long-lived, like a username/password. A token is similar because it can be used for authentication (i.e. bearer auth) but tokens are short-lived and can contain a lot more information added by the service that is validated by signing the token. In the REST protocol, `credential` is used for the first authentication step, which is to use the client credentials flow to exchange it for a token with the OAuth2 endpoint, which is now configurable. Both the endpoint and the credential are user-controlled because the credential is a long-lived secret and the endpoint must be trusted to receive that secret. The token that is returned to the client has a much shorter lifespan, which is why it is okay to share the token with the REST catalog and not only the OAuth2 endpoint. Another reason why tokens are handled differently is that tokens can contain additional information that is signed by the service. That additional information can be used to check something once and use the result for the lifetime of the token. When designing the REST protocol, we thought that it was likely that tokens would be used for this purpose, so we built a couple of ways to exchange an auth token (from OAuth2) for a more specific or narrow token. For example, the service could verify that the client is not coming from a blocked IP range. Checking IP ranges is expensive, so you'd want to do it once and produce a token that shows the incoming IP has been validated. As long as the token's IP claim is the same as the incoming IP, you can trust that the validation was done and not repeat the expensive check. As you can see, there's a good reason to handle `token` and `credential` differently. Tokens must be sent to the catalog service for bearer auth and implementations are free to issue more narrow tokens in reasonable places. However, credentials should be more protected and only sent to the correct OAuth2 endpoint. For this PR, I think the question is how to handle `credential`, `scope`, and `oauth2-server-uri`. For those, I think the current handling is correct. * `credential` -- This is a pre-established shared secret. It should never be supplied by the catalog service, especially when `token` is suitable for any purpose that `credential` might be used for. * `scope` -- I don't think this one makes sense. Why would a service tell the client to request more capabilities? * `oauth2-server-uri` -- If the service can send a different URI, then it can tell the client to send its credential to a different endpoint. I don't think there's any value to that, given that auth needs to happen before `config` . This would just increase the surface area for attacks, which we don't want to do however unlikely they may be. In the end, allowing these to be overridden by catalog endpoints would weaken security and introduce extra requests (keep requirements simple!). I don't think it is a good idea to move forward with this PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org