rdblue commented on PR #10256:
URL: https://github.com/apache/iceberg/pull/10256#issuecomment-2113392645

   @adutra, thanks for noticing and pointing out the inconsistencies. I do 
think it's important to address those. In particular, I did not realize that 
the table session that was introduced recently used the `newSession` codepath 
that uses `credential`.
   
   (I also understand your reading of the `config` endpoint, but I don't think 
that implies that configs like the OAuth2 URI should be handled that way.)
   
   We need to address these cases, but I think that this PR goes the wrong 
direction: instead of allowing auth properties from the config and table 
routes, we actually need to be more strict. A table session should never use a 
credential from the REST service.
   
   I think it would help if we had a common understanding of the difference 
between a token and a credential. A credential is a pre-established ID and 
secret pair that is [used for 
authentication](https://datatracker.ietf.org/doc/html/rfc6749#section-2.3). A 
credential is controlled and owned by the client and is typically long-lived, 
like a username/password. A token is similar because it can be used for 
authentication (i.e. bearer auth) but tokens are short-lived and can contain a 
lot more information added by the service that is validated by signing the 
token.
   
   In the REST protocol, `credential` is used for the first authentication 
step, which is to use the client credentials flow to exchange it for a token 
with the OAuth2 endpoint, which is now configurable. Both the endpoint and the 
credential are user-controlled because the credential is a long-lived secret 
and the endpoint must be trusted to receive that secret. The token that is 
returned to the client has a much shorter lifespan, which is why it is okay to 
share the token with the REST catalog and not only the OAuth2 endpoint.
   
   Another reason why tokens are handled differently is that tokens can contain 
additional information that is signed by the service. That additional 
information can be used to check something once and use the result for the 
lifetime of the token. When designing the REST protocol, we thought that it was 
likely that tokens would be used for this purpose, so we built a couple of ways 
to exchange an auth token (from OAuth2) for a more specific or narrow token.
   
   For example, the service could verify that the client is not coming from a 
blocked IP range. Checking IP ranges is expensive, so you'd want to do it once 
and produce a token that shows the incoming IP has been validated. As long as 
the token's IP claim is the same as the incoming IP, you can trust that the 
validation was done and not repeat the expensive check.
   
   As you can see, there's a good reason to handle `token` and `credential` 
differently. Tokens must be sent to the catalog service for bearer auth and 
implementations are free to issue more narrow tokens in reasonable places. 
However, credentials should be more protected and only sent to the correct 
OAuth2 endpoint.
   
   For this PR, I think the question is how to handle `credential`, `scope`, 
and `oauth2-server-uri`. For those, I think the current handling is correct.
   * `credential` -- This is a pre-established shared secret. It should never 
be supplied by the catalog service, especially when `token` is suitable for any 
purpose that `credential` might be used for.
   * `scope` -- I don't think this one makes sense. Why would a service tell 
the client to request more capabilities?
   * `oauth2-server-uri` -- If the service can send a different URI, then it 
can tell the client to send its credential to a different endpoint. I don't 
think there's any value to that, given that auth needs to happen before 
`config` . This would just increase the surface area for attacks, which we 
don't want to do however unlikely they may be.
   
   In the end, allowing these to be overridden by catalog endpoints would 
weaken security and introduce extra requests (keep requirements simple!). I 
don't think it is a good idea to move forward with this PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to