kevinjqliu commented on code in PR #2244:
URL: https://github.com/apache/iceberg-python/pull/2244#discussion_r2231707281


##########
pyiceberg/catalog/rest/auth.py:
##########
@@ -187,3 +289,4 @@ def create(cls, class_or_name: str, config: Dict[str, Any]) -> AuthManager:
 AuthManagerFactory.register("noop", NoopAuthManager)
 AuthManagerFactory.register("basic", BasicAuthManager)
 AuthManagerFactory.register("legacyoauth2", LegacyOAuth2AuthManager)
+AuthManagerFactory.register("oauth2", OAuth2AuthManager)

Review Comment:
   nit: is it a good idea to call `AuthManagerFactory.register` directly at 
module level? Would it be better to encapsulate it in a function?
   
   I'm worried about the import automatically running this code.
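
   A minimal sketch of the encapsulated variant, for illustration only. 
`Registry` and `register_builtin_auth_managers` are hypothetical names (a 
stand-in for `AuthManagerFactory`, not PyIceberg API), and the empty classes 
are placeholders for the real managers:

```python
from typing import Dict, List, Type


class Registry:
    """Hypothetical stand-in for AuthManagerFactory; not the PyIceberg class."""

    def __init__(self) -> None:
        self._managers: Dict[str, Type] = {}

    def register(self, name: str, manager: Type) -> None:
        self._managers[name] = manager

    def names(self) -> List[str]:
        return sorted(self._managers)


# Placeholder classes standing in for the real auth managers.
class NoopAuthManager: ...
class BasicAuthManager: ...
class LegacyOAuth2AuthManager: ...
class OAuth2AuthManager: ...


def register_builtin_auth_managers(registry: Registry) -> None:
    """Register the built-in managers; runs only when called, not on import."""
    registry.register("noop", NoopAuthManager)
    registry.register("basic", BasicAuthManager)
    registry.register("legacyoauth2", LegacyOAuth2AuthManager)
    registry.register("oauth2", OAuth2AuthManager)


registry = Registry()
assert registry.names() == []  # defining the function registered nothing
register_builtin_auth_managers(registry)
```

   With this shape, importing the module only defines the function, and the 
caller decides when registration happens.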



##########
mkdocs/docs/configuration.md:
##########
@@ -374,6 +374,94 @@ Specific headers defined by the RESTCatalog spec include:
 | ------------------------------------ | ------------------------------------- | -------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `header.X-Iceberg-Access-Delegation` | `{vended-credentials,remote-signing}` | `vended-credentials` | Signal to the server that the client supports delegated access via a comma-separated list of access mechanisms. The server may choose to supply access via any or none of the requested mechanisms |
 
+#### Authentication in RESTCatalog
+
+The RESTCatalog supports pluggable authentication via the `auth` configuration block. This allows you to specify which how the access token will be fetched and managed for use with the HTTP requests to the RESTCatalog server. The authentication method is selected by setting the `auth.type` property, and additional configuration can be provided as needed for each method.
+
+##### Supported Authentication Types
+
+- `noop`: No authentication (no Authorization header sent).
+- `basic`: HTTP Basic authentication.
+- `oauth2`: OAuth2 client credentials flow.
+- `legacyoauth2`: Legacy OAuth2 client credentials flow (Deprecated and will be removed in PyIceberg 1.0.0)

Review Comment:
   nit: I think this becomes a little confusing. `legacyoauth2` is a fallback 
mechanism, i.e. it is used when the `auth:` block is not provided. I think we 
should call this out.
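
   One way to state the fallback explicitly, as a sketch. `resolve_auth_type` 
is a hypothetical helper, and the rule that a missing `auth` block selects 
`legacyoauth2` is taken from the comment above, not from the code in this hunk:

```python
from typing import Any, Dict

# Assumed fallback type, per the review comment above.
LEGACY_AUTH_TYPE = "legacyoauth2"


def resolve_auth_type(properties: Dict[str, Any]) -> str:
    """Return the configured auth type, falling back when no auth block exists."""
    auth = properties.get("auth")
    if not auth:
        # No `auth:` block at all: use the legacy flow.
        return LEGACY_AUTH_TYPE
    auth_type = auth.get("type")
    if auth_type is None:
        raise ValueError("`auth.type` must be set when an `auth` block is provided")
    return auth_type
```

   Under these assumptions, `resolve_auth_type({})` selects the legacy flow, 
while `resolve_auth_type({"auth": {"type": "oauth2"}})` selects `oauth2`.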



##########
pyiceberg/catalog/rest/auth.py:
##########
@@ -109,6 +122,95 @@ def auth_header(self) -> str:
         return f"Bearer {self._token}"
 
 
+class OAuth2TokenProvider:
+    """Thread-safe OAuth2 token provider with token refresh support."""
+
+    client_id: str
+    client_secret: str
+    token_url: str
+    scope: Optional[str]
+    refresh_margin: int
+    expires_in: Optional[int]
+
+    _token: Optional[str]
+    _expires_at: int
+    _lock: threading.Lock
+
+    def __init__(
+        self,
+        client_id: str,
+        client_secret: str,
+        token_url: str,
+        scope: Optional[str] = None,
+        refresh_margin: int = 60,
+        expires_in: Optional[int] = None,
+    ):
+        self.client_id = client_id
+        self.client_secret = client_secret
+        self.token_url = token_url
+        self.scope = scope
+        self.refresh_margin = refresh_margin
+        self.expires_in = expires_in
+
+        self._token = None
+        self._expires_at = 0
+        self._lock = threading.Lock()
+
+    def _refresh_token(self) -> None:
+        data = {
+            "grant_type": "client_credentials",
+            "client_id": self.client_id,
+            "client_secret": self.client_secret,
+        }
+        if self.scope:
+            data["scope"] = self.scope
+
+        response = requests.post(self.token_url, data=data)
+        response.raise_for_status()
+        result = response.json()
+
+        self._token = result["access_token"]
+        expires_in = result.get("expires_in", self.expires_in)
+        if expires_in is None:
+            raise ValueError(
+                "The expiration time of the Token must be provided by the Server in the Access Token Response in `expired_in` field, or by the PyIceberg Client."
+            )
+        self._expires_at = time.time() + expires_in - self.refresh_margin
+
+    def get_token(self) -> str:
+        with self._lock:
+            if not self._token or time.time() >= self._expires_at:
+                self._refresh_token()
+            if self._token is None:
+                raise ValueError("Authorization token is None after refresh")
+            return self._token
+
+
+class OAuth2AuthManager(AuthManager):

Review Comment:
   Do we have any tests for `LegacyOAuth2AuthManager`? Do we want 
`OAuth2AuthManager` to have feature parity with it in this first release?
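
   As a sketch of the kind of test that could cover the refresh logic in the 
hunk above: `CachedToken` is a hypothetical, self-contained stand-in that 
mirrors the `get_token` caching rule with an injectable clock and fetch 
function. It is not the class from the PR and makes no HTTP calls.

```python
import threading
import time
from typing import Callable, List, Optional, Tuple


class CachedToken:
    """Hypothetical stand-in mirroring the caching rule of get_token above."""

    def __init__(
        self,
        fetch: Callable[[], Tuple[str, int]],  # returns (token, expires_in seconds)
        refresh_margin: int = 60,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch
        self._refresh_margin = refresh_margin
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0
        self._lock = threading.Lock()

    def get_token(self) -> str:
        with self._lock:
            if not self._token or self._clock() >= self._expires_at:
                token, expires_in = self._fetch()
                self._token = token
                self._expires_at = self._clock() + expires_in - self._refresh_margin
            return self._token


def test_token_is_reused_until_refresh_margin() -> None:
    now = [1000.0]  # fake clock, advanced manually
    calls: List[float] = []

    def fetch() -> Tuple[str, int]:
        calls.append(now[0])
        return f"token-{len(calls)}", 300

    provider = CachedToken(fetch, refresh_margin=60, clock=lambda: now[0])
    assert provider.get_token() == "token-1"
    now[0] += 239  # still inside expires_in - refresh_margin = 240 s
    assert provider.get_token() == "token-1"
    now[0] += 1  # 240 s elapsed: a refresh is due
    assert provider.get_token() == "token-2"
    assert len(calls) == 2
```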



##########
pyiceberg/catalog/rest/auth.py:
##########
@@ -109,6 +122,95 @@ def auth_header(self) -> str:
         return f"Bearer {self._token}"
 
 
+class OAuth2TokenProvider:
+    """Thread-safe OAuth2 token provider with token refresh support."""
+
+    client_id: str
+    client_secret: str
+    token_url: str
+    scope: Optional[str]
+    refresh_margin: int
+    expires_in: Optional[int]
+
+    _token: Optional[str]
+    _expires_at: int
+    _lock: threading.Lock
+
+    def __init__(
+        self,
+        client_id: str,
+        client_secret: str,
+        token_url: str,
+        scope: Optional[str] = None,
+        refresh_margin: int = 60,
+        expires_in: Optional[int] = None,
+    ):
+        self.client_id = client_id
+        self.client_secret = client_secret
+        self.token_url = token_url
+        self.scope = scope
+        self.refresh_margin = refresh_margin
+        self.expires_in = expires_in
+
+        self._token = None
+        self._expires_at = 0
+        self._lock = threading.Lock()
+
+    def _refresh_token(self) -> None:
+        data = {
+            "grant_type": "client_credentials",
+            "client_id": self.client_id,
+            "client_secret": self.client_secret,
+        }
+        if self.scope:
+            data["scope"] = self.scope
+
+        response = requests.post(self.token_url, data=data)
+        response.raise_for_status()
+        result = response.json()
+
+        self._token = result["access_token"]
+        expires_in = result.get("expires_in", self.expires_in)
+        if expires_in is None:
+            raise ValueError(
+                "The expiration time of the Token must be provided by the Server in the Access Token Response in `expired_in` field, or by the PyIceberg Client."
+            )
+        self._expires_at = time.time() + expires_in - self.refresh_margin
+
+    def get_token(self) -> str:
+        with self._lock:
+            if not self._token or time.time() >= self._expires_at:
+                self._refresh_token()
+            if self._token is None:
+                raise ValueError("Authorization token is None after refresh")
+            return self._token
+
+
+class OAuth2AuthManager(AuthManager):

Review Comment:
   I don't see `credential`, `resource`, or `audience` supported here:
   
https://github.com/apache/iceberg-python/blob/4cac6910667899238a750adc560f317f62718f82/mkdocs/docs/configuration.md?plain=1#L368-L371
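
   A sketch of how the token request could carry these options. 
`split_credential` and `build_token_request` are hypothetical helpers, and two 
things are assumptions rather than confirmed behavior: that `credential` has 
the form `client_id:client_secret` (with a bare value treated as the secret 
only), and that `audience` and `resource` are passed as plain form fields.

```python
from typing import Dict, Optional, Tuple


def split_credential(credential: str) -> Tuple[Optional[str], str]:
    """Split 'client_id:client_secret'; a bare value is taken as the secret (assumption)."""
    if ":" in credential:
        client_id, client_secret = credential.split(":", 1)
        return client_id, client_secret
    return None, credential


def build_token_request(
    credential: str,
    scope: Optional[str] = None,
    audience: Optional[str] = None,
    resource: Optional[str] = None,
) -> Dict[str, str]:
    """Build the form body for a client-credentials token request."""
    client_id, client_secret = split_credential(credential)
    data: Dict[str, str] = {
        "grant_type": "client_credentials",
        "client_secret": client_secret,
    }
    if client_id is not None:
        data["client_id"] = client_id
    # Optional fields are only included when they are set.
    for key, value in (("scope", scope), ("audience", audience), ("resource", resource)):
        if value:
            data[key] = value
    return data
```

   The resulting dictionary could then be passed as `data=` to the existing 
`requests.post(self.token_url, data=data)` call in `_refresh_token`.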
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.