sunyuhan1998 opened a new pull request, #10791:
URL: https://github.com/apache/gravitino/pull/10791

   ### What changes were proposed in this pull request?
   
   Add client-side credential caching in `BaseGVFSOperations` to avoid 
redundant HTTP calls to the Gravitino server's `/credentials` endpoint when 
credential vending is enabled.
   
   **Changes:**
   
   - Added `_credential_cache` (LRUCache) and `_credential_cache_lock` (RWLock) 
to `BaseGVFSOperations`
   - Added `_calculate_credential_expire_time()` method using the existing 
`credential_expiration_ratio` config (default 0.5)
   - Added `_get_credentials_with_cache()` method with read-write lock + 
double-checked locking pattern
   - Replaced the unconditional `get_credentials()` call in 
`_get_actual_filesystem_by_location_name()` with the cached version
   - Filtered static credentials (`expire_time_in_ms() == 0`) when calculating 
cache expiry to handle mixed credential types correctly
   - Added 10 unit tests covering: cache hit/miss, expiry refresh, 
per-fileset/per-location isolation, empty credentials, thread safety, and 
config correctness
   
   ### Why are the changes needed?
   
   When `enable_credential_vending=True`, the Python SDK makes an HTTP request 
to `/credentials` on **every** file operation, even if the previously fetched 
credentials have not expired. With a typical 1-hour credential lifetime, this 
results in hundreds of unnecessary HTTP round-trips.
   
   The Java SDK avoids this by embedding lazy-refresh `CredentialsProvider` 
instances (e.g., `OSSCredentialsProvider`, `S3CredentialsProvider`) that cache 
credentials locally and only fetch new ones at 50% of their lifetime. This PR 
brings the Python SDK to parity with the Java SDK's approach.
   
   Fix: #10790
   
   ### Does this PR introduce _any_ user-facing change?
   
   No. All changes are internal to `BaseGVFSOperations`. No public API or 
configuration changes required.
   
   ### How was this patch tested?
   
   - **Unit tests** (10 new cases in `test_gvfs_credential_cache.py`): cache 
hit/miss, expiry refresh, per-fileset/per-location isolation, empty 
credentials, thread safety, expiration ratio, never-expire credentials
   - **Integration tests** (2 new cases against real Gravitino server + MinIO):
     - Verified 5 file operations trigger only 1 `/credentials` request (vs. 5 
without caching)
     - Verified expired cache triggers a fresh credential fetch
   - **Existing tests**: All 45 existing gvfs/credential unit tests + 12 
integration tests pass with no regression
   - **Code quality**: `ruff check` and `ruff format` pass


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to