dennishuo commented on code in PR #1339:
URL: https://github.com/apache/polaris/pull/1339#discussion_r2072301410
##########
polaris-core/src/main/java/org/apache/polaris/core/persistence/cache/EntityCache.java:
##########
@@ -255,104 +150,74 @@ public void setCacheMode(EntityCacheMode cacheMode) {
@Nonnull PolarisBaseEntity entityToValidate,
int entityMinVersion,
int entityGrantRecordsMinVersion) {
- long entityCatalogId = entityToValidate.getCatalogId();
long entityId = entityToValidate.getId();
- PolarisEntityType entityType = entityToValidate.getType();
- // first lookup the cache to find the existing cache entry
+ // First lookup the cache to find the existing cache entry
ResolvedPolarisEntity existingCacheEntry = this.getEntityById(entityId);
- // the caller's fetched entity may have come from a stale lookup byName;
we should consider
- // the existingCacheEntry to be the older of the two for purposes of
invalidation to make
- // sure when we replaceCacheEntry we're also removing the old name if it's
no longer valid
- EntityCacheByNameKey nameKey = new EntityCacheByNameKey(entityToValidate);
- ResolvedPolarisEntity existingCacheEntryByName =
this.getEntityByName(nameKey);
- if (existingCacheEntryByName != null
- && existingCacheEntry != null
- && isNewer(existingCacheEntry, existingCacheEntryByName)) {
- existingCacheEntry = existingCacheEntryByName;
+ // See if we need to load or refresh that entity
+ if (existingCacheEntry != null
+ && existingCacheEntry.getEntity().getEntityVersion() >=
entityMinVersion
+ && existingCacheEntry.getEntity().getGrantRecordsVersion()
+ >= entityGrantRecordsMinVersion) {
+ // Cache hit and cache entry is up to date, nothing to do.
+ return existingCacheEntry;
}
- // the new one to be returned
- final ResolvedPolarisEntity newCacheEntry;
-
- // see if we need to load or refresh that entity
- if (existingCacheEntry == null
- || existingCacheEntry.getEntity().getEntityVersion() < entityMinVersion
- || existingCacheEntry.getEntity().getGrantRecordsVersion() <
entityGrantRecordsMinVersion) {
+ // Either cache miss, dropped entity or stale entity. In either case,
invalidate and reload it.
+ // Do the refresh in a critical section based on the entity ID to avoid
race conditions.
+ Lock lock = this.locks.get(entityId);
+ try {
+ lock.lock();
+
+ // Lookup the cache again in case another thread has already invalidated
it.
+ existingCacheEntry = this.getEntityById(entityId);
+
+ // See if the entity has been refreshed concurrently
+ if (existingCacheEntry != null
+ && existingCacheEntry.getEntity().getEntityVersion() >=
entityMinVersion
+ && existingCacheEntry.getEntity().getGrantRecordsVersion()
+ >= entityGrantRecordsMinVersion) {
+ // Cache hit and cache entry is up to date now, exit
+ return existingCacheEntry;
+ }
Review Comment:
The most important part of this hinges on two things:
1. Reading entity/grant *versions* is "cheaper" than reading the entire
entities and grant records
2. The defined interfaces permit batch-lookup of entity/grant versions
(2) could also be added for getting entire entities/grant-records in bulk,
but the max size of such a batch would be more sensitive vs smaller
uniform-sized entityVersions records.
At a high-level the architecture for this is more like the Iceberg
"freshness-aware loading" semantic than more common caching patterns.
The alternatives are:
1. Staleness and cross-server decoherence tolerance (changes the semantics
of the callsites)
2. Active invalidation across the distributed system (complicated to
implement)
I could see active invalidation maybe being an option to pursue in the
future as an alternative to this freshness-aware readthrough type of pattern.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]