pingtimeout commented on code in PR #1339:
URL: https://github.com/apache/polaris/pull/1339#discussion_r2035522063


##########
polaris-core/src/main/java/org/apache/polaris/core/persistence/cache/EntityCache.java:
##########
@@ -255,104 +150,74 @@ public void setCacheMode(EntityCacheMode cacheMode) {
       @Nonnull PolarisBaseEntity entityToValidate,
       int entityMinVersion,
       int entityGrantRecordsMinVersion) {
-    long entityCatalogId = entityToValidate.getCatalogId();
     long entityId = entityToValidate.getId();
-    PolarisEntityType entityType = entityToValidate.getType();
 
-    // first lookup the cache to find the existing cache entry
+    // First lookup the cache to find the existing cache entry
     ResolvedPolarisEntity existingCacheEntry = this.getEntityById(entityId);
 
-    // the caller's fetched entity may have come from a stale lookup byName; 
we should consider
-    // the existingCacheEntry to be the older of the two for purposes of 
invalidation to make
-    // sure when we replaceCacheEntry we're also removing the old name if it's 
no longer valid
-    EntityCacheByNameKey nameKey = new EntityCacheByNameKey(entityToValidate);
-    ResolvedPolarisEntity existingCacheEntryByName = 
this.getEntityByName(nameKey);
-    if (existingCacheEntryByName != null
-        && existingCacheEntry != null
-        && isNewer(existingCacheEntry, existingCacheEntryByName)) {
-      existingCacheEntry = existingCacheEntryByName;
+    // See if we need to load or refresh that entity
+    if (existingCacheEntry != null
+        && existingCacheEntry.getEntity().getEntityVersion() >= 
entityMinVersion
+        && existingCacheEntry.getEntity().getGrantRecordsVersion()
+            >= entityGrantRecordsMinVersion) {
+      // Cache hit and cache entry is up to date, nothing to do.
+      return existingCacheEntry;
     }
 
-    // the new one to be returned
-    final ResolvedPolarisEntity newCacheEntry;
-
-    // see if we need to load or refresh that entity
-    if (existingCacheEntry == null
-        || existingCacheEntry.getEntity().getEntityVersion() < entityMinVersion
-        || existingCacheEntry.getEntity().getGrantRecordsVersion() < 
entityGrantRecordsMinVersion) {
+    // Either cache miss, dropped entity or stale entity. In either case, 
invalidate and reload it.
+    // Do the refresh in a critical section based on the entity ID to avoid 
race conditions.
+    Lock lock = this.locks.get(entityId);
+    try {
+      lock.lock();
+
+      // Lookup the cache again in case another thread has already invalidated 
it.
+      existingCacheEntry = this.getEntityById(entityId);
+
+      // See if the entity has been refreshed concurrently
+      if (existingCacheEntry != null
+          && existingCacheEntry.getEntity().getEntityVersion() >= 
entityMinVersion
+          && existingCacheEntry.getEntity().getGrantRecordsVersion()
+              >= entityGrantRecordsMinVersion) {
+        // Cache hit and cache entry is up to date now, exit
+        return existingCacheEntry;
+      }

Review Comment:
   @snazy To be honest, I am not sure about this design either.  That being 
said there is an important nuance.  The `getAndRefreshIfNeeded` method 
guarantees that if an entity is stale, it will be re-fetched.  But it does 
**not** guarantee that the entity that will be returned has the right entity 
version, or even an entity version greater than the `entityMinVersion` and 
`entityGrantRecordsMinVersion` passed by the caller.
   
   In other words, it would be possible to call `getAndRefreshIfNeeded` with 
`Integer.MAX_VALUE` for both minimum versions in order to force a round-trip to 
the database.  I am not sure if that pattern is ever used in the codebase.  But 
the code allows that to happen.
   
   >  Assuming the caller did not hit the database first, it's unclear to me 
how cache-coherency across multiple Polaris instances is handled.
   
   This is a very good question.  Like, one could imagine an event-based system 
where Polaris servers would be informed that a new version of an entity has 
been created.  But then, why not just use a distributed cache altogether...?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to