pingtimeout commented on code in PR #1339:
URL: https://github.com/apache/polaris/pull/1339#discussion_r2035522063
##########
polaris-core/src/main/java/org/apache/polaris/core/persistence/cache/EntityCache.java:
##########
@@ -255,104 +150,74 @@ public void setCacheMode(EntityCacheMode cacheMode) {
@Nonnull PolarisBaseEntity entityToValidate,
int entityMinVersion,
int entityGrantRecordsMinVersion) {
- long entityCatalogId = entityToValidate.getCatalogId();
long entityId = entityToValidate.getId();
- PolarisEntityType entityType = entityToValidate.getType();
- // first lookup the cache to find the existing cache entry
+ // First lookup the cache to find the existing cache entry
ResolvedPolarisEntity existingCacheEntry = this.getEntityById(entityId);
- // the caller's fetched entity may have come from a stale lookup byName;
we should consider
- // the existingCacheEntry to be the older of the two for purposes of
invalidation to make
- // sure when we replaceCacheEntry we're also removing the old name if it's
no longer valid
- EntityCacheByNameKey nameKey = new EntityCacheByNameKey(entityToValidate);
- ResolvedPolarisEntity existingCacheEntryByName =
this.getEntityByName(nameKey);
- if (existingCacheEntryByName != null
- && existingCacheEntry != null
- && isNewer(existingCacheEntry, existingCacheEntryByName)) {
- existingCacheEntry = existingCacheEntryByName;
+ // See if we need to load or refresh that entity
+ if (existingCacheEntry != null
+ && existingCacheEntry.getEntity().getEntityVersion() >=
entityMinVersion
+ && existingCacheEntry.getEntity().getGrantRecordsVersion()
+ >= entityGrantRecordsMinVersion) {
+ // Cache hit and cache entry is up to date, nothing to do.
+ return existingCacheEntry;
}
- // the new one to be returned
- final ResolvedPolarisEntity newCacheEntry;
-
- // see if we need to load or refresh that entity
- if (existingCacheEntry == null
- || existingCacheEntry.getEntity().getEntityVersion() < entityMinVersion
- || existingCacheEntry.getEntity().getGrantRecordsVersion() <
entityGrantRecordsMinVersion) {
+ // Either cache miss, dropped entity or stale entity. In either case,
invalidate and reload it.
+ // Do the refresh in a critical section based on the entity ID to avoid
race conditions.
+ Lock lock = this.locks.get(entityId);
+ try {
+ lock.lock();
+
+ // Lookup the cache again in case another thread has already invalidated
it.
+ existingCacheEntry = this.getEntityById(entityId);
+
+ // See if the entity has been refreshed concurrently
+ if (existingCacheEntry != null
+ && existingCacheEntry.getEntity().getEntityVersion() >=
entityMinVersion
+ && existingCacheEntry.getEntity().getGrantRecordsVersion()
+ >= entityGrantRecordsMinVersion) {
+ // Cache hit and cache entry is up to date now, exit
+ return existingCacheEntry;
+ }
Review Comment:
@snazy To be honest, I am not sure about this design either. That being
said there is an important nuance. The `getAndRefreshIfNeeded` method
guarantees that if an entity is stale, it will be re-fetched. But it does
**not** guarantee that the entity that will be returned has the right entity
version, or even an entity version greater than the `entityMinVersion` and
`entityGrantRecordsMinVersion` passed by the caller.
In other words, it would be possible to call `getAndRefreshIfNeeded` with
`Integer.MAX_VALUE` for both minimum versions in order to force a round-trip to
the database. I am not sure if that pattern is ever used in the codebase. But
the code allows that to happen.
> Assuming the caller did not hit the database first, it's unclear to me
how cache-coherency across multiple Polaris instances is handled.
This is a very good question. Like, one could imagine an event-based system
where Polaris servers would be informed that a new version of an entity has
been created. But then, why not just use a distributed cache altogether...?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]