yuqi1129 commented on code in PR #10696:
URL: https://github.com/apache/gravitino/pull/10696#discussion_r3084464595
##########
design-docs/cache-improvement-design.md:
##########
@@ -0,0 +1,873 @@
+# Gravitino Cache Improvement Design
+
+---
+
+## 1. Background
+
+### 1.1 System Overview
+
+Gravitino is a unified metadata management control plane. Compute engines (Spark, Flink, Trino)
+call it during query planning to resolve catalog, schema, and table metadata, and to verify
+user permissions. The access pattern is distinctly **read-heavy, write-light**: DDL operations
+are infrequent, and metadata is resolved once per job.
+
+Gravitino is evolving from single-node to multi-node active-active HA deployment. Each node
+currently maintains its own independent in-process Caffeine cache with no cross-node
+synchronisation. Under HA, any write on one node leaves other nodes' caches stale until TTL
+expiry.
+
+---
+
+### 1.2 Current Cache Architecture Overview
+
+Gravitino maintains three distinct caching layers for the authorization path:
+
+```
+┌──────────────────────────────────────────────────────────────┐
+│ Layer 3: Per-request cache (AuthorizationRequestContext)     │
+│ Scope: one HTTP request; prevents duplicate auth calls       │
+├──────────────────────────────────────────────────────────────┤
+│ Layer 2: Auth policy caches (JcasbinAuthorizer)              │
+│   loadedRoles  Cache<Long, Boolean>         hook update/TTL  │
+│   ownerRel     Cache<Long, Optional<Long>>  hook update/TTL  │
+├──────────────────────────────────────────────────────────────┤
+│ Layer 1: Entity store cache (RelationalEntityStore)          │
+│ CaffeineEntityCache — or NoOpsCache when disabled            │
+│ Caches entity reads and relation queries for all modules     │
+│ Controlled by Configs.CACHE_ENABLED                          │
+└──────────────────────────────────────────────────────────────┘
+```
+
+**JCasbin is the core of the auth cache system.** It maintains an in-memory policy table:
+
+```
+(roleId, objectType, metadataId, privilege) → ALLOW | DENY
+```
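To make the shape of this table concrete, here is a minimal Java sketch that models it as a map keyed by the four-tuple. `PolicyKey`, `Effect`, and `PolicyTable` are hypothetical names used only for illustration. JCasbin itself stores policies as string tuples defined by its model, and the real authorizer also maps users to roles and evaluates separate allow and deny enforcers.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical types that illustrate the policy-table shape only; JCasbin keeps
// policies as string tuples defined by its model, not as these records.
enum Effect { ALLOW, DENY }

record PolicyKey(long roleId, String objectType, long metadataId, String privilege) {}

class PolicyTable {
  private final Map<PolicyKey, Effect> policies = new ConcurrentHashMap<>();

  // Roughly what loading one securable object of a role contributes.
  void addPolicy(long roleId, String objectType, long metadataId, String privilege, Effect effect) {
    policies.put(new PolicyKey(roleId, objectType, metadataId, privilege), effect);
  }

  // Drops every row of one role, e.g. before reloading that role's policies.
  void deleteRole(long roleId) {
    policies.keySet().removeIf(key -> key.roleId() == roleId);
  }

  // Pure in-memory lookup; empty means no policy row exists for this tuple.
  Optional<Effect> lookup(long roleId, String objectType, long metadataId, String privilege) {
    return Optional.ofNullable(policies.get(new PolicyKey(roleId, objectType, metadataId, privilege)));
  }
}
```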
+
+The Layer 2 caches exist solely to manage JCasbin's policy loading lifecycle:
+
+| Cache | Role |
+|---|---|
+| `loadedRoles: Cache<Long, Boolean>` | Tracks which roles are already loaded into JCasbin — prevents repeated [C2]+[C3] queries on every auth request |
+| `ownerRel: Cache<Long, Optional<Long>>` | Caches owner lookups — **prevents [D1] on every auth request** (2–4 `isOwner()` calls per request, see §1.3.2) |
+
+Without `loadedRoles`, every auth request would re-execute N DB queries to reload all of a
+user's role policies into JCasbin. These two caches are what keep the warm auth path fast.
+Layer 1 (entity cache) additionally accelerates the name→ID resolution calls ([A], [B], [C1])
+that feed into JCasbin's enforce call.
+
+---
+
+#### 1.2.1 Problems with the Current Entity Cache
+
+**The entity cache (Layer 1) has accumulated significant complexity and is not well-suited to
+serve as a general-purpose or auth-dedicated caching layer.**
+
+##### Mixed responsibilities make it hard to maintain
+
+`CaffeineEntityCache` stores three semantically different kinds of data. The first two share a
+single `Cache<EntityCacheRelationKey, List<Entity>>`, and the third lives in a separate reverse index:
+
+| Stored data | Key form | Examples |
+|---|---|---|
+| Direct entity | `(nameIdentifier, entityType, null)` | any entity: catalog, schema, table, user, role, ... |
+| Relation result set | `(nameIdentifier, entityType, relType)` | `ROLE_USER_REL`, `TAG_METADATA_OBJECT_REL`, ... |
+| Reverse index entries | `ReverseIndexCache` (separate radix tree) | entity → list of cache keys that reference it |
+
+On top of this, a `cacheIndex` (radix tree) keeps a prefix-indexed view of all keys to
+support cascading invalidation. The resulting invalidation logic (`invalidateEntities`) is a
+BFS traversal that walks both the forward index and the reverse index, making it difficult to
+reason about correctness and hard to extend safely.
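The prefix-indexed part of this can be sketched roughly as follows, with a sorted `ConcurrentSkipListMap` standing in for the radix tree. `PrefixIndexedCache` and the dot-separated key format are assumptions made for illustration. The real `invalidateEntities` also walks the reverse index, which this sketch omits.

```java
import java.util.NavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical sketch: a sorted map stands in for the radix-tree cacheIndex.
// Keys are dot-separated names, e.g. "metalake.catalog.schema.table".
class PrefixIndexedCache<V> {
  private final ConcurrentSkipListMap<String, V> entries = new ConcurrentSkipListMap<>();

  void put(String name, V value) {
    entries.put(name, value);
  }

  V get(String name) {
    return entries.get(name);
  }

  int size() {
    return entries.size();
  }

  // Cascading invalidation: removes the entry itself and every descendant.
  // The child prefix ends with '.', so "m1.c1" does not also match "m1.c10".
  int invalidateSubtree(String name) {
    int removed = entries.remove(name) != null ? 1 : 0;
    String childPrefix = name + ".";
    // Assumes names never contain U+FFFF, so all descendants sort below this bound.
    NavigableMap<String, V> children =
        entries.subMap(childPrefix, true, childPrefix + Character.MAX_VALUE, false);
    removed += children.size();
    children.clear(); // clearing the view removes the keys from the backing map
    return removed;
  }
}
```

Even this one-directional version depends on a key-naming convention for correctness. The BFS over a second (reverse) index is what makes the real logic harder to reason about.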
+
+The five relation types currently tracked (`METADATA_OBJECT_ROLE_REL`, `ROLE_USER_REL`,
+`ROLE_GROUP_REL`, `POLICY_METADATA_OBJECT_REL`, `TAG_METADATA_OBJECT_REL`) are all
+auth-related, which reflects the original design intent: **the entity cache was built
+primarily to serve the auth path.** Over time it accumulated relation types and reverse-index
+logic without a clear ownership model, making it harder to maintain and evolve.
+
+##### Limited benefit for non-auth interfaces
+
+For general metadata API calls (list catalogs, list schemas, list tables), the entity cache
+provides minimal benefit:
+
+| Operation | Goes through cache? | Notes |
+|---|---|---|
+| `list(namespace, type)` | **No** | Bypasses cache entirely; always hits DB |
+| `get(ident, type)` (single entity) | Yes | Cache helps on repeated reads of the same entity |
+| `update(ident, type)` | Invalidate only | Invalidates entry, write always goes to DB |
+| `listEntitiesByRelation(...)` | Yes | Only for the five auth-centric relation types |
+
+In practice, the most common metadata browsing operations (`LIST` endpoints) are not cached
+at the entity store level. The cache's real workload is the auth path, where the same user
+entity, role assignments, and resource IDs are resolved on every single authorization check.
+
+**Conclusion:** The entity cache is a de-facto auth cache dressed up as a general-purpose
+cache. Its complexity is unjustified for the non-auth use case, and its TTL-based consistency
+model is insufficient for the auth use case under HA (see §1.1 and §2.1). A purpose-built auth
+cache layer — separate from the entity store — is the cleaner path forward.
+
+---
+
+### 1.3 JCasbin Authorization — Deep Dive
+
+#### 1.3.1 Call Graph for a Single `authorize()` Check
+
+```
+JcasbinAuthorizer.authorize(principal, metalake, metadataObject, privilege)
+│
+├─ [A] getUserEntity(username, metalake)
+│     entityStore.get(USER by NameIdentifier)
+│     → Needed to obtain integer userId for JCasbin enforce()
+│
+├─ [B] MetadataIdConverter.getID(metadataObject, metalake)   ← TARGET RESOURCE
+│     entityStore.get(entity by NameIdentifier)
+│     → Needed to get integer metadataId for JCasbin enforce()
+│     → Called on every auth request
+│
+├─ [C] loadRolePrivilege(metalake, username, userId, requestContext)
+│  │  (guarded by requestContext.hasLoadRole — runs once per HTTP request)
+│  │
+│  ├─ [C1] entityStore.listEntitiesByRelation(ROLE_USER_REL, userIdentifier)
+│  │       → Get all roles assigned to this user
+│  │
+│  └─ For each role NOT already in loadedRoles cache:
+│       ├─ [C2] entityStore.get(RoleEntity by name)   ← async, thread pool
+│       ├─ loadPolicyByRoleEntity(roleEntity)
+│       │    └─ For each securableObject in role.securableObjects():
+│       │         ├─ [C3] MetadataIdConverter.getID(securableObject, metalake)
+│       │         └─ enforcer.addPolicy(roleId, objType, metadataId, privilege, effect)
+│       └─ loadedRoles.put(roleId, true)   ← mark role as loaded
+│
+├─ [D] isOwner() / loadOwnerPolicy(...)   ← called on EVERY auth request (not only OWNER
+│  │  privilege checks). Nearly all auth expressions contain ANY(OWNER, METALAKE, CATALOG),
+│  │  which expands to METALAKE::OWNER || CATALOG::OWNER || … and calls isOwner() directly
+│  │  via OGNL, independently of the authorize() path. Typical call count: 2–4 per request.
+│  ├─ Check ownerRel cache → if HIT, return (most non-owner users get Optional.empty())
+│  └─ [D1] entityStore.listEntitiesByRelation(OWNER_REL, ...)
+│          ownerRel.put(metadataId, Optional.of(ownerId))
+│
+└─ [E] enforcer.enforce(userId, objectType, metadataId, privilege)   ← in-memory, O(1)
+```
+
+#### 1.3.2 What Each Cache Protects
+
+`loadedRoles: Cache<Long, Boolean>` — answers "is this role's policy already in JCasbin?"
+Without it, every request re-executes [C2]+[C3] for all roles the user has (N+1 queries).
+With it, [C2]+[C3] only run on first load per role. **This is the most critical cache.**
+
+`ownerRel: Cache<Long, Optional<Long>>` caches ownership lookups. **`ownerRel` is consulted on
+virtually every auth request**, not only when `privilege == OWNER`. The reason is that nearly
+every authorization expression in `AuthorizationExpressionConstants` includes
+`ANY(OWNER, METALAKE, CATALOG)` or similar clauses (e.g. `LOAD_TABLE_AUTHORIZATION_EXPRESSION`,
+`FILTER_TABLE_AUTHORIZATION_EXPRESSION`, `LOAD_CATALOG_AUTHORIZATION_EXPRESSION`). The
+`ANY(OWNER, …)` macro expands to `METALAKE::OWNER || CATALOG::OWNER || …`, and each
+`X::OWNER` term calls `isOwner()` directly, through a code path that is **independent of
+`authorize()`**. As a result, every auth request triggers 2–4 `isOwner()` calls (one per
+ancestor level), each consulting `ownerRel`. For most non-owner users, `ownerRel` caches
+`Optional.empty()`, which lets the ownership sub-check fail quickly without a DB query.
+Without `ownerRel`, every auth request would add 2–4 extra DB queries against `owner_meta`.
+
+**What these caches do NOT protect** (hit DB on every auth request without entity cache):
+
+| Call | Description | Protected by |
+|---|---|---|
+| [A] `getUserEntity()` | Fetch User entity → get integer userId | Entity cache only |
+| [B] `MetadataIdConverter.getID()` target | Resolve target resource name → integer ID | Entity cache only |
+| [C1] `listEntitiesByRelation(ROLE_USER_REL)` | Get user's role list | Entity cache only |
+
+---
+
+### 1.4 Impact of Disabling Entity Cache
+
+Layer 2 sits **on top of** Layer 1. When Layer 1 is disabled (NoOpsCache), calls [A], [B],
+[C1] hit DB on every auth request.
+
+| Call | With entity cache | Without entity cache |
+|---|---|---|
+| [A] `getUserEntity()` | Cache hit after first request | **DB query every auth request** |
+| [B] `MetadataIdConverter.getID()` target | Cache hit after first request | **DB query every auth request** |
+| [C1] `listEntitiesByRelation(ROLE_USER_REL)` | Cache hit after first request | **DB query every auth request** |
+| [C2] `entityStore.get(RoleEntity)` | Protected by `loadedRoles` | DB only on cold role load |
+| [C3] `MetadataIdConverter.getID()` per privilege | Protected by `loadedRoles` | DB only on cold role load |
+| [D1] `listEntitiesByRelation(OWNER_REL)` | Protected by `ownerRel` | **DB query 2–4x per request** |
+
+---
+
+
+## 2. Goals
+
+### 2.1 The Two Problems to Solve
+
+**Problem 1 — Performance:** With entity cache disabled, [A] and [C1] hit DB on every auth
+request. The new auth cache layer must protect these without relying on entity store cache.
+([B] also hits DB, but this is correct and acceptable; see §4.1.2.)
+
+**Problem 2 — Consistency:** `loadedRoles` is TTL-bounded (1 hour staleness), and its hook-based
+updates only reach the local instance, so other HA nodes keep stale policies until TTL expiry.
+Permission changes must take effect at the next auth request, not after TTL expiry.
+
+Both problems are solved by the same mechanism: a version-validated cache for the user's role
+list (userId comes for free from the same query).
+
+### 2.2 Requirements
+
+| Goal | Requirement |
+|---|---|
+| HA auth consistency | Privilege revocations visible on all nodes at the next auth request |
+| Auth self-sufficiency | [A] and [C1] protected without relying on entity store cache |
+| Auth performance | Hot path: ≤ 3 lightweight DB queries |
+| No new mandatory infrastructure | Solution requires only the existing DB |
+| Incremental delivery | Phase 1 independently shippable |
+
+---
+
+## 3. Industry Reference
+
+### 3.1 Apache Polaris — Per-Entity Version Tracking
+
+#### Schema
+
+All entity types (catalogs, namespaces, tables, roles, principals) share a single `ENTITIES`
+table (single-table inheritance). The two version columns are the key fields for caching:
+
+```sql
+ENTITIES (
+  id                    BIGINT,   -- Unique entity ID
+  catalog_id            BIGINT,   -- Owning catalog (0 for top-level entities)
+  parent_id             BIGINT,   -- Parent entity ID, forms the hierarchy tree
+  type_code             INT,      -- Entity type enum (see hierarchy below)
+  name                  VARCHAR,
+  entity_version        INT,      -- Bumped on rename / property update / drop  ← key
+  sub_type_code         INT,      -- Subtype (ICEBERG_TABLE, ICEBERG_VIEW, etc.)
+  properties            JSON,     -- User-visible properties (location, format, etc.)
+  internal_properties   JSON,     -- Internal properties (credentials, storage config, etc.)
+  grant_records_version INT       -- Bumped on every GRANT or REVOKE  ← key
+)
+
+GRANT_RECORDS (
+  securable_catalog_id  BIGINT,
+  securable_id          BIGINT,   -- The resource being secured (table/namespace/catalog)
+  grantee_catalog_id    BIGINT,
+  grantee_id            BIGINT,   -- The principal or role receiving the grant
+  privilege_code        INT       -- One of 102 defined privileges
+)
+```
+
+`GRANT_RECORDS` has no version column of its own. The version fingerprint is stored in
+`ENTITIES.grant_records_version` — detecting staleness requires no scan of `GRANT_RECORDS`.
+
+#### Entity Type Hierarchy
+
+```
+ROOT
+ ├── PRINCIPAL (user account, isGrantee)
+ ├── PRINCIPAL_ROLE (user-level role, isGrantee)
+ └── CATALOG
+ ├── CATALOG_ROLE (catalog-level role, isGrantee)
+ ├── NAMESPACE
+ │ └── TABLE_LIKE / POLICY / FILE
+ └── TASK
+```
+
+Only `PRINCIPAL`, `PRINCIPAL_ROLE`, and `CATALOG_ROLE` are **grantees** (can
receive grants).
+All others are **securables** (privileges are set on them).
+
+#### How `grantRecordsVersion` Is Maintained
+
+Every `grantPrivilege` / `revokePrivilege` call performs three writes in **one DB transaction**:
+
+1. Insert or delete the `GRANT_RECORDS` row.
+2. Increment `grant_records_version` on the **grantee** entity row.
+3. Increment `grant_records_version` on the **securable** entity row.
+
+Both sides are bumped atomically — no separate changelog table is needed.
+
+#### Version-Validated Cache
+
+The cache unit is `ResolvedPolarisEntity` = entity metadata + grant records in both directions.
+On every request, `bulkValidate()` issues one batch query for all path entities:
+
+```sql
+SELECT * FROM ENTITIES WHERE (catalog_id, id) IN ((?, ?), ...)
+```
+
+| Path | Condition | Action |
+|---|---|---|
+| Cache hit | Both versions current | Serve from cache — **0 extra queries** |
+| Stale, targeted refresh | Either version behind | Reload only the changed dimension |
+| Cache miss | Not in cache | Full load |
+
+The DB is the single source of truth; no broadcast is needed for correctness.
+
+**Key difference from Gravitino:** Polaris bundles entity + grants in one cached object, so one
+batch query covers both dimensions. Gravitino separates user→role from role→privilege, requiring
+3 version-check queries on a warm hit (§4.1.3 Steps 1a, 1b, and 3). Both achieve strong
+consistency.
+
+### 3.2 Other References
+
+**Nessie** — HTTP fan-out invalidation: async POST to peer nodes on write, convergence < 200 ms.
+
+**Keycloak** — JGroups embedded cluster messaging: in-JVM broadcast, no separate service.
+Recommended future direction if Gravitino needs stronger delivery guarantees.
+
+**DB version polling** — monotonic counters incremented in write transaction; a background
+thread polls for version changes and proactively invalidates caches. Considered but not
+adopted; per-request validation (§4.1) achieves strong consistency without background threads.
+
+---
+
+## 4. Design
+
+### 4.1 Per-Request Version Check (Polaris Style)
+
+Every auth request executes three lightweight version-check queries (Steps 1a, 1b, and 3 of
+§4.1.3) before serving from cache. If any version has advanced, only the stale portion is
+reloaded. Staleness window across requests: **zero** (in-flight races are analysed in §4.1.6).
+
+#### 4.1.1 Schema Changes
+
+Three new version columns, all `DEFAULT 1` — fully backward compatible. Existing rows get
+version 1; first auth check after migration populates caches normally.
+
+```sql
+ALTER TABLE `role_meta`
+  ADD COLUMN `securable_objects_version` INT UNSIGNED NOT NULL DEFAULT 1
+  COMMENT 'Incremented atomically with any privilege grant/revoke for this role';
+
+ALTER TABLE `user_meta`
+  ADD COLUMN `role_grants_version` INT UNSIGNED NOT NULL DEFAULT 1
+  COMMENT 'Incremented atomically with any role assignment/revocation for this user';
+
+ALTER TABLE `group_meta`
+  ADD COLUMN `role_grants_version` INT UNSIGNED NOT NULL DEFAULT 1
+  COMMENT 'Incremented atomically with any role assignment/revocation for this group';
+```
+
+Write paths that must bump the version **in the same DB transaction**:
+
+| Operation | Column | Location |
+|---|---|---|
+| Grant / revoke privilege on role R | `role_meta.securable_objects_version WHERE role_id = R` | `RoleMetaService` |
+| Assign / revoke role for user U | `user_meta.role_grants_version WHERE user_id = U` | `UserMetaService` |
+| Assign / revoke role for group G | `group_meta.role_grants_version WHERE group_id = G` | `GroupMetaService` |
+
+Version comparison uses `!=` (not `<`) to safely handle theoretical INT wrap-around.
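The following small Java illustration shows why `!=` is the safer comparison. Java `int` arithmetic wraps from `Integer.MAX_VALUE` to `Integer.MIN_VALUE`, and that wrap stands in here for a counter that has overflowed. This design does not specify how the `INT UNSIGNED` column is mapped into Java, and `VersionCheck` is a hypothetical helper.

```java
// Hypothetical helper contrasting the two staleness checks discussed above.
final class VersionCheck {
  private VersionCheck() {}

  // Chosen rule: any difference means the cached entry is stale.
  static boolean isStale(int cachedVersion, int dbVersion) {
    return cachedVersion != dbVersion;
  }

  // Rejected rule: misses the update once the counter wraps past Integer.MAX_VALUE.
  static boolean isStaleByOrdering(int cachedVersion, int dbVersion) {
    return cachedVersion < dbVersion;
  }

  public static void main(String[] args) {
    int cached = Integer.MAX_VALUE;
    int db = cached + 1; // wraps to Integer.MIN_VALUE
    System.out.println(isStale(cached, db));           // true
    System.out.println(isStaleByOrdering(cached, db)); // false: the update would be missed
  }
}
```

The `!=` rule can only be fooled if the counter advances by exactly 2^32 between two checks of the same entry.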
+
+**Ownership transfers** require no schema change and no cache. The `ownerRel` cache is
+**removed** (see §7.2). Step 2.5 queries `owner_meta` directly on every auth request (one
+indexed query per distinct `metadataId`, deduplicated within the request). This is strongly
+consistent, with no versioning complexity needed.
+
+#### 4.1.2 Cache Data Structures (Changes in JcasbinAuthorizer)
+
+```java
+// ─── BEFORE ──────────────────────────────────────────────────────────
+private Cache<Long, Boolean> loadedRoles;          // roleId → loaded?
+private Cache<Long, Optional<Long>> ownerRel;
+
+// ─── AFTER ───────────────────────────────────────────────────────────
+
+// NEW: replaces entity cache dependency for [A] (userId) and [C1] (role list).
+// Step 1 query returns both user_id and role_grants_version in one shot.
+// metalakeName→metalakeId resolved inline via JOIN — no dedicated cache needed.
+private GravitinoCache<String, CachedUserRoles> userRoleCache;
+// key = metalakeName + ":" + userName
+
+record CachedUserRoles(
+    long userId,              // integer userId for JCasbin enforce()
+    int roleGrantsVersion,    // user_meta.role_grants_version at load time
+    List<Long> roleIds        // role ID list at load time
+) {}
+
+// NEW: mirrors userRoleCache for groups (group can also hold role assignments).
+private GravitinoCache<String, CachedGroupRoles> groupRoleCache;
+// key = metalakeName + ":" + groupName
+
+record CachedGroupRoles(
+    long groupId,
+    int roleGrantsVersion,    // group_meta.role_grants_version at load time
+    List<Long> roleIds
+) {}
+
+// TYPE CHANGE: was Cache<Long, Boolean>, now stores securable_objects_version.
+// Enables version-based staleness detection rather than TTL expiry.
+private GravitinoCache<Long, Integer> loadedRoles;
+// roleId → securable_objects_version at the time JCasbin policies were loaded
+
+// REMOVED: ownerRel cache eliminated (see §7.2).
+// isOwner() is called 2–4 times per request, but version-validated caching offers no
+// query savings: the version check query already returns the owner_id, so there is
+// nothing expensive to avoid. Step 2.5 queries owner_meta directly (one indexed query
+// per distinct metadataId, deduplicated within the request).
+// private Cache<Long, Optional<Long>> ownerRel;
+```
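The following is a minimal sketch of how a `userRoleCache` entry could be validated against the row read in Step 1a. A `ConcurrentHashMap` stands in for `GravitinoCache`, and `UserVersionRow`, `UserRoleCacheSketch`, and the `loadRoleIds` callback are hypothetical. The check compares `user_id` as well as `role_grants_version`. That covers a user dropped and recreated under the same name, where both the old and the new row may carry version 1.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongFunction;

// Same shape as the record declared in the design above.
record CachedUserRoles(long userId, int roleGrantsVersion, List<Long> roleIds) {}

// Hypothetical: the (user_id, role_grants_version) row returned by the Step 1a query.
record UserVersionRow(long userId, int roleGrantsVersion) {}

class UserRoleCacheSketch {
  // A ConcurrentHashMap stands in for GravitinoCache<String, CachedUserRoles>.
  private final Map<String, CachedUserRoles> cache = new ConcurrentHashMap<>();

  static String cacheKey(String metalakeName, String userName) {
    return metalakeName + ":" + userName;
  }

  /**
   * Returns the cached entry when it still matches the Step 1a row; otherwise reloads it.
   *
   * @param loadRoleIds hypothetical stand-in for
   *     SELECT role_id FROM user_role_rel WHERE user_id = ? AND deleted_at = 0
   */
  CachedUserRoles resolve(String metalakeName, String userName, UserVersionRow row,
      LongFunction<List<Long>> loadRoleIds) {
    String key = cacheKey(metalakeName, userName);
    CachedUserRoles cached = cache.get(key);
    // Compare user_id too: after drop + same-name recreate both rows may have version 1.
    if (cached != null
        && cached.userId() == row.userId()
        && cached.roleGrantsVersion() == row.roleGrantsVersion()) {
      return cached; // warm hit: no role-list query
    }
    CachedUserRoles fresh = new CachedUserRoles(
        row.userId(), row.roleGrantsVersion(), List.copyOf(loadRoleIds.apply(row.userId())));
    cache.put(key, fresh);
    return fresh;
  }
}
```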
+
+**Why no cache for [B] (target resource name→ID):**
+Adding a `metadataIdCache` would require invalidation on every entity rename, drop, or
+recreate across all entity types. Since JCasbin uses integer IDs (not names), the DB lookup
+for [B] is always correct (~1 ms indexed). It is simpler and more robust to hit the DB on
+every request.
+
+**Why `ownerRel` is removed:**
+`isOwner()` is called 2–4 times per auth request (once per ancestor in the OGNL expression
+chain). A version-validated cache would still query `owner_meta` on every request to check
+versions, and that same query already returns the `owner_id`. There is no expensive
+downstream work to avoid (unlike roles, where skipping securable-object loading saves
+significant work). Caching adds complexity with zero query savings. Step 2.5 queries
+`owner_meta` directly (one indexed query per distinct `metadataId`, deduplicated within the
+request), achieving strong consistency with no extra infrastructure. See §7.2.
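The per-request owner dedup that replaces `ownerRel` might look like the sketch below. `RequestOwnerCache` and the `ownerLookup` callback are hypothetical stand-ins for the proposed `ownerCache` field on `AuthorizationRequestContext` and the `owner_meta` query. A plain `HashMap` assumes the context is used by one thread at a time; a `ConcurrentHashMap` would be needed otherwise.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.LongFunction;

// Hypothetical stand-in for the ownerCache proposed for AuthorizationRequestContext.
// One instance lives for one HTTP request and is discarded afterwards.
class RequestOwnerCache {
  private final Map<Long, Optional<Long>> ownerCache = new HashMap<>();
  private final LongFunction<Optional<Long>> ownerLookup;

  /** @param ownerLookup stands in for SELECT owner_id FROM owner_meta WHERE ... */
  RequestOwnerCache(LongFunction<Optional<Long>> ownerLookup) {
    this.ownerLookup = ownerLookup;
  }

  boolean isOwner(long userId, long metadataId) {
    // At most one lookup per metadataId within this request; Optional.empty() is cached too.
    Optional<Long> owner = ownerCache.computeIfAbsent(metadataId, ownerLookup::apply);
    return owner.isPresent() && owner.get() == userId;
  }
}
```

Caching the negative result matters. For the common non-owner case, the reused `Optional.empty()` is what keeps repeated `X::OWNER` terms within one request from issuing extra queries.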
+
+#### 4.1.3 Auth Check Flow
+
+```
+authorize(metalakeName, username, resource, operation)
+│
+├─ STEP 1 — User + Group version check (2 queries, metalake resolved via JOIN):
+│
+│   [1a] User query:
+│     SELECT um.user_id, um.role_grants_version
+│     FROM user_meta um
+│     JOIN metalake_meta mm ON um.metalake_id = mm.metalake_id AND mm.deleted_at = 0
+│     WHERE mm.metalake_name = ? AND um.user_name = ? AND um.deleted_at = 0
+│
+│     userRoleCache HIT and role_grants_version matches:
+│       → use cached userId and roleIds                          [A] and [C1] avoided
+│     MISS or version mismatch:
+│       → SELECT role_id FROM user_role_rel WHERE user_id = ? AND deleted_at = 0
+│       → re-associate userId ↔ roleIds in JCasbin allow/deny enforcers
+│       → userRoleCache.put(key, CachedUserRoles(userId, version, roleIds))
+│
+│   [1b] Group query (user may belong to groups that also hold roles):
+│     SELECT gm.group_id, gm.role_grants_version
+│     FROM group_meta gm
+│     JOIN group_user_rel gu ON gm.group_id = gu.group_id AND gu.deleted_at = 0
+│     WHERE gu.user_id = ? AND gm.deleted_at = 0
+│
+│     For each group:
+│       groupRoleCache HIT and role_grants_version matches:
+│         → use cached groupId and roleIds                       [group C1] avoided
+│       MISS or version mismatch:
+│         → SELECT role_id FROM group_role_rel WHERE group_id = ? AND deleted_at = 0
+│         → addRoleForUser(userId, roleId) in JCasbin enforcers
+│         → groupRoleCache.put(groupKey, CachedGroupRoles(groupId, version, roleIds))
+│
+│   Note: current code only loads user-direct roles (ROLE_USER_REL). Loading group roles
+│   via [1b] is a NEW capability introduced in Phase 2 alongside groupRoleCache.
+│
+├─ STEP 2 — Resolve target resource ID (always DB, no cache):
+│
+│   metadataId = MetadataIdConverter.getID(resource, metalake)   ← 1 indexed DB query
+│   Always correct: rename does not change ID; drop+recreate returns the new ID.
+│
+│   TODO: A strong-consistency name→id cache could eliminate this DB query on the warm
+│   path. Version-based validation does not apply here (checking the version requires
+│   the same query that returns the ID). A viable approach would require an
+│   entity_mutation_log for cross-node invalidation plus write-path eviction on the
+│   local node. Not implemented in this phase.
+│
+├─ STEP 2.5 — Ownership check per isOwner() call (request-level dedup via context cache):
+│   (Triggered by ANY(OWNER, …) in the OGNL expression — 2–4 calls per request.
+│    Note: OGNL evaluates lazily with short-circuit; ancestor IDs are not pre-collected.)
+│
+│   Each isOwner(principal, metalake, metadataObject) call:
+│     requestContext.ownerCache HIT for metadataId → return cached result immediately
+│     MISS:
+│       SELECT owner_id FROM owner_meta
+│       WHERE metadata_object_id = ? AND deleted_at = 0          ← 1 indexed query
+│       requestContext.ownerCache.put(metadataId, ownerId)
+│     compare ownerId with userId → return result
+│
+│   requestContext.ownerCache is a Map<Long, Optional<Long>> scoped to this HTTP request.
+│   Within one request the same metadataId is never queried twice.
+│   Across requests: always queries DB → always strongly consistent.
+│
+│   Also fixes existing bug: isOwner() currently calls MetadataIdConverter.getID() twice
+│   for the same object (JcasbinAuthorizer lines 224, 228). Phase 2 consolidates to 1 call.
+│
+├─ STEP 3 — Role batch version check (1 query):
+│
+│   SELECT role_id, securable_objects_version
+│   FROM role_meta WHERE role_id IN (?, ?, ...) AND deleted_at = 0
+│   ↑ one query validates all of the user's roles simultaneously
+│
+│   For each role where Objects.equals(loadedRoles.get(roleId), dbVersion):
+│     → policy current; skip                                    [C2][C3] avoided
+│
+│   For stale/cold roles:
+│     → allowEnforcer.deleteRole(roleId); denyEnforcer.deleteRole(roleId)
+│     → batchListSecurableObjectsByRoleIds(staleRoleIds)   (1 query for all stale roles)
+│     → loadPoliciesForRoles(staleObjects)
+│     → loadedRoles.put(roleId, dbVersion)
+│
+└─ STEP 4 — enforce() (in-memory, O(1))
+     allowEnforcer.enforce(userId, objectType, metadataId, privilege)
+     denyEnforcer.enforce(userId, objectType, metadataId, privilege)
+```
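The Step 3 decision can be sketched in Java as follows. The sketch assumes the batch query result has already been read into a `Map` from `role_id` to `securable_objects_version`. `RoleVersionCheck` and `markLoaded` are hypothetical names. Versions are compared with `equals` because `==` or `!=` on two boxed `Integer` values tests identity, not value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of Step 3: compare the batch-query versions with loadedRoles.
class RoleVersionCheck {
  // Stands in for GravitinoCache<Long, Integer> loadedRoles.
  final Map<Long, Integer> loadedRoles = new ConcurrentHashMap<>();

  /** Returns the role IDs whose JCasbin policies are cold or stale and must be reloaded. */
  List<Long> staleRoles(List<Long> userRoleIds, Map<Long, Integer> dbVersions) {
    List<Long> stale = new ArrayList<>();
    for (Long roleId : userRoleIds) {
      Integer dbVersion = dbVersions.get(roleId);
      if (dbVersion == null) {
        continue; // not returned by the query (e.g. soft-deleted); nothing to load here
      }
      Integer loaded = loadedRoles.get(roleId);
      // equals, not !=: both values are boxed Integers.
      if (loaded == null || !loaded.equals(dbVersion)) {
        stale.add(roleId); // cold (never loaded) or version changed
      }
    }
    return stale;
  }

  // Called after deleteRole + batch reload has finished for this role.
  void markLoaded(long roleId, int dbVersion) {
    loadedRoles.put(roleId, dbVersion);
  }
}
```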
+
+#### 4.1.4 Properties
+
+| Dimension | Value |
+|---|---|
+| Staleness window | **0** — every request validates against DB (in-flight races are bounded to one request; see §4.1.6) |
+| Hot path DB queries | **4** (Step 1a user + Step 1b groups + Step 2 metadataId + Step 3 role versions) |
+| Owner check warm | **+0** — requestContext.ownerCache deduplicates within one request |
+| Owner check cold | **+1 per unique ancestor metadataId** on first access within the request |
+| Cold/stale path | **5–6** queries |
+| Background threads | **None** — executor removed (N+1 fix uses batch query instead of parallel futures) |
+| Failure mode | DB unavailable → auth blocked (same as today) |
+| HA correctness | **Fixed**: Steps 1 and 3 validate `role_grants_version` and `securable_objects_version`; Step 2.5 always reads from DB |
+
+#### 4.1.5 Correctness Under Rename and Drop
+
+| Scenario | Analysis |
+|---|---|
+| **User / Group rename** | `userRoleCache` is keyed on `metalakeName:userName`. A rename produces a cache miss → Step 1 queries DB and returns the correct result. The old key has no traffic and expires via TTL. **Safe.** |
+| **User / Group drop** | Step 1 returns zero rows → auth denied. The old cache entry expires harmlessly. **Safe.** |
+| **User / Group drop + same-name recreate** | The new entity gets a new `user_id` and `role_grants_version = 1`, while the cached entry still holds the old `user_id`. Step 1 returns `user_id` together with the version, so the cache check must compare both: a version-only comparison misses the case where the old row also had `role_grants_version = 1`. With both compared, the next Step 1 **forces a cache refresh.** ✅ |
+| **SecurableObject rename** | JCasbin stores integer `metadataId`. Rename does not change the ID. Step 2 resolves the new name to the same ID via DB. `enforce()` matches the existing policy. **No action needed.** ✅ |
+| **SecurableObject drop** | Step 2 returns "not found" → auth denied. Orphan JCasbin policies remain in memory but can never be matched (no ID resolves to the dropped object). **Safe.** |
+| **SecurableObject drop + same-name recreate** | The new object gets a new `metadataId`. No JCasbin policy covers it → DENY until a new privilege grant bumps `securable_objects_version` in the same transaction and Step 3 detects the version change to reload policies. **Correct.** |
+
+#### 4.1.6 Concurrent Mutation During Auth (TOCTOU)
+
+The version check in Step 1 and the policy reload in Step 3 are not atomic with the
+`enforce()` call in Step 4. A concurrent write on another thread or node can advance a
+version counter between these steps. This section analyses the bounded impact.
+
+**Scenario A — Role revoked between Step 1 read and Step 3 policy check**
+
+```
+Thread A  Step 1:   reads role_grants_version = 5 → matches cache → roleIds = [R1, R2]
+Thread B  commits:  revokes R2 from user → role_grants_version bumped to 6
+Thread A  Step 3:   checks R1, R2 versions → both current (policies loaded) → skips reload
+Thread A  Step 4:   enforce() sees R2 policy → may ALLOW using revoked role
+```
+
+Thread A's check reflects the DB state at the moment of Step 1. The revoke is visible
+on Thread A's **next** request (Step 1 reads version 6 → mismatch → reloads role list →
+R2 absent → JCasbin user-role mapping updated → DENY). The inconsistency window is
+bounded to **the duration of one auth request** (typically single-digit milliseconds).
+
+Strictly eliminating this window would require serializable isolation across the entire
+auth flow, which is impractical at auth-request frequency. The window is acceptable.
+
+**Scenario B — Privilege revoked during policy reload (Step 3)**
+
+```
+Thread A  Step 3:   detects R1 stale (version N → N+1) → calls deleteRole(R1), reloads
+Thread B  commits:  another revoke on R1 → version bumped to N+2
+Thread A           reads R1 securable objects from DB → may get version-N+1 snapshot
+Thread A           stores loadedRoles(R1) = N+1
+Next req  Step 3:   db version = N+2, cache = N+1 → mismatch → reload again ✅
+```
+
+Thread A might serve one request using N+1 policies (missing the N+2 revoke). The
+**next** request detects the mismatch and reloads, so the impact is bounded to one request.
+
+**Scenario C — Concurrent reload of the same role by two threads**
+
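+Both threads call `deleteRole(R1)` then reload. `SyncedEnforcer` serialises each individual
+JCasbin call, but not the whole `deleteRole` → reload → `loadedRoles.put` sequence. When both
+threads read the same DB data, their interleaved calls still converge to the correct policy set.
+If a privilege change commits between the two threads' reads, however, the calls can interleave
+so that a policy from the older snapshot survives while `loadedRoles` records the newer version.
+Step 3 then sees matching versions and does not reload until the role's next version bump.
+Serialising the reload per `roleId` (for example with a per-role lock) closes this gap.
+
+**Summary**
+
+| Race | Worst-case window | Acceptability |
+|---|---|---|
+| Revoke committed after Step 1 version read | Duration of 1 auth request | Acceptable — next request is correct |
+| Revoke committed during Step 3 reload | Duration of 1 auth request | Acceptable — next request is correct |
+| Concurrent reload of same role | None if reloads are serialised per role; otherwise until the role's next version bump | Requires per-role serialisation of the reload |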
+
+The TOCTOU window is an inherent property of non-serializable reads in distributed systems.
+It applies only during concurrent admin mutations (which are rare in practice), not on
+the steady-state auth path.
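One way to serialise a role reload is sketched below under stated assumptions. A per-`roleId` `ReentrantLock` covers the whole delete, reload, and version-record sequence, so two threads cannot interleave their policy mutations for the same role. `PerRoleReloader`, `Snapshot`, and the `readSnapshot` callback are hypothetical, and the `policies` map stands in for the JCasbin enforcers. The lock map is never pruned here; a production version would need to bound it, for example by striping locks.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.LongFunction;

// Hypothetical sketch: one lock per roleId turns "delete policies -> load snapshot ->
// record version" into a single critical section for that role.
class PerRoleReloader {
  // Version and privileges of one role as read from the DB.
  record Snapshot(int version, List<String> privileges) {}

  private final Map<Long, ReentrantLock> locks = new ConcurrentHashMap<>();
  // Stands in for loadedRoles (roleId -> securable_objects_version).
  final Map<Long, Integer> loadedRoles = new ConcurrentHashMap<>();
  // Stands in for the role's policies inside the allow/deny enforcers.
  final Map<Long, List<String>> policies = new ConcurrentHashMap<>();

  /**
   * @param readSnapshot stands in for the DB reads; it should read the version before
   *     the securable objects, so the recorded version is never newer than the data.
   */
  void reload(long roleId, LongFunction<Snapshot> readSnapshot) {
    ReentrantLock lock = locks.computeIfAbsent(roleId, id -> new ReentrantLock());
    lock.lock();
    try {
      Snapshot snapshot = readSnapshot.apply(roleId);           // DB read inside the lock
      policies.remove(roleId);                                  // ~ deleteRole(roleId)
      policies.put(roleId, List.copyOf(snapshot.privileges())); // ~ addPolicy(...) per object
      loadedRoles.put(roleId, snapshot.version());              // recorded in the same section
    } finally {
      lock.unlock();
    }
  }
}
```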
+
+---
+
+## 5. Phased Implementation Plan
+
+### Phase 1 — Foundation (common to both approaches, no schema changes)
+
+| Step | Change | Module |
+|---|---|---|
+| 1.1 | Fix auth N+1: `batchListSecurableObjectsByRoleIds()` + rewrite `loadRolePrivilege()` | `RoleMetaService`, `JcasbinAuthorizer` |
+| 1.2 | Introduce `GravitinoCache<K,V>` interface; wrap existing Caffeine caches | `GravitinoCache.java`, `CaffeineGravitinoCache.java` |
+| 1.3 | Disable entity store cache: `CACHE_ENABLED` default → `false` | `Configs.java` |
+| 1.4 | Fix DBCP2: `minEvictableIdleTimeMillis` 1 s → 30 s; `minIdle` 0 → 5 | `SqlSessionFactoryHelper.java` |
+
+**Outcome:** Auth cold path `3+T` queries. Calls [A][C1] hit DB every request (acceptable
+as a stepping stone — Phase 2 closes this). Consistency still TTL-bounded.
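The N+1 fix in step 1.1 amounts to reading the securable objects of all stale roles with one `role_id IN (...)` query and then grouping the rows in memory. The sketch below shows the grouping step. `SecurableObjectRow` and `BatchGrouping` are hypothetical names, and the row shape is an assumption.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical row shape: one row per (role, securable object, privilege) returned by a
// single batch query filtered with role_id IN (...).
record SecurableObjectRow(long roleId, String objectType, long metadataId, String privilege) {}

final class BatchGrouping {
  private BatchGrouping() {}

  // Groups the rows of ONE batch query by role instead of issuing one query per role.
  static Map<Long, List<SecurableObjectRow>> groupByRole(
      List<Long> requestedRoleIds, List<SecurableObjectRow> rows) {
    Map<Long, List<SecurableObjectRow>> byRole =
        rows.stream()
            .collect(Collectors.groupingBy(
                SecurableObjectRow::roleId, HashMap::new, Collectors.toList()));
    // Roles without any securable objects still get an (empty) entry.
    for (Long roleId : requestedRoleIds) {
      byRole.putIfAbsent(roleId, List.of());
    }
    return byRole;
  }
}
```

Seeding roles that returned no rows with an empty list lets the caller still record them as loaded. Otherwise, a role without privileges would be re-queried on every request.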
+
+---
+
+### Phase 2 — Version-Validated Auth Cache Implementation
+
+| Step | Change | Module |
+|---|---|---|
+| 2.1 | `ADD COLUMN securable_objects_version` on `role_meta` | `schema-x.y.z-*.sql` |
+| 2.2 | `ADD COLUMN role_grants_version` on `user_meta` | `schema-x.y.z-*.sql` |
+| 2.3 | `ADD COLUMN role_grants_version` on `group_meta` | `schema-x.y.z-*.sql` |
+| 2.4 | Bump `securable_objects_version` in privilege grant/revoke transaction | `RoleMetaService` |
+| 2.5 | Bump `role_grants_version` in role assign/revoke transaction (user + group) | `UserMetaService`, `GroupMetaService` |
+| 2.6 | Add `userRoleCache: GravitinoCache<String, CachedUserRoles>` | `JcasbinAuthorizer` |
+| 2.7 | Add `groupRoleCache: GravitinoCache<String, CachedGroupRoles>`; implement group role loading in `loadRolePrivilege()` (currently missing entirely) | `JcasbinAuthorizer` |
+| 2.8 | Change `loadedRoles` type: `Boolean` → `Integer` (stores version) | `JcasbinAuthorizer` |
+| 2.9 | Rewrite `loadRolePrivilege()` + `authorize()` with 4-step flow (§4.1.3); remove `executor` thread pool (replaced by batch query) | `JcasbinAuthorizer` |
+| 2.10 | Add mapper methods (see §6.1) | mapper + SQL |
+| 2.11 | Remove `ownerRel`; add `ownerCache: Map<Long, Optional<Long>>` to `AuthorizationRequestContext` for per-request dedup; fix double `getID()` call in `isOwner()` | `JcasbinAuthorizer`, `AuthorizationRequestContext` |
+| 2.12 | Extend `AuthorizationRequestContext.loadRole()` guard to cover both user and group role loading in one pass | `AuthorizationRequestContext` |
+
+**Outcome:** Zero staleness. Hot path: 4 DB queries (Step 1a user + Step 1b groups + Step 2
+metadataId + Step 3 role versions). `isOwner()` is deduplicated within a request via the context
+cache and is always consistent.
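Steps 1.2 and 2.6–2.8 route every auth cache through `GravitinoCache<K,V>`. Below is a minimal sketch of what that interface might look like. It assumes method names that mirror Caffeine's `Cache` (`getIfPresent`, `put`, `invalidate`, `invalidateAll`) but return an `Optional`. The map-backed class is a hypothetical, unbounded stand-in. `CaffeineGravitinoCache` would instead delegate to Caffeine and keep its size and TTL bounds:

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical shape of the GravitinoCache<K,V> abstraction (method names are assumptions).
interface GravitinoCache<K, V> {
  Optional<V> getIfPresent(K key);

  void put(K key, V value);

  void invalidate(K key);

  void invalidateAll();
}

// Unbounded, thread-safe stand-in; a Caffeine-backed version would add eviction and TTL.
final class MapGravitinoCache<K, V> implements GravitinoCache<K, V> {
  private final Map<K, V> entries = new ConcurrentHashMap<>();

  @Override
  public Optional<V> getIfPresent(K key) {
    return Optional.ofNullable(entries.get(key));
  }

  @Override
  public void put(K key, V value) {
    entries.put(key, value);
  }

  @Override
  public void invalidate(K key) {
    entries.remove(key);
  }

  @Override
  public void invalidateAll() {
    entries.clear();
  }
}
```

Keeping callers on the interface lets tests use a map-backed cache, while production code swaps in a Caffeine-backed implementation without touching `JcasbinAuthorizer`.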
+
+---
Review Comment:
Added tasks
##########
design-docs/cache-improvement-design.md:
##########
@@ -0,0 +1,873 @@
+# Gravitino Cache Improvement Design
+
+---
+
+## 1. Background
+
+### 1.1 System Overview
+
+Gravitino is a unified metadata management control plane. Compute engines (Spark, Flink, Trino)
+call it during query planning to resolve catalog, schema, and table metadata, and to verify
+user permissions. The access pattern is distinctly **read-heavy, write-light**: DDL operations
+are infrequent, and metadata is resolved once per job.
+
+Gravitino is evolving from single-node to multi-node active-active HA deployment. Each node
+currently maintains its own independent in-process Caffeine cache with no cross-node
+synchronisation. Under HA, any write on one node leaves other nodes' caches stale until TTL
+expiry.
+
+---
+
+### 1.2 Current Cache Architecture Overview
+
+Gravitino maintains three distinct caching layers for the authorization path:
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│ Layer 3: Per-request cache (AuthorizationRequestContext)    │
+│   Scope: one HTTP request; prevents duplicate auth calls    │
+├─────────────────────────────────────────────────────────────┤
+│ Layer 2: Auth policy caches (JcasbinAuthorizer)             │
+│   loadedRoles  Cache<Long, Boolean>         hook update/TTL │
+│   ownerRel     Cache<Long, Optional<Long>>  hook update/TTL │
+├─────────────────────────────────────────────────────────────┤
+│ Layer 1: Entity store cache (RelationalEntityStore)         │
+│   CaffeineEntityCache, or NoOpsCache when disabled          │
+│   Caches entity reads and relation queries for all modules  │
+│   Controlled by Configs.CACHE_ENABLED                       │
+└─────────────────────────────────────────────────────────────┘
+```
+
+**JCasbin is the core of the auth cache system.** It maintains an in-memory policy table:
+
+```
+(roleId, objectType, metadataId, privilege) → ALLOW | DENY
+```
+
+The Layer 2 caches exist solely to manage JCasbin's policy loading lifecycle:
+
+| Cache | Role |
+|-------|------|
+| `loadedRoles: Cache<Long, Boolean>` | Tracks which roles are already loaded into JCasbin; prevents repeated [C2]+[C3] queries on every auth request |
+| `ownerRel: Cache<Long, Optional<Long>>` | Caches owner lookups; **prevents [D1] on every auth request** (2–4 `isOwner()` calls per request, see §1.3.2) |
+
+Without `loadedRoles`, every auth request would re-execute N DB queries to reload all of a
+user's role policies into JCasbin. These two caches are what keep the warm auth path fast.
+Layer 1 (entity cache) additionally accelerates the name→ID resolution calls ([A], [B], [C1])
+that feed into JCasbin's enforce call.
+
+---
+
+#### 1.2.1 Problems with the Current Entity Cache
+
+**The entity cache (Layer 1) has accumulated significant complexity and is not well-suited to
+serve as a general-purpose or auth-dedicated caching layer.**
+
+##### Mixed responsibilities make it hard to maintain
+
+`CaffeineEntityCache` uses a single `Cache<EntityCacheRelationKey, List<Entity>>` to store
+three semantically different kinds of data:
+
+| Stored data | Key form | Examples |
+|-------------|----------|----------|
+| Direct entity | `(nameIdentifier, entityType, null)` | any entity: catalog, schema, table, user, role, ... |
+| Relation result set | `(nameIdentifier, entityType, relType)` | `ROLE_USER_REL`, `TAG_METADATA_OBJECT_REL`, ... |
+| Reverse index entries | `ReverseIndexCache` (separate radix tree) | entity → list of cache keys that reference it |
+
+On top of this, a `cacheIndex` (radix tree) keeps a prefix-indexed view of all keys to
+support cascading invalidation. The resulting invalidation logic (`invalidateEntities`) is a
+BFS traversal that walks both the forward index and the reverse index, making it difficult to
+reason about correctness and hard to extend safely.
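The sketch below is a simplified model of cascading invalidation over a forward prefix index plus a reverse index, meant to show why the traversal is hard to reason about. It is not the actual `invalidateEntities` code. Keys are hypothetical dotted strings, a `TreeMap` prefix scan stands in for the radix tree, and `reverseIndex` is a plain `Map<String, Set<String>>` standing in for `ReverseIndexCache`:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

final class CascadingInvalidator {
  // Cache entries in a sorted map, so a prefix scan doubles as the forward (radix) index.
  final NavigableMap<String, String> cache = new TreeMap<>();
  // Reverse index: entity key -> cache keys whose cached value references that entity.
  final Map<String, Set<String>> reverseIndex = new HashMap<>();

  /** BFS over both indexes; evicts and returns every key reached from rootKey. */
  Set<String> invalidate(String rootKey) {
    Set<String> visited = new HashSet<>();
    Deque<String> queue = new ArrayDeque<>();
    queue.add(rootKey);
    while (!queue.isEmpty()) {
      String key = queue.poll();
      if (!visited.add(key)) {
        continue; // already reached through another path
      }
      // Forward: every descendant key that starts with "key."
      queue.addAll(cache.subMap(key + ".", true, key + ".\uffff", true).keySet());
      // Reverse: every cached relation result that mentions this key.
      queue.addAll(reverseIndex.getOrDefault(key, Set.of()));
    }
    visited.forEach(cache::remove);
    return visited;
  }
}
```

Even in this toy form, evicting one catalog pulls in descendants through the prefix scan and relation entries stored under other keys through the reverse index. Correctness depends on both indexes being complete at the moment of the write.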
+
+The five relation types currently tracked (`METADATA_OBJECT_ROLE_REL`, `ROLE_USER_REL`,
+`ROLE_GROUP_REL`, `POLICY_METADATA_OBJECT_REL`, `TAG_METADATA_OBJECT_REL`) are all
+auth-related, which reflects the original design intent: **the entity cache was built
+primarily to serve the auth path.** Over time it accumulated relation types and reverse-index
+logic without a clear ownership model, making it harder to maintain and evolve.
+
+##### Limited benefit for non-auth interfaces
+
+For general metadata API calls (list catalogs, list schemas, list tables), the entity cache
+provides minimal benefit:
+
+| Operation | Goes through cache? | Notes |
+|-----------|---------------------|-------|
+| `list(namespace, type)` | **No** | Bypasses cache entirely; always hits DB |
+| `get(ident, type)` (single entity) | Yes | Cache helps on repeated reads of the same entity |
+| `update(ident, type)` | Invalidate only | Invalidates entry; write always goes to DB |
+| `listEntitiesByRelation(...)` | Yes | Only for the five auth-centric relation types |
+
+In practice, the most common metadata browsing operations (`LIST` endpoints) are not cached
+at the entity store level. The cache's real workload is the auth path, where the same user
+entity, role assignments, and resource IDs are resolved on every single authorization check.
+
+**Conclusion:** The entity cache is a de-facto auth cache dressed up as a general-purpose
+cache. Its complexity is unjustified for the non-auth use case, and its TTL-based consistency
+model is insufficient for the auth use case (see §1.8). A purpose-built auth cache layer,
+separate from the entity store, is the cleaner path forward.
+
+---
+
+### 1.3 JCasbin Authorization — Deep Dive
+
+#### 1.3.1 Call Graph for a Single `authorize()` Check
+
+```
+JcasbinAuthorizer.authorize(principal, metalake, metadataObject, privilege)
+│
+├─ [A] getUserEntity(username, metalake)
+│     entityStore.get(USER by NameIdentifier)
+│     → Needed to obtain integer userId for JCasbin enforce()
+│
+├─ [B] MetadataIdConverter.getID(metadataObject, metalake)   ← TARGET RESOURCE
+│     entityStore.get(entity by NameIdentifier)
+│     → Needed to get integer metadataId for JCasbin enforce()
+│     → Called on every auth request
+│
+├─ [C] loadRolePrivilege(metalake, username, userId, requestContext)
+│   │  (guarded by requestContext.hasLoadRole; runs once per HTTP request)
+│   │
+│   ├─ [C1] entityStore.listEntitiesByRelation(ROLE_USER_REL, userIdentifier)
+│   │      → Get all roles assigned to this user
+│   │
+│   └─ For each role NOT already in loadedRoles cache:
+│       ├─ [C2] entityStore.get(RoleEntity by name)   ← async, thread pool
+│       ├─ loadPolicyByRoleEntity(roleEntity)
+│       │   └─ For each securableObject in role.securableObjects():
+│       │       ├─ [C3] MetadataIdConverter.getID(securableObject, metalake)
+│       │       └─ enforcer.addPolicy(roleId, objType, metadataId, privilege, effect)
+│       └─ loadedRoles.put(roleId, true)   ← mark role as loaded
+│
+├─ [D] isOwner() / loadOwnerPolicy(...)   ← called on EVERY auth request, not only for OWNER
+│   │  privilege checks. Nearly all auth expressions contain ANY(OWNER, METALAKE, CATALOG),
+│   │  which expands to METALAKE::OWNER || CATALOG::OWNER || … and calls isOwner() directly
+│   │  via OGNL, independently of the authorize() path. Typical call count: 2–4 per request.
+│   │
+│   ├─ Check ownerRel cache → if HIT, return (most non-owner users get Optional.empty())
+│   └─ [D1] entityStore.listEntitiesByRelation(OWNER_REL, ...)
+│          ownerRel.put(metadataId, Optional.of(ownerId))
+│
+└─ [E] enforcer.enforce(userId, objectType, metadataId, privilege)   ← in-memory, O(1)
+```
+
+#### 1.3.2 What Each Cache Protects
+
+`loadedRoles: Cache<Long, Boolean>` answers "is this role's policy already in JCasbin?"
+Without it, every request re-executes [C2]+[C3] for all roles the user has (N+1 queries).
+With it, [C2]+[C3] only run on first load per role. **This is the most critical cache.**
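The load-once behaviour of `loadedRoles` can be sketched as follows. `RoleLoader`, `loadRoleFromDb`, and the `dbQueries` counter are hypothetical, and a `HashMap` stands in for the Caffeine cache. Each cold role is counted as a single round trip, even though [C3] really runs once per securable object:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class RoleLoader {
  // roleId -> loaded?  Mirrors the current Cache<Long, Boolean> loadedRoles.
  private final Map<Long, Boolean> loadedRoles = new HashMap<>();
  int dbQueries = 0; // counts simulated [C2]+[C3] work, one unit per cold role

  private void loadRoleFromDb(long roleId) {
    // Stand-in for [C2] get(RoleEntity) + [C3] getID() per securable object + addPolicy().
    dbQueries++;
  }

  /** Loads only the roles whose policies are not yet in the enforcer. */
  void loadRolePrivilege(List<Long> userRoleIds) {
    for (long roleId : userRoleIds) {
      if (loadedRoles.containsKey(roleId)) {
        continue; // warm path: policy already in JCasbin
      }
      loadRoleFromDb(roleId);
      loadedRoles.put(roleId, true);
    }
  }
}
```

Nothing in this pattern notices a privilege change on a role that is already loaded. That gap is the TTL-bounded staleness the versioned design removes.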
+
+`ownerRel: Cache<Long, Optional<Long>>` caches ownership lookups for OWNER-privilege
+checks. **Contrary to initial analysis, `ownerRel` is consulted on virtually every auth
+request**, not only when `privilege == OWNER`. The reason is that nearly every authorization
+expression in `AuthorizationExpressionConstants` includes `ANY(OWNER, METALAKE, CATALOG)`
+or similar clauses (e.g. `LOAD_TABLE_AUTHORIZATION_EXPRESSION`,
+`FILTER_TABLE_AUTHORIZATION_EXPRESSION`, `LOAD_CATALOG_AUTHORIZATION_EXPRESSION`). The
+`ANY(OWNER, …)` macro expands to `METALAKE::OWNER || CATALOG::OWNER || …`, and each
+`X::OWNER` term calls `isOwner()` directly, on a code path that is **independent of
+`authorize()`**. As a result, every auth request triggers 2–4 `isOwner()` calls (one per
+ancestor level), each consulting `ownerRel`. For most non-owner users, `ownerRel` caches
+`Optional.empty()`, which lets the ownership sub-check fail quickly without a DB query.
+Without `ownerRel`, every auth request would add 2–4 extra DB queries against `owner_meta`.
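Below is a simplified model of why the owner check runs on almost every request. `ANY(OWNER, METALAKE, CATALOG)` behaves like a short-circuit OR of per-level `isOwner()` calls. `OwnerExpansion`, `anyOwner`, and the `owners` map are hypothetical. In Gravitino the expansion is evaluated through OGNL against the expressions in `AuthorizationExpressionConstants`:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

final class OwnerExpansion {
  // metadataId -> owner userId; an absent key means no owner recorded (hypothetical data).
  private final Map<Long, Long> owners;
  final AtomicInteger isOwnerCalls = new AtomicInteger();

  OwnerExpansion(Map<Long, Long> owners) {
    this.owners = owners;
  }

  boolean isOwner(long userId, long metadataId) {
    isOwnerCalls.incrementAndGet(); // one ownerRel lookup (or owner_meta query) per call
    Long owner = owners.get(metadataId);
    return owner != null && owner == userId;
  }

  /** METALAKE::OWNER || CATALOG::OWNER || ... evaluated left to right with short-circuit. */
  boolean anyOwner(long userId, List<Long> ancestorIds) {
    for (long id : ancestorIds) {
      if (isOwner(userId, id)) {
        return true;
      }
    }
    return false;
  }
}
```

Owners stop at the first matching level. Non-owners, the common case, evaluate every level, which is where the 2–4 lookups per request come from.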
+
+**What these caches do NOT protect** (hit DB on every auth request without entity cache):
+
+| Call | Description | Protected by |
+|------|-------------|--------------|
+| [A] `getUserEntity()` | Fetch User entity → get integer userId | Entity cache only |
+| [B] `MetadataIdConverter.getID()` target | Resolve target resource name → integer ID | Entity cache only |
+| [C1] `listEntitiesByRelation(ROLE_USER_REL)` | Get user's role list | Entity cache only |
+
+---
+
+### 1.4 Impact of Disabling Entity Cache
+
+Layer 2 sits **on top of** Layer 1. When Layer 1 is disabled (NoOpsCache), calls [A], [B], and
+[C1] hit the DB on every auth request.
+
+| Call | With entity cache | Without entity cache |
+|------|-------------------|----------------------|
+| [A] `getUserEntity()` | Cache hit after first request | **DB query every auth request** |
+| [B] `MetadataIdConverter.getID()` target | Cache hit after first request | **DB query every auth request** |
+| [C1] `listEntitiesByRelation(ROLE_USER_REL)` | Cache hit after first request | **DB query every auth request** |
+| [C2] `entityStore.get(RoleEntity)` | Protected by `loadedRoles` | DB only on cold role load |
+| [C3] `MetadataIdConverter.getID()` per privilege | Protected by `loadedRoles` | DB only on cold role load |
+| [D1] `listEntitiesByRelation(OWNER_REL)` | Protected by `ownerRel` | **DB query 2–4x per request** |
+
+---
+
+
+## 2. Goals
+
+### 2.1 The Two Problems to Solve
+
+**Problem 1 — Performance:** With the entity cache disabled, [A] and [C1] hit the DB on every
+auth request. The new auth cache layer must protect these without relying on the entity store
+cache. ([B] also hits the DB, but this is correct and acceptable; see §1.5.)
+
+**Problem 2 — Consistency:** `loadedRoles` is TTL-bounded (1 hour staleness), and its hook-based
+updates only reach the local instance. Permission changes must take effect at the next auth
+request on every node, not after TTL expiry.
+
+Both problems are solved by the same mechanism: a version-validated cache for the user's role
+list (the userId comes for free from the same query).
+
+### 2.2 Requirements
+
+| Goal | Requirement |
+|------|-------------|
+| HA auth consistency | Privilege revocations visible on all nodes at the next auth request |
+| Auth self-sufficiency | [A] and [C1] protected without relying on entity store cache |
+| Auth performance | Hot path: ≤ 3 lightweight DB queries |
+| No new mandatory infrastructure | Solution requires only the existing DB |
+| Incremental delivery | Phase 1 independently shippable |
+
+---
+
+## 3. Industry Reference
+
+### 3.1 Apache Polaris — Per-Entity Version Tracking
+
+#### Schema
+
+All entity types (catalogs, namespaces, tables, roles, principals) share a single `ENTITIES`
+table (single-table inheritance). The two version columns are the key fields for caching:
+
+```sql
+ENTITIES (
+  id                    BIGINT,   -- Unique entity ID
+  catalog_id            BIGINT,   -- Owning catalog (0 for top-level entities)
+  parent_id             BIGINT,   -- Parent entity ID, forms the hierarchy tree
+  type_code             INT,      -- Entity type enum (see hierarchy below)
+  name                  VARCHAR,
+  entity_version        INT,      -- Bumped on rename / property update / drop   ← key
+  sub_type_code         INT,      -- Subtype (ICEBERG_TABLE, ICEBERG_VIEW, etc.)
+  properties            JSON,     -- User-visible properties (location, format, etc.)
+  internal_properties   JSON,     -- Internal properties (credentials, storage config, etc.)
+  grant_records_version INT       -- Bumped on every GRANT or REVOKE               ← key
+)
+
+GRANT_RECORDS (
+  securable_catalog_id  BIGINT,
+  securable_id          BIGINT,   -- The resource being secured (table/namespace/catalog)
+  grantee_catalog_id    BIGINT,
+  grantee_id            BIGINT,   -- The principal or role receiving the grant
+  privilege_code        INT       -- One of 102 defined privileges
+)
+```
+
+`GRANT_RECORDS` has no version column of its own. The version fingerprint is stored in
+`ENTITIES.grant_records_version`, so detecting staleness requires no scan of `GRANT_RECORDS`.
+
+#### Entity Type Hierarchy
+
+```
+ROOT
+ ├── PRINCIPAL          (user account, isGrantee)
+ ├── PRINCIPAL_ROLE     (user-level role, isGrantee)
+ └── CATALOG
+      ├── CATALOG_ROLE  (catalog-level role, isGrantee)
+      ├── NAMESPACE
+      │    └── TABLE_LIKE / POLICY / FILE
+      └── TASK
+```
+
+Only `PRINCIPAL`, `PRINCIPAL_ROLE`, and `CATALOG_ROLE` are **grantees** (can receive grants).
+All others are **securables** (privileges are set on them).
+
+#### How `grantRecordsVersion` Is Maintained
+
+Every `grantPrivilege` / `revokePrivilege` call performs three writes in **one DB transaction**:
+
+1. Insert or delete the `GRANT_RECORDS` row.
+2. Increment `grant_records_version` on the **grantee** entity row.
+3. Increment `grant_records_version` on the **securable** entity row.
+
+Both sides are bumped atomically — no separate changelog table is needed.
+
+#### Version-Validated Cache
+
+The cache unit is `ResolvedPolarisEntity` = entity metadata + grant records in both directions.
+On every request, `bulkValidate()` issues one batch query for all path entities:
+
+```sql
+SELECT * FROM ENTITIES WHERE (catalog_id, id) IN ((?, ?), ...)
+```
+
+| Path | Condition | Action |
+|------|-----------|--------|
+| Cache hit | Both versions current | Serve from cache; **0 extra queries** |
+| Stale, targeted refresh | Either version behind | Reload only the changed dimension |
+| Cache miss | Not in cache | Full load |
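The three rows come down to comparing the versions held in the cache with the versions returned by the batch query. Below is a sketch of that decision, assuming a hypothetical `Versions` pair per entity. It illustrates the table only and is not Polaris's `bulkValidate()` implementation:

```java
import java.util.Objects;

// Hypothetical (entity_version, grant_records_version) pair for one entity.
record Versions(int entityVersion, int grantRecordsVersion) {}

enum CacheAction { SERVE_FROM_CACHE, RELOAD_ENTITY, RELOAD_GRANTS, RELOAD_BOTH, FULL_LOAD }

final class BulkValidateSketch {
  private BulkValidateSketch() {}

  /** cached is null on a cache miss; current comes from the batch SELECT on ENTITIES. */
  static CacheAction classify(Versions cached, Versions current) {
    Objects.requireNonNull(current, "entity row must exist");
    if (cached == null) {
      return CacheAction.FULL_LOAD;
    }
    boolean entityStale = cached.entityVersion() != current.entityVersion();
    boolean grantsStale = cached.grantRecordsVersion() != current.grantRecordsVersion();
    if (entityStale && grantsStale) {
      return CacheAction.RELOAD_BOTH;
    }
    if (entityStale) {
      return CacheAction.RELOAD_ENTITY;
    }
    if (grantsStale) {
      return CacheAction.RELOAD_GRANTS;
    }
    return CacheAction.SERVE_FROM_CACHE;
  }
}
```

Keeping the two versions separate is what enables the targeted refresh. A grant change reloads only the grant records, and a rename reloads only the entity.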
+
+The DB is the single source of truth; no broadcast is needed for correctness.
+
+**Key difference from Gravitino:** Polaris bundles entity + grants in one cached object, so one
+batch query covers both dimensions. Gravitino separates user→role from role→privilege, requiring
+2 version-check queries on a warm hit (see §4.1 Step 1 and Step 3). Both achieve strong
+consistency.
+
+### 3.2 Other References
+
+**Nessie**: HTTP fan-out invalidation (async POST to peer nodes on write), convergence < 200 ms.
+
+**Keycloak**: JGroups embedded cluster messaging (in-JVM broadcast, no separate service).
+Recommended future direction if Gravitino needs stronger delivery guarantees.
+
+**DB version polling**: monotonic counters incremented in the write transaction; a background
+thread polls for version changes and proactively invalidates caches. Considered but not
+adopted; per-request validation (§4.1) achieves strong consistency without background threads.
+
+---
+
+## 4. Design
+
+### 4.1 Per-Request Version Check (Polaris Style)
+
+Every auth request executes two lightweight version-check queries before serving from cache.
+If any version has changed, only the stale portion is reloaded. Staleness window: **zero**.
+
+#### 4.1.1 Schema Changes
+
+Three new version columns, all `DEFAULT 1`, fully backward compatible. Existing rows get
+version 1; the first auth check after migration populates caches normally.
+
+```sql
+ALTER TABLE `role_meta`
+  ADD COLUMN `securable_objects_version` INT UNSIGNED NOT NULL DEFAULT 1
+  COMMENT 'Incremented atomically with any privilege grant/revoke for this role';
+
+ALTER TABLE `user_meta`
+  ADD COLUMN `role_grants_version` INT UNSIGNED NOT NULL DEFAULT 1
+  COMMENT 'Incremented atomically with any role assignment/revocation for this user';
+
+ALTER TABLE `group_meta`
+  ADD COLUMN `role_grants_version` INT UNSIGNED NOT NULL DEFAULT 1
+  COMMENT 'Incremented atomically with any role assignment/revocation for this group';
+```
+
+Write paths that must bump the version **in the same DB transaction**:
+
+| Operation | Column | Location |
+|-----------|--------|----------|
+| Grant / revoke privilege on role R | `role_meta.securable_objects_version WHERE role_id = R` | `RoleMetaService` |
+| Assign / revoke role for user U | `user_meta.role_grants_version WHERE user_id = U` | `UserMetaService` |
+| Assign / revoke role for group G | `group_meta.role_grants_version WHERE group_id = G` | `GroupMetaService` |
+
+Version comparison uses `!=` (not `<`) to safely handle theoretical INT wrap-around.
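A short illustration of why the comparison is `!=`: once a counter wraps, the new value is numerically smaller than the cached one. An ordered `<` check would therefore report the stale entry as current. The sketch uses Java's signed `int`, whose overflow semantics are well defined. Whether the database column itself could ever wrap depends on the database and SQL mode. The helper names are hypothetical:

```java
final class VersionCheck {
  private VersionCheck() {}

  /** Ordered check: treats the cache as stale only if the DB version is larger. */
  static boolean isStaleOrdered(int cachedVersion, int dbVersion) {
    return cachedVersion < dbVersion;
  }

  /** Equality check: a difference in either direction means stale. */
  static boolean isStale(int cachedVersion, int dbVersion) {
    return cachedVersion != dbVersion;
  }
}
```

With `cachedVersion = Integer.MAX_VALUE`, one more bump yields `Integer.MIN_VALUE`. The ordered check then returns `false`, while the equality check correctly returns `true`.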
+
+**Ownership transfers** require no schema change and no cache. The `ownerRel` cache is
+**removed** (see §7.2). Step 2.5 queries `owner_meta` directly with a single batch query on
+every auth request: strongly consistent, with no versioning complexity needed.
+
+#### 4.1.2 Cache Data Structures (Changes in JcasbinAuthorizer)
+
+```java
+// ─── BEFORE ──────────────────────────────────────────────────────────
+private Cache<Long, Boolean> loadedRoles;          // roleId → loaded?
+private Cache<Long, Optional<Long>> ownerRel;
+
+// ─── AFTER ───────────────────────────────────────────────────────────
+
+// NEW: replaces entity cache dependency for [A] (userId) and [C1] (role list).
+// Step 1 query returns both user_id and role_grants_version in one shot.
+// metalakeName→metalakeId resolved inline via JOIN; no dedicated cache needed.
+private GravitinoCache<String, CachedUserRoles> userRoleCache;
+// key = metalakeName + ":" + userName
+
+record CachedUserRoles(
+    long userId,              // integer userId for JCasbin enforce()
+    int roleGrantsVersion,    // user_meta.role_grants_version at load time
+    List<Long> roleIds        // role ID list at load time
+) {}
+
+// NEW: mirrors userRoleCache for groups (a group can also hold role assignments).
+private GravitinoCache<String, CachedGroupRoles> groupRoleCache;
+// key = metalakeName + ":" + groupName
+
+record CachedGroupRoles(
+    long groupId,
+    int roleGrantsVersion,    // group_meta.role_grants_version at load time
+    List<Long> roleIds
+) {}
+
+// TYPE CHANGE: was Cache<Long, Boolean>, now stores securable_objects_version.
+// Enables version-based staleness detection rather than TTL expiry.
+private GravitinoCache<Long, Integer> loadedRoles;
+// roleId → securable_objects_version at the time JCasbin policies were loaded
+
+// REMOVED: ownerRel cache eliminated (see §7.2).
+// isOwner() is called 2–4 times per request, but version-validated caching offers no
+// query savings: the version check query already returns the owner_id, so there is
+// nothing expensive to avoid. Step 2.5 queries owner_meta directly via one batch query
+// per request.
+// private Cache<Long, Optional<Long>> ownerRel;
+```
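Below is a sketch of how Step 1a of the auth check flow (§4.1.3) could use `CachedUserRoles`. The version row is read on every request, and the role list is reloaded only when it no longer matches. `UserRoleResolver`, `VersionRow`, and the two `Function` parameters standing in for mapper queries are hypothetical. A `HashMap` stands in for `GravitinoCache`, the version query is keyed by the cache key for brevity, and the user is assumed to exist:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

record CachedUserRoles(long userId, int roleGrantsVersion, List<Long> roleIds) {}

// Hypothetical result of the Step 1a query: user_id + role_grants_version.
record VersionRow(long userId, int roleGrantsVersion) {}

final class UserRoleResolver {
  private final Map<String, CachedUserRoles> userRoleCache = new HashMap<>();
  private final Function<String, VersionRow> versionQuery; // Step 1a: runs on every request
  private final Function<Long, List<Long>> roleIdQuery; // user_role_rel: only on miss/mismatch

  UserRoleResolver(Function<String, VersionRow> versionQuery, Function<Long, List<Long>> roleIdQuery) {
    this.versionQuery = versionQuery;
    this.roleIdQuery = roleIdQuery;
  }

  CachedUserRoles resolve(String metalakeName, String userName) {
    String key = metalakeName + ":" + userName;
    VersionRow row = versionQuery.apply(key);
    CachedUserRoles cached = userRoleCache.get(key);
    if (cached != null
        && cached.userId() == row.userId()
        && cached.roleGrantsVersion() == row.roleGrantsVersion()) {
      return cached; // hit: [A] and [C1] avoided
    }
    List<Long> roleIds = List.copyOf(roleIdQuery.apply(row.userId()));
    // Real code would also re-associate userId <-> roleIds in the JCasbin enforcers here.
    CachedUserRoles fresh = new CachedUserRoles(row.userId(), row.roleGrantsVersion(), roleIds);
    userRoleCache.put(key, fresh);
    return fresh;
  }
}
```

This sketch also compares `userId`, not just the version. A user dropped and recreated under the same name gets a new ID, but its `role_grants_version` restarts at the column default of 1 and could match the cached value.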
+
+**Why no cache for [B] (target resource name→ID):**
+Adding a `metadataIdCache` would require invalidation on every entity rename, drop, or
+recreate across all entity types. Since JCasbin uses integer IDs (not names), the DB lookup
+for [B] is always correct (~1 ms indexed). It is simpler and more correct to hit the DB on
+every request.
+
+**Why `ownerRel` is removed:**
+`isOwner()` is called 2–4 times per auth request (once per ancestor in the OGNL expression
+chain). A version-validated cache would still query `owner_meta` on every request to check
+versions, and that same query already returns the `owner_id`. There is no expensive
+downstream work to avoid (unlike roles, where skipping securable-object loading saves
+significant work). Caching adds complexity with zero query savings. Step 2.5 issues one
+batch query per request directly against `owner_meta`, achieving strong consistency with
+no extra infrastructure. See §7.2.
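One possible per-request implementation combines the batch query with the request-scoped `ownerCache` from Phase 2 step 2.11. Owners for all ancestor IDs are fetched once, and each later `isOwner()` call is answered from the map. `RequestOwnerCache`, `prefetch`, and the `batchQuery` function are hypothetical stand-ins for `AuthorizationRequestContext` and a mapper method:

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

final class RequestOwnerCache {
  // metadataId -> owner userId (empty if unowned); lives for one HTTP request only.
  private final Map<Long, Optional<Long>> ownerCache = new HashMap<>();
  private final Function<Collection<Long>, Map<Long, Long>> batchQuery;
  int batchQueries = 0;

  RequestOwnerCache(Function<Collection<Long>, Map<Long, Long>> batchQuery) {
    this.batchQuery = batchQuery;
  }

  /** Fetches owners for every ID not yet seen in this request with a single query. */
  void prefetch(Collection<Long> metadataIds) {
    List<Long> missing = metadataIds.stream().filter(id -> !ownerCache.containsKey(id)).toList();
    if (missing.isEmpty()) {
      return;
    }
    batchQueries++;
    Map<Long, Long> found = batchQuery.apply(missing);
    for (Long id : missing) {
      ownerCache.put(id, Optional.ofNullable(found.get(id)));
    }
  }

  boolean isOwner(long userId, long metadataId) {
    prefetch(List.of(metadataId)); // no-op when this ID was already fetched
    return ownerCache.get(metadataId).map(owner -> owner == userId).orElse(false);
  }
}
```

The map is discarded with the request, so there is nothing to invalidate across nodes. Every request still reads `owner_meta` once.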
+
+#### 4.1.3 Auth Check Flow
+
+```
+authorize(metalakeName, username, resource, operation)
+│
+├─ STEP 1 — User + Group version check (2 queries, metalake resolved via JOIN):
+│
+│   [1a] User query:
+│     SELECT um.user_id, um.role_grants_version
+│     FROM user_meta um
+│     JOIN metalake_meta mm ON um.metalake_id = mm.metalake_id AND mm.deleted_at = 0
+│     WHERE mm.metalake_name = ? AND um.user_name = ? AND um.deleted_at = 0
+│
+│     userRoleCache HIT and role_grants_version matches:
+│       → use cached userId and roleIds                    [A] and [C1] avoided
+│     MISS or version mismatch:
+│       → SELECT role_id FROM user_role_rel WHERE user_id = ? AND deleted_at = 0
+│       → re-associate userId ↔ roleIds in JCasbin allow/deny enforcers
+│       → userRoleCache.put(key, CachedUserRoles(userId, version, roleIds))
+│
+│   [1b] Group query (user may belong to groups that also hold roles):
+│     SELECT gm.group_id, gm.role_grants_version
+│     FROM group_meta gm
+│     JOIN group_user_rel gu ON gm.group_id = gu.group_id AND gu.deleted_at = 0
+│     WHERE gu.user_id = ? AND gm.deleted_at = 0
+│
+│     For each group:
+│       groupRoleCache HIT and role_grants_version matches:
+│         → use cached groupId and roleIds                 [group C1] avoided
+│       MISS or version mismatch:
+│         → SELECT role_id FROM group_role_rel WHERE group_id = ? AND deleted_at = 0
+│         → addRoleForUser(userId, roleId) in JCasbin enforcers
+│         → groupRoleCache.put(groupKey, CachedGroupRoles(groupId, version, roleIds))
+│
+│   Note: current code only loads user-direct roles (ROLE_USER_REL). Loading group roles
+│   via [1b] is a NEW capability introduced in Phase 2 alongside groupRoleCache.
+│
+├─ STEP 2 — Resolve target resource ID (always DB, no cache):
+│
+│   metadataId = MetadataIdConverter.getID(resource, metalake)   ← 1 indexed DB query
+│   Always correct: rename does not change ID; drop+recreate returns the new ID.
+│
+│   TODO: A strong-consistency name→id cache could eliminate this DB query on the warm
+│   path. Version-based validation does not apply here (checking the version requires
+│   the same query that returns the ID). A viable approach would require an
+│   entity_mutation_log for cross-node invalidation plus write-path eviction on the
+│   local node. Not implemented in this phase.
Review Comment:
I see. The id-name mapping and owner cache have been updated: an `update_at` column has been
added to the table `owner_meta` to mark whether there are changes in the owner relationship.
At the same time, a new table `entity_change_log` was created to indicate any updates to
metadata objects.