This is an automated email from the ASF dual-hosted git repository.
jerryshao pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/gravitino.git
The following commit(s) were added to refs/heads/main by this push:
new 5ac8593a99 [#10695] feat(cache): Introduce Gravitino Cache Improvement
Design document outlining new caching strategies and architecture (#10696)
5ac8593a99 is described below
commit 5ac8593a99b53e01a3e00093ed982c347fbdcaff
Author: Qi Yu <[email protected]>
AuthorDate: Mon Apr 27 19:23:36 2026 +0800
[#10695] feat(cache): Introduce Gravitino Cache Improvement Design document
outlining new caching strategies and architecture (#10696)
### What changes were proposed in this pull request?
Raise a document about cache improvement.
### Why are the changes needed?
It's a feature.
Fix: #10695
### Does this PR introduce _any_ user-facing change?
N/A
### How was this patch tested?
---------
Co-authored-by: Jerry Shao <[email protected]>
---
design-docs/cache-improvement-design.md | 1440 +++++++++++++++++++++++++++++++
1 file changed, 1440 insertions(+)
diff --git a/design-docs/cache-improvement-design.md b/design-docs/cache-improvement-design.md
new file mode 100644
index 0000000000..750b04e923
--- /dev/null
+++ b/design-docs/cache-improvement-design.md
@@ -0,0 +1,1440 @@
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+-->
+
+# Gravitino Cache Improvement Design
+
+---
+
+## 1. Background
+
+### 1.1 System Overview
+
+Gravitino is a unified metadata management control plane. Compute engines (Spark, Flink, Trino)
+call it during query planning to resolve catalog, schema, and table metadata, and to verify
+user permissions. The access pattern is distinctly **read-heavy, write-light**: DDL operations
+are infrequent, and metadata is resolved once per job.
+
+Gravitino is evolving from single-node to multi-node active-active HA deployment. Each node
+currently maintains its own independent in-process Caffeine cache with no cross-node
+synchronisation. Under HA, any write on one node leaves other nodes' caches stale until TTL
+expiry.
+
+---
+
+### 1.2 Current Cache Architecture Overview
+
+Gravitino maintains three distinct caching layers for the authorization path:
+
+```
+┌────────────────────────────────────────────────────────────┐
+│ Layer 3: Per-request cache (AuthorizationRequestContext)   │
+│   Scope: one HTTP request; prevents duplicate auth calls   │
+├────────────────────────────────────────────────────────────┤
+│ Layer 2: Auth policy caches (JcasbinAuthorizer)            │
+│   loadedRoles  Cache<Long, Boolean>        hook update/TTL │
+│   ownerRel     Cache<Long, Optional<Long>> hook update/TTL │
+├────────────────────────────────────────────────────────────┤
+│ Layer 1: Entity store cache (RelationalEntityStore)        │
+│   CaffeineEntityCache — or NoOpsCache when disabled        │
+│   Caches entity reads and relation queries for all modules │
+│   Controlled by Configs.CACHE_ENABLED                      │
+└────────────────────────────────────────────────────────────┘
+```
+
+**JCasbin is the core of the auth cache system.** It maintains an in-memory policy table:
+
+```
+(roleId, objectType, metadataId, privilege) → ALLOW | DENY
+```
+
+The Layer 2 caches exist solely to manage JCasbin's policy loading lifecycle:
+
+| Cache | Role |
+|-------|------|
+| `loadedRoles: Cache<Long, Boolean>` | Tracks which roles are already loaded into JCasbin — prevents repeated [C2]+[C3] queries on every auth request |
+| `ownerRel: Cache<Long, Optional<Long>>` | Caches owner lookups — **prevents [D1] on every auth request** (2–4 `isOwner()` calls per request, see §1.3.2) |
+
+Without `loadedRoles`, every auth request would re-execute N DB queries to reload all of a
+user's role policies into JCasbin. These two caches are the reason the auth path is fast on
+the warm path. Layer 1 (entity cache) additionally accelerates the name→ID resolution calls
+([A], [B], [C1]) that feed into JCasbin's enforce call.
+
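+Concretely, the warm path reduces to an in-memory jcasbin `enforce()` call. The sketch below
+is illustrative, not Gravitino's actual wiring: the model path `rbac_model.conf` and the
+numeric ids are placeholders, assuming an RBAC model whose request carries the four tokens
+shown in the policy table above.
+
+```java
+import org.casbin.jcasbin.main.Enforcer;
+
+public class PolicyTableSketch {
+  public static void main(String[] args) {
+    // Assumed RBAC model file with a 4-token request: sub, objType, obj, act.
+    Enforcer enforcer = new Enforcer("rbac_model.conf");
+
+    // Policy row: role 42 holds SELECT on the TABLE whose integer metadataId is 100.
+    enforcer.addPolicy("42", "TABLE", "100", "SELECT");
+    // Role assignment: user 7 holds role 42.
+    enforcer.addRoleForUser("7", "42");
+
+    // In-memory check — no DB access once policies and role mappings are loaded.
+    boolean allowed = enforcer.enforce("7", "TABLE", "100", "SELECT"); // → true
+    System.out.println(allowed);
+  }
+}
+```
+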
+---
+
+#### 1.2.1 Problems with the Current Entity Cache
+
+**The entity cache (Layer 1) has accumulated significant complexity and is not well-suited to
+serve as a general-purpose or auth-dedicated caching layer.**
+
+##### Mixed responsibilities make it hard to maintain
+
+`CaffeineEntityCache` uses a single `Cache<EntityCacheRelationKey, List<Entity>>` to store
+three semantically different kinds of data:
+
+| Stored data | Key form | Example relation types |
+|-------------|----------|------------------------|
+| Direct entity | `(nameIdentifier, entityType, null)` | any entity: catalog, schema, table, user, role, ... |
+| Relation result set | `(nameIdentifier, entityType, relType)` | `ROLE_USER_REL`, `TAG_METADATA_OBJECT_REL`, ... |
+| Reverse index entries | `ReverseIndexCache` (separate radix tree) | entity → list of cache keys that reference it |
+
+On top of this, a `cacheIndex` (radix tree) keeps a prefix-indexed view of all keys to
+support cascading invalidation. The resulting invalidation logic (`invalidateEntities`) is a
+BFS traversal that walks both the forward index and the reverse index, making it difficult to
+reason about correctness and hard to extend safely.
+
+The five relation types currently tracked (`METADATA_OBJECT_ROLE_REL`, `ROLE_USER_REL`,
+`ROLE_GROUP_REL`, `POLICY_METADATA_OBJECT_REL`, `TAG_METADATA_OBJECT_REL`) are all
+auth-related, which reflects the original design intent: **the entity cache was built
+primarily to serve the auth path.** Over time it accumulated relation types and reverse-index
+logic without a clear ownership model, making it harder to maintain and evolve.
+
+##### Limited benefit for non-auth interfaces
+
+For general metadata API calls (list catalogs, list schemas, list tables), the entity cache
+provides minimal benefit:
+
+| Operation | Goes through cache? | Notes |
+|-----------|---------------------|-------|
+| `list(namespace, type)` | **No** | Bypasses cache entirely; always hits DB |
+| `get(ident, type)` (single entity) | Yes | Cache helps on repeated reads of the same entity |
+| `update(ident, type)` | Invalidate only | Invalidates entry, write always goes to DB |
+| `listEntitiesByRelation(...)` | Yes | Only for the five auth-centric relation types |
+
+In practice, the most common metadata browsing operations (`LIST` endpoints) are not cached
+at the entity store level. The cache's real workload is the auth path, where the same user
+entity, role assignments, and resource IDs are resolved on every single authorization check.
+
+**Conclusion:** The entity cache is a de-facto auth cache dressed up as a general-purpose
+cache. Its complexity is unjustified for the non-auth use case, and its TTL-based consistency
+model is insufficient for the auth use case (see §1.8). A purpose-built auth cache layer —
+separate from the entity store — is the cleaner path forward.
+
+---
+
+### 1.3 JCasbin Authorization — Deep Dive
+
+#### 1.3.1 Call Graph for a Single `authorize()` Check
+
+```
+JcasbinAuthorizer.authorize(principal, metalake, metadataObject, privilege)
+│
+├─ [A] getUserEntity(username, metalake)
+│      entityStore.get(USER by NameIdentifier)
+│      → Needed to obtain integer userId for JCasbin enforce()
+│
+├─ [B] MetadataIdConverter.getID(metadataObject, metalake)   ← TARGET RESOURCE
+│      entityStore.get(entity by NameIdentifier)
+│      → Needed to get integer metadataId for JCasbin enforce()
+│      → Called on every auth request
+│
+├─ [C] loadRolePrivilege(metalake, username, userId, requestContext)
+│   │  (guarded by requestContext.hasLoadRole — runs once per HTTP request)
+│   │
+│   ├─ [C1] entityStore.listEntitiesByRelation(ROLE_USER_REL, userIdentifier)
+│   │       → Get all roles assigned to this user
+│   │
+│   └─ For each role NOT already in loadedRoles cache:
+│       ├─ [C2] entityStore.get(RoleEntity by name)   ← async, thread pool
+│       └─ loadPolicyByRoleEntity(roleEntity)
+│           └─ For each securableObject in role.securableObjects():
+│               ├─ [C3] MetadataIdConverter.getID(securableObject, metalake)
+│               └─ enforcer.addPolicy(roleId, objType, metadataId, privilege, effect)
+│
+│      loadedRoles.put(roleId, true)   ← mark role as loaded
+│
+├─ [D] isOwner() / loadOwnerPolicy(...)   ← called on EVERY auth request (not only OWNER
+│   │  privilege checks). Nearly all auth expressions contain ANY(OWNER, METALAKE, CATALOG),
+│   │  which expands to METALAKE::OWNER || CATALOG::OWNER || … and calls isOwner() directly
+│   │  via OGNL, independently of the authorize() path. Typical call count: 2–4 per request.
+│   ├─ Check ownerRel cache → if HIT, return (most non-owner users get Optional.empty())
+│   └─ [D1] entityStore.listEntitiesByRelation(OWNER_REL, ...)
+│          ownerRel.put(metadataId, Optional.of(ownerId))
+│
+└─ [E] enforcer.enforce(userId, objectType, metadataId, privilege)   ← in-memory, O(1)
+```
+
+#### 1.3.2 What Each Cache Protects
+
+`loadedRoles: Cache<Long, Boolean>` — answers "is this role's policy already in JCasbin?"
+Without it, every request re-executes [C2]+[C3] for all roles the user has (N+1 queries).
+With it, [C2]+[C3] only run on first load per role. **This is the most critical cache.**
+
+`ownerRel: Cache<Long, Optional<Long>>` — caches ownership lookups for OWNER-privilege
+checks. **Contrary to initial analysis, `ownerRel` is consulted on virtually every auth
+request**, not only when `privilege == OWNER`. The reason is that nearly every authorization
+expression in `AuthorizationExpressionConstants` includes `ANY(OWNER, METALAKE, CATALOG)`
+or similar clauses (e.g. `LOAD_TABLE_AUTHORIZATION_EXPRESSION`,
+`FILTER_TABLE_AUTHORIZATION_EXPRESSION`, `LOAD_CATALOG_AUTHORIZATION_EXPRESSION`). The
+`ANY(OWNER, …)` macro expands to `METALAKE::OWNER || CATALOG::OWNER || …`, and each
+`X::OWNER` term calls `isOwner()` directly — a code path that is **independent of
+`authorize()`**. As a result, every auth request triggers 2–4 `isOwner()` calls (one per
+ancestor level), each consulting `ownerRel`. For most non-owner users, `ownerRel` caches
+`Optional.empty()`, which lets the ownership sub-check fail quickly without a DB query.
+Without `ownerRel`, every auth request would add 2–4 extra DB queries against `owner_meta`.
+
+**What these caches do NOT protect** (hit DB on every auth request without entity cache):
+
+| Call | Description | Protected by |
+|------|-------------|--------------|
+| [A] `getUserEntity()` | Fetch User entity → get integer userId | Entity cache only |
+| [B] `MetadataIdConverter.getID()` target | Resolve target resource name → integer ID | Entity cache only |
+| [C1] `listEntitiesByRelation(ROLE_USER_REL)` | Get user's role list | Entity cache only |
+
+---
+
+### 1.4 Impact of Disabling Entity Cache
+
+Layer 2 sits **on top of** Layer 1. When Layer 1 is disabled (NoOpsCache), calls [A], [B],
+[C1] hit DB on every auth request.
+
+| Call | With entity cache | Without entity cache |
+|------|-------------------|----------------------|
+| [A] `getUserEntity()` | Cache hit after first request | **DB query every auth request** |
+| [B] `MetadataIdConverter.getID()` target | Cache hit after first request | **DB query every auth request** |
+| [C1] `listEntitiesByRelation(ROLE_USER_REL)` | Cache hit after first request | **DB query every auth request** |
+| [C2] `entityStore.get(RoleEntity)` | Protected by `loadedRoles` | DB only on cold role load |
+| [C3] `MetadataIdConverter.getID()` per privilege | Protected by `loadedRoles` | DB only on cold role load |
+| [D1] `listEntitiesByRelation(OWNER_REL)` | Protected by `ownerRel` | **DB query 2–4x per request** |
+
+---
+
+
+## 2. Goals
+
+### 2.1 The Two Problems to Solve
+
+**Problem 1 — Performance:** With entity cache disabled, [A] and [C1] hit DB on every auth
+request. The new auth cache layer must protect these without relying on entity store cache.
+([B] also hits DB, but this is correct and acceptable — see §1.5.)
+
+**Problem 2 — Consistency:** `loadedRoles` is TTL-bounded (1 hour staleness) and updated by
+hooks only within a single instance. Permission changes must take effect at the next auth
+request, not after TTL expiry.
+
+Both problems are solved by the same mechanism: a version-validated cache for the user's role
+list (userId comes for free from the same query).
+
+### 2.2 Requirements
+
+| Goal | Requirement |
+|------|-------------|
+| HA auth consistency | Privilege revocations visible on all nodes at the next auth request |
+| Auth self-sufficiency | [A] and [C1] protected without relying on entity store cache |
+| Auth performance | Hot path: ≤ 3 lightweight DB queries |
+| No new mandatory infrastructure | Solution requires only the existing DB |
+| Incremental delivery | Phase 1 independently shippable |
+
+---
+
+## 3. Industry Reference
+
+### 3.1 Apache Polaris — Per-Entity Version Tracking
+
+#### Schema
+
+All entity types (catalogs, namespaces, tables, roles, principals) share a single `ENTITIES`
+table (single-table inheritance). The two version columns are the key fields for caching:
+
+```sql
+ENTITIES (
+  id                    BIGINT,    -- Unique entity ID
+  catalog_id            BIGINT,    -- Owning catalog (0 for top-level entities)
+  parent_id             BIGINT,    -- Parent entity ID, forms the hierarchy tree
+  type_code             INT,       -- Entity type enum (see hierarchy below)
+  name                  VARCHAR,
+  entity_version        INT,       -- Bumped on rename / property update / drop   ← key
+  sub_type_code         INT,       -- Subtype (ICEBERG_TABLE, ICEBERG_VIEW, etc.)
+  properties            JSON,      -- User-visible properties (location, format, etc.)
+  internal_properties   JSON,      -- Internal properties (credentials, storage config, etc.)
+  grant_records_version INT,       -- Bumped on every GRANT or REVOKE   ← key
+)
+
+GRANT_RECORDS (
+  securable_catalog_id BIGINT,
+  securable_id         BIGINT,     -- The resource being secured (table/namespace/catalog)
+  grantee_catalog_id   BIGINT,
+  grantee_id           BIGINT,     -- The principal or role receiving the grant
+  privilege_code       INT         -- One of 102 defined privileges
+)
+```
+
+`GRANT_RECORDS` has no version column of its own. The version fingerprint is stored in
+`ENTITIES.grant_records_version` — detecting staleness requires no scan of `GRANT_RECORDS`.
+
+#### Entity Type Hierarchy
+
+```
+ROOT
+ ├── PRINCIPAL (user account, isGrantee)
+ ├── PRINCIPAL_ROLE (user-level role, isGrantee)
+ └── CATALOG
+ ├── CATALOG_ROLE (catalog-level role, isGrantee)
+ ├── NAMESPACE
+ │ └── TABLE_LIKE / POLICY / FILE
+ └── TASK
+```
+
+Only `PRINCIPAL`, `PRINCIPAL_ROLE`, and `CATALOG_ROLE` are **grantees** (can receive grants).
+All others are **securables** (privileges are set on them).
+
+#### How `grantRecordsVersion` Is Maintained
+
+Every `grantPrivilege` / `revokePrivilege` call performs three writes in **one DB transaction**:
+
+1. Insert or delete the `GRANT_RECORDS` row.
+2. Increment `grant_records_version` on the **grantee** entity row.
+3. Increment `grant_records_version` on the **securable** entity row.
+
+Both sides are bumped atomically — no separate changelog table is needed.
+
+#### Version-Validated Cache
+
+The cache unit is `ResolvedPolarisEntity` = entity metadata + grant records in both directions.
+On every request, `bulkValidate()` issues one batch query for all path entities:
+
+```sql
+SELECT * FROM ENTITIES WHERE (catalog_id, id) IN ((?, ?), ...)
+```
+
+| Path | Condition | Action |
+|------|-----------|--------|
+| Cache hit | Both versions current | Serve from cache — **0 extra queries** |
+| Stale, targeted refresh | Either version behind | Reload only the changed dimension |
+| Cache miss | Not in cache | Full load |
+
+The DB is the single source of truth; no broadcast is needed for correctness.
+
+**Key difference from Gravitino:** Polaris bundles entity + grants in one cached object, so one
+batch query covers both dimensions. Gravitino separates user→role from role→privilege, requiring
+2 version-check queries on a warm hit (see §4.7 Step 1 and Step 3). Both achieve strong
+consistency.
+
+### 3.2 Other References
+
+**Nessie** — HTTP fan-out invalidation: async POST to peer nodes on write, convergence < 200 ms.
+
+**Keycloak** — JGroups embedded cluster messaging: in-JVM broadcast, no separate service.
+Recommended future direction if Gravitino needs stronger delivery guarantees.
+
+**DB version polling** — monotonic counters incremented in the write transaction; a background
+thread polls for version changes and proactively invalidates caches. Considered but not
+adopted; per-request validation (§4.7) achieves strong consistency without background threads.
+
+---
+
+## 4. Design
+
+### 4.1 Design Overview
+
+Three caches drive auth performance: the user/group → role mapping, entity name → integer ID,
+and ownership lookups. Each has different access frequency, mutation rate, and security impact
+— and consequently a different consistency model.
+
+**Consistency tier 1 — strong (version-validated):** User-role assignments and role-privilege
+definitions are security-critical. A revoked permission must not be served from cache even one
+second after revocation. Each auth request issues two lightweight version-check queries against
+`user_meta`, `group_meta`, and `role_meta`. If any `updated_at` timestamp has advanced since
+the cached value, only the stale portion is reloaded. Staleness window: **zero**.
+
+**Consistency tier 2 — eventual (write-path hook + change poller):** Entity name→ID mappings
+and ownership records change far less frequently (DDL, ownership transfers) and a brief window
+of inconsistency has lower security impact. The local node sees changes immediately via hooks
+that fire after transaction commit. HA peer nodes converge within the change poll interval
+(default 1 s) via two lightweight poll queries. No external infrastructure (Kafka, Redis) is
+required — the existing DB is the single source of truth for both tiers.
+
+---
+
+### 4.1.1 Current-vs-Target Gap (Code-Aligned)
+
+The design below intentionally closes concrete gaps in the current implementation:
+
+| Area | Current behavior (main branch) | Target behavior (this design) |
+|------|--------------------------------|-------------------------------|
+| Role loading | `JcasbinAuthorizer.loadRolePrivilege()` loads only `ROLE_USER_REL` | Load both user direct roles and group-derived roles (`group_role_rel`) |
+| Owner check path | `isOwner()` calls `MetadataIdConverter.getID()` twice for the same object | Resolve metadata ID once per call path and reuse |
+| Role cache coherence | `loadedRoles: Cache<Long, Boolean>` is TTL-driven | `loadedRoles: roleId -> updated_at` with per-request version validation |
+| Cross-node entity/owner invalidation | In-process hook/TTL only, no durable HA invalidation stream | DB-backed pollers (`owner_meta.updated_at`, `entity_change_log`) with targeted invalidation |
+| Request-scope dedup | `AuthorizationRequestContext` has allow/deny result cache + `hasLoadRole` | Add request-scope owner/id dedup maps with strict request-thread scope |
+
+This section is used as implementation acceptance criteria and should stay synchronized with
+code changes in `server-common/.../JcasbinAuthorizer.java`,
+`server-common/.../MetadataIdConverter.java`, and `core/.../AuthorizationRequestContext.java`.
+
+---
+
+### 4.2 Strong Consistency: User, Group, and Role Caches
+
+#### Why Strong Consistency Is Required
+
+Privilege revocations are the primary security enforcement operation. If a user's role is
+revoked or a role's privilege is removed, the change must take effect on the **next** auth
+request on any node, not after TTL expiry. TTL-only caching is fundamentally unable to
+provide this guarantee.
+
+The chosen approach is Polaris-style per-request version validation: each row in `user_meta`,
+`group_meta`, and `role_meta` carries an `updated_at` timestamp set in the same DB transaction
+as the security write. On every auth request, the authorizer fetches these timestamps and
+compares them against cached values. A mismatch triggers a targeted reload of only the changed
+entry — not a full policy flush.
+
+Groups are **not optional**: a user can belong to a group that itself holds role assignments.
+`group_meta.updated_at` receives the same treatment as `user_meta.updated_at`, so group-role
+changes are immediately reflected everywhere.
+
+Using a timestamp instead of a monotonic counter has a theoretical same-millisecond collision
+risk (two writes within 1 ms yield the same value → cache misses the second change), but this
+is negligible for administrative operations (GRANT/REVOKE) in practice.
+
+#### Schema Changes
+
+```sql
+-- Role privilege tracking (strong consistency — Step 3 version check)
+ALTER TABLE `role_meta`
+  ADD COLUMN `updated_at` BIGINT NOT NULL DEFAULT 0
+  COMMENT 'Set to currentTimeMillis() on any privilege grant/revoke for this role.
+           JcasbinAuthorizer compares db.updated_at vs cached updated_at per request
+           to decide whether to reload JCasbin policies for this role.';
+
+-- User role assignment tracking (strong consistency — Step 1a version check)
+ALTER TABLE `user_meta`
+  ADD COLUMN `updated_at` BIGINT NOT NULL DEFAULT 0
+  COMMENT 'Set to currentTimeMillis() on any role assign/revoke for this user.
+           JcasbinAuthorizer compares db.updated_at vs cached updated_at per request
+           to decide whether to reload the user-role mapping.';
+
+-- Group role assignment tracking (strong consistency — Step 1b version check)
+ALTER TABLE `group_meta`
+  ADD COLUMN `updated_at` BIGINT NOT NULL DEFAULT 0
+  COMMENT 'Set to currentTimeMillis() on any role assign/revoke for this group.
+           JcasbinAuthorizer compares db.updated_at vs cached updated_at per request
+           to decide whether to reload the group-role mapping.';
+```
+
+#### Index and Backfill Notes
+
+To keep Step 1 and Step 3 checks predictable under load, add/verify covering indexes for
+high-frequency predicates:
+
+```sql
+-- Suggested read-path indexes for version checks
+CREATE INDEX idx_user_meta_name_del_upd
+ ON user_meta (metalake_id, user_name, deleted_at, updated_at);
+CREATE INDEX idx_group_meta_del_upd
+ ON group_meta (group_id, deleted_at, updated_at);
+CREATE INDEX idx_role_meta_del_upd
+ ON role_meta (role_id, deleted_at, updated_at);
+CREATE INDEX idx_owner_meta_obj_del_upd
+ ON owner_meta (metadata_object_id, deleted_at, updated_at);
+```
+
+Backfill strategy for the newly added `updated_at` columns:
+
+1. DDL adds columns with default `0`.
+2. One-time backfill sets `updated_at = create_time` (or `last_modified_time` if available)
+   for existing active rows.
+3. New write-path hooks become the long-term source of truth.
+
+Using explicit backfill avoids a long-lived "all zero" window that would force unnecessary
+cold reloads at rollout time.
+
+---
+
+### 4.3 Eventual Consistency: Ownership Cache (`ownerRelCache`)
+
+#### Why `ownerRelCache` Is Critical for Performance
+
+Nearly all authorization expressions include `ANY(OWNER, METALAKE, CATALOG)` or
+`ANY(OWNER, METALAKE, CATALOG, SCHEMA, ...)`. These expand via OGNL to a chain of
+`METALAKE::OWNER || CATALOG::OWNER || ...` calls. Each term calls `isOwner()` **directly**,
+independent of the `authorize()` path. Every auth request triggers **2–4 `isOwner()` calls**
+(one per ancestor level). Without a cache, this adds 2–4 extra `owner_meta` DB queries per
+request. For most non-owner users, the result is `Optional.empty()`, so the cache primarily
+stores empty-ownership negatives that let the check fail quickly.
+
+#### Why Version-Validated Caching Is Unnecessary for Ownership
+
+| Cache | What a version check returns | What it saves |
+|-------|------------------------------|---------------|
+| `loadedRoles` | `(role_id, updated_at)` | Skips reloading all securable objects + JCasbin `addPolicy` calls — **expensive** |
+| `ownerRelCache` (hypothetical version-validated) | `(metadata_object_id, owner_id)` | Nothing — the version check query **already returns `owner_id`** |
+
+A version-validated `ownerRelCache` would add schema columns, write-path version bumps, and
+per-request version queries — while saving exactly zero DB queries beyond what the version
+check itself costs. Complexity without benefit.
+
+#### Invalidation Strategy: TTL Safety-Net + Write-Path Hook + Owner Change Poller
+
+`ownerRelCache` uses a three-layer strategy:
+
+1. **Local node — immediate**: `handleMetadataOwnerChange()` hook fires after the ownership
+   transfer transaction commits and calls `ownerRelCache.invalidate(metadataId)`.
+2. **HA peer nodes — targeted, near real-time (≤ 1 s)**: the owner change poller queries
+   `owner_meta WHERE updated_at > maxOwnerUpdatedAt`. For each returned row it calls
+   `ownerRelCache.invalidate(metadataObjectId)` — only the changed entries are evicted;
+   unrelated cached ownerships remain hot.
+3. **TTL — safety net only**: a long TTL (e.g. 1 hour) catches any missed invalidation
+   (e.g. poller downtime). Correctness relies on hook + poller, not TTL.
+
+`owner_meta` is a 1:1 table (one row per entity with an owner). The poller can read
+`updated_at` directly from the source table and immediately get the `metadata_object_id` to
+invalidate — no intermediate log table is needed. This avoids write amplification and keeps
+the design simple.
+
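+A minimal sketch of the owner change poll loop, assuming plain JDBC and the field names used
+in §4.6 (`ownerRelCache`, `maxOwnerUpdatedAt`); the real implementation would sit behind
+Gravitino's storage layer rather than raw `DataSource` access:
+
+```java
+import java.sql.Connection;
+import java.sql.PreparedStatement;
+import java.sql.ResultSet;
+import java.util.concurrent.atomic.AtomicLong;
+import javax.sql.DataSource;
+import com.github.benmanes.caffeine.cache.Cache;
+
+class OwnerChangePoller implements Runnable {
+  private final DataSource dataSource;
+  private final Cache<Long, ?> ownerRelCache;           // metadataObjectId → Optional ownerId
+  private final AtomicLong maxOwnerUpdatedAt = new AtomicLong(0L);
+
+  OwnerChangePoller(DataSource dataSource, Cache<Long, ?> ownerRelCache) {
+    this.dataSource = dataSource;
+    this.ownerRelCache = ownerRelCache;
+  }
+
+  @Override
+  public void run() {                                   // scheduled every ~1 s
+    String sql = "SELECT metadata_object_id, updated_at FROM owner_meta WHERE updated_at > ?";
+    try (Connection conn = dataSource.getConnection();
+        PreparedStatement ps = conn.prepareStatement(sql)) {
+      ps.setLong(1, maxOwnerUpdatedAt.get());
+      try (ResultSet rs = ps.executeQuery()) {
+        long newMax = maxOwnerUpdatedAt.get();
+        while (rs.next()) {
+          // Targeted eviction: only the changed entry; unrelated ownerships stay hot.
+          ownerRelCache.invalidate(rs.getLong("metadata_object_id"));
+          newMax = Math.max(newMax, rs.getLong("updated_at"));
+        }
+        maxOwnerUpdatedAt.set(newMax);
+      }
+    } catch (Exception e) {
+      // Tolerated: a missed cycle only delays convergence; the TTL safety-net bounds staleness.
+    }
+  }
+}
+```
+
+Scheduled via the shared single-thread executor of §4.6, e.g.
+`changePoller.scheduleAtFixedRate(poller, 1_000, 1_000, TimeUnit.MILLISECONDS)`. The entity
+change poller of §4.4 follows the same pattern, reading `entity_change_log` and calling
+`invalidateByPrefix` instead.
+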
+#### Why Eventual Consistency Is Safe for Ownership
+
+Privilege revocation (GRANT/REVOKE) is handled by the **strong-consistency** Steps 1 + 3.
+Ownership transfer is an administrative reorganisation, not an emergency access revocation —
+a ≤ 1 s grace period on HA peer nodes is operationally acceptable and consistent with how
+similar systems (AWS IAM, Apache Polaris) treat structural metadata changes.
+
+#### Schema Change
+
+```sql
+-- Ownership mutation tracking (eventual consistency — owner change poller)
+ALTER TABLE `owner_meta`
+  ADD COLUMN `updated_at` BIGINT NOT NULL DEFAULT 0
+  COMMENT 'Set to currentTimeMillis() on any ownership transfer.
+           The owner change poller reads updated_at > maxSeen to find changed rows
+           and invalidates only the specific metadataObjectIds in ownerRelCache.';
+```
+
+---
+
+### 4.4 Eventual Consistency: Name→ID Cache (`metadataIdCache`)
+
+#### The Problem: Repeated `getID()` Calls in OGNL Expression Evaluation
+
+`MetadataIdConverter.getID()` calls `entityStore.get()` for every unique
+`(MetadataObject, privilege)` combination in the OGNL expression. The `allowAuthorizerCache`
+deduplicates complete `(principal, metalake, obj, privilege)` results, but different
+privileges on the same object (e.g. `METALAKE::USE_CATALOG`, `METALAKE::USE_SCHEMA`,
+`METALAKE::DENY_USE_CATALOG`) each trigger a separate `getID(METALAKE)` call. A full
+`LOAD_TABLE_AUTHORIZATION_EXPRESSION` evaluation can trigger **8–12 `getID()` calls**, of
+which most are for the same 3–4 objects.
+
+#### Hierarchical Cache Key with Prefix-Based Cascade Invalidation
+
+The cache key uses a hierarchical `::` separator that enables prefix-based cascade eviction:
+
+| Entity type | Key example | Is non-leaf? |
+|-------------|---------------------------------|-------------------|
+| METALAKE | `lake1::` | ✓ (trailing `::`) |
+| CATALOG | `lake1::cat1::` | ✓ |
+| SCHEMA | `lake1::cat1::s1::` | ✓ |
+| TABLE | `lake1::cat1::s1::t1::TABLE` | leaf |
+| FILESET | `lake1::cat1::s1::fs1::FILESET` | leaf |
+| TOPIC | `lake1::cat1::s1::tp1::TOPIC` | leaf |
+| MODEL | `lake1::cat1::s1::m1::MODEL` | leaf |
+| VIEW | `lake1::cat1::s1::v1::VIEW` | leaf |
+
+`invalidateByPrefix("lake1::cat1::")` evicts the catalog entry AND all schemas, tables,
+filesets, and other entities beneath it in a single O(n) pass over the cache (bounded, DDL is
+rare).
+
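+A minimal sketch of the key builder and the prefix cascade, assuming a Caffeine-backed store;
+`buildCacheKey` and `invalidateByPrefix` are the hypothetical helpers named in this design,
+not existing Gravitino APIs:
+
+```java
+import com.github.benmanes.caffeine.cache.Cache;
+import com.github.benmanes.caffeine.cache.Caffeine;
+import java.time.Duration;
+
+class MetadataIdCacheSketch {
+  private final Cache<String, Long> cache =
+      Caffeine.newBuilder().expireAfterWrite(Duration.ofHours(1)).build(); // TTL safety-net only
+
+  // Non-leaf keys end with "::" so one prefix covers the whole subtree; leaf keys carry a
+  // ::TYPE suffix so "t1" the TABLE can never collide with "t1" the TOPIC.
+  static String buildCacheKey(String metalake, String entityType, String... nameParts) {
+    String path =
+        nameParts.length == 0 ? metalake : metalake + "::" + String.join("::", nameParts);
+    boolean nonLeaf = entityType.equals("METALAKE")
+        || entityType.equals("CATALOG") || entityType.equals("SCHEMA");
+    return nonLeaf ? path + "::" : path + "::" + entityType;
+  }
+
+  // Single O(n) pass over the key view; n is bounded and DDL (the only trigger) is rare.
+  void invalidateByPrefix(String prefix) {
+    cache.asMap().keySet().removeIf(key -> key.startsWith(prefix));
+  }
+}
+```
+
+For example, `invalidateByPrefix(buildCacheKey("lake1", "CATALOG", "cat1"))` evicts
+`lake1::cat1::` and every schema, table, and fileset key beneath it in one call.
+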
+#### Why `entity_change_log` Instead of Adding `updated_at` to Entity Tables
+
+The natural alternative is adding `updated_at` to each entity table (metalake_meta,
+catalog_meta, schema_meta, table_meta, …) and polling them directly. This has three
+fundamental problems:
+
+| Problem | Explanation |
+|---------|-------------|
+| **Old name unavailable after rename** | Entity tables store the **current** name only. After `table1 → table2`, the row holds `table2`. The poller can detect *a* change happened but cannot reconstruct the old cache key to invalidate it. |
+| **JOIN cost to reconstruct full path** | Nested tables (schema, table, fileset, …) store only their simple name. Rebuilding the full `catalog.schema.table` path requires multi-level JOINs per entity type — eight separate queries or a complex UNION every poll cycle. |
+| **Cascade requires scanning all child tables** | Dropping `cat1` means also polling schema_meta, table_meta, … for all rows under `cat1`. With `entity_change_log`, **one row** for the catalog + `invalidateByPrefix("lake::cat1::")` evicts the entire subtree. |
+
+`entity_change_log` solves all three: `(metalake_name, entity_type, full_name)` gives the
+poller exactly what it needs to call `buildCacheKey + invalidateByPrefix`, and `operate_type`
+documents the nature of the change for observability.
+
+**Rename correctness:** Rename does not change `metadataId`. The write path logs
+`operate_type=ALTER, full_name=oldName`. The poller calls
+`invalidateByPrefix(buildCacheKey(lake, CATALOG, "cat1"))` → evicts `lake::cat1::` and all
+children. New name keys are cold misses → DB → same numeric ids. Policy in JCasbin is
+unaffected (keyed on numeric ids, not names).
+
+**Per-request dedup:** A `Map<String, Long>` in `AuthorizationRequestContext` provides an
+additional within-request dedup layer. When the Caffeine cache is cold, the first `getID()`
+call for a given object populates the request-level map; subsequent calls within the same
+request avoid repeated Caffeine lookups. On the warm path (Caffeine hits), the request-level
+map is a minor CPU optimisation.
+
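+A sketch of the request-scoped map, assuming it lives on `AuthorizationRequestContext`
+(the field and method names here are illustrative):
+
+```java
+import java.util.HashMap;
+import java.util.Map;
+import java.util.function.Function;
+
+class RequestScopedIdDedup {
+  // Request-thread scope only — never shared across requests, so no eviction or locking needed.
+  private final Map<String, Long> metadataIdDedup = new HashMap<>();
+
+  long resolveId(String cacheKey, Function<String, Long> caffeineOrDbLookup) {
+    // First getID() for this object in the request resolves it; later calls reuse the result.
+    return metadataIdDedup.computeIfAbsent(cacheKey, caffeineOrDbLookup);
+  }
+}
+```
+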
+**Write amplification:** one row per affected entity per operation — never per child. Entity
+DDL is rare in production. Rows are pruned after a configurable retention window (default
+1 hour).
+
+#### Invalidation Strategy: Persistent Caffeine Cache + Write-Path Hook + Entity Change Log Poller
+
+`metadataIdCache` uses a three-layer strategy:
+
+1. **Local node — immediate**: `handleEntityStructuralChange()` hook fires after transaction
+   commit and calls `metadataIdCache.invalidateByPrefix(buildCacheKey(...))`. Non-leaf entities
+   cascade to all children; leaf entities match exactly one entry.
+2. **HA peer nodes — cascade, near real-time (≤ 1 s)**: the entity change poller reads
+   `entity_change_log WHERE created_at > maxEntityCreatedAt`, rebuilds the cache key from
+   `(metalake_name, entity_type, full_name)`, and calls `invalidateByPrefix` — **one log row
+   per DROP/ALTER operation regardless of how many children exist**.
+3. **TTL — safety net only**: a long TTL (e.g. 1 hour) as a last resort.
+
+#### Schema Change
+
+```sql
+-- Entity name→id mutation tracking (eventual consistency — entity change poller)
+CREATE TABLE `entity_change_log` (
+  `id`            BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
+  `metalake_name` VARCHAR(128) NOT NULL,
+  `entity_type`   VARCHAR(32) NOT NULL
+      COMMENT 'METALAKE | CATALOG | SCHEMA | TABLE | FILESET | TOPIC | MODEL | VIEW',
+  `full_name`     VARCHAR(512) NOT NULL
+      COMMENT 'Dot-separated full name of the affected entity. For RENAME, stores the
+               OLD name (the stale key to invalidate). For DROP/ALTER, the entity name.',
+  `operate_type`  VARCHAR(16) NOT NULL
+      COMMENT 'DROP | CREATE | ALTER (ALTER covers rename and other structural changes)',
+  `created_at`    BIGINT NOT NULL,
+  PRIMARY KEY (`id`),
+  INDEX `idx_created_at` (`created_at`)
+) COMMENT 'Append-only log of entity structural changes.
+           One row per affected entity per operation. The entity change poller reads
+           this table to drive targeted invalidation of metadataIdCache on HA peer nodes.
+           Rows older than the retention window (default 1 h) are pruned periodically.';
+```
+
+---
+
+### 4.5 Write Path Invariants
+
+All schema tracking writes must execute **in the same DB transaction** as the data change.
+If the transaction rolls back, none of the tracking writes are committed — no spurious cache
+invalidations on HA peers.
+
+| Operation | Schema write | Location |
+|-----------|--------------|----------|
+| Grant / revoke privilege on role R | `role_meta.updated_at = now() WHERE role_id = R` | `RoleMetaService` |
+| Assign / revoke role for user U | `user_meta.updated_at = now() WHERE user_id = U` | `UserMetaService` |
+| Assign / revoke role for group G | `group_meta.updated_at = now() WHERE group_id = G` | `GroupMetaService` |
+| Ownership transfer for entity E | `owner_meta.updated_at = now() WHERE metadata_object_id = E` | `OwnerMetaService` |
+| Rename entity (old name → new name) | INSERT into `entity_change_log`: `operate_type=ALTER`, **old** `full_name`, `created_at=now()` | All entity MetaService classes |
+| Drop entity with name N | INSERT into `entity_change_log`: `operate_type=DROP`, `full_name=N`, `created_at=now()` | All entity MetaService classes |
+| Create entity with name N | INSERT into `entity_change_log`: `operate_type=CREATE`, `full_name=N`, `created_at=now()` (optional — new names are cache misses; insert is a safety net only) | All entity MetaService classes |
+
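+A minimal JDBC sketch of the invariant for the first row of the table, assuming the privilege
+rows live in a stand-in table `role_meta_privilege` (the real code would go through the
+MyBatis mappers in `RoleMetaService`):
+
+```java
+import java.sql.Connection;
+import java.sql.PreparedStatement;
+import javax.sql.DataSource;
+
+class RevokeWithTrackingSketch {
+  void revokePrivilege(DataSource ds, long roleId, int privilegeCode) throws Exception {
+    try (Connection conn = ds.getConnection()) {
+      conn.setAutoCommit(false);
+      try (PreparedStatement revoke = conn.prepareStatement(
+              "UPDATE role_meta_privilege SET deleted_at = ? WHERE role_id = ? AND privilege = ?");
+          PreparedStatement bump = conn.prepareStatement(
+              "UPDATE role_meta SET updated_at = ? WHERE role_id = ?")) {
+        long now = System.currentTimeMillis();
+        revoke.setLong(1, now);
+        revoke.setLong(2, roleId);
+        revoke.setInt(3, privilegeCode);
+        revoke.executeUpdate();
+        bump.setLong(1, now);            // version sentinel bumped in the SAME transaction
+        bump.setLong(2, roleId);
+        bump.executeUpdate();
+        conn.commit();                   // both writes become visible together
+      } catch (Exception e) {
+        conn.rollback();                 // rollback discards the bump too — no spurious invalidation
+        throw e;
+      }
+    }
+  }
+}
+```
+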
+---
+
+### 4.6 Cache Data Structures (Changes in JcasbinAuthorizer)
+
+```java
+// ─── BEFORE ──────────────────────────────────────────────────────────
+private Cache<Long, Boolean> loadedRoles;        // roleId → loaded?
+private Cache<Long, Optional<Long>> ownerRel;
+
+// ─── AFTER ───────────────────────────────────────────────────────────
+
+// NEW: replaces entity cache dependency for [A] (userId) and [C1] (role list).
+// Step 1 query returns both user_id and updated_at in one shot.
+// metalakeName→metalakeId resolved inline via JOIN — no dedicated cache needed.
+private GravitinoCache<String, CachedUserRoles> userRoleCache;
+// key = metalakeName + ":" + userName
+
+record CachedUserRoles(
+    long userId,         // integer userId for JCasbin enforce()
+    long updatedAt,      // user_meta.updated_at at load time — staleness sentinel
+    List<Long> roleIds   // role ID list at load time
+) {}
+
+// NEW: mirrors userRoleCache for groups (group can also hold role assignments).
+private GravitinoCache<String, CachedGroupRoles> groupRoleCache;
+// key = metalakeName + ":" + groupName
+
+record CachedGroupRoles(
+    long groupId,
+    long updatedAt,      // group_meta.updated_at at load time — staleness sentinel
+    List<Long> roleIds
+) {}
+
+// TYPE CHANGE: was Cache<Long, Boolean>, now stores role_meta.updated_at.
+// Enables staleness detection rather than TTL expiry.
+private GravitinoCache<Long, Long> loadedRoles;
+// roleId → role_meta.updated_at at the time JCasbin policies were loaded
+
+// NEW: caches name → integer id for every MetadataObject referenced in OGNL expressions.
+// Without this, every authorize()/isOwner() call triggers entityStore.get() for each unique
+// object in the expression chain (METALAKE, CATALOG, SCHEMA, TABLE etc.), multiplied by the
+// number of distinct privilege checks on that object.
+// Consistency: immediate on local node via handleEntityStructuralChange() hook on drop/rename;
+//   HA peer nodes: entity change poller reads entity_change_log WHERE created_at > maxSeen
+//   and calls invalidateByPrefix(cacheKey) for each row within the poll interval (default 1 s).
+//   Non-leaf entities (CATALOG, SCHEMA) use prefix invalidation to cascade-evict all
+//   children in one call. See §4.4.
+// TTL: long safety-net only (e.g. 1 hour). Correctness comes from hook + poller, not TTL.
+private GravitinoCache<String, Long> metadataIdCache;
+// Key format — hierarchical path, type suffix on leaf entities only:
+//   metalakeName::cat::schema::           ← SCHEMA (non-leaf, trailing :: for prefix cascade)
+//   metalakeName::cat::schema::t::TABLE   ← TABLE (leaf, ::TYPE suffix for disambiguation)
+// On DROP/RENAME of a non-leaf: invalidateByPrefix("lake::cat::") evicts the entity + all children.
+// On DROP/RENAME of a leaf: invalidateByPrefix("lake::cat::schema::t::TABLE") evicts exactly one entry.
+
+// RESTORED with mutation-poller-driven invalidation (see §4.3 for full rationale).
+// Consistency: immediate on local node via handleMetadataOwnerChange() hook;
+//   HA peer nodes: invalidated within mutation poll interval (default 1 s).
+// TTL serves only as a safety-net last resort (default: long, e.g. 1 hour).
+private GravitinoCache<Long, Optional<Long>> ownerRelCache;
+// key = metadataId (Long); value = Optional<Long> ownerId (empty = no owner set)
+
+// ── Targeted pollers — drive HA cross-node invalidation of ownerRelCache / metadataIdCache ──
+
+// Max updated_at seen across all owner_meta rows so far.
+// Poller finds rows WHERE updated_at > maxOwnerUpdatedAt and invalidates specific entries.
+private final AtomicLong maxOwnerUpdatedAt = new AtomicLong(0L);
+
+// Max created_at seen across all entity_change_log rows so far.
+// Poller finds rows WHERE created_at > maxEntityCreatedAt and invalidates specific entries.
+private final AtomicLong maxEntityCreatedAt = new AtomicLong(0L);
+
+// Single-thread scheduled executor shared by both poll tasks.
+// One tiny owner_meta query + one tiny entity_change_log query per interval.
+// Distinct from the executor thread pool (removed in Phase 2); never touches auth logic.
+private ScheduledExecutorService changePoller;
+// Poll interval: configurable, default 1 s (Configs.GRAVITINO_CHANGE_POLL_INTERVAL_MS).
+```
+
+---
+
+### 4.7 Auth Check Flow
+
+```
+authorize(metalakeName, username, resource, operation)
+│
+├─ STEP 1 — User + Group version check (2 queries, metalake resolved via JOIN):
+│
+│  [1a] User query:
+│    SELECT um.user_id, um.updated_at
+│    FROM user_meta um
+│    JOIN metalake_meta mm ON um.metalake_id = mm.metalake_id AND mm.deleted_at = 0
+│    WHERE mm.metalake_name = ? AND um.user_name = ? AND um.deleted_at = 0
+│
+│  userRoleCache HIT and updated_at matches:
+│    → use cached userId and roleIds                       [A] and [C1] avoided
+│  MISS or updated_at mismatch:
+│    → SELECT role_id FROM user_role_rel WHERE user_id = ? AND deleted_at = 0
+│    → re-associate userId ↔ roleIds in JCasbin allow/deny enforcers
+│    → userRoleCache.put(key, CachedUserRoles(userId, updatedAt, roleIds))
+│
+│  [1b] Group query (user may belong to groups that also hold roles):
+│    SELECT gm.group_id, gm.updated_at
+│    FROM group_meta gm
+│    JOIN group_user_rel gu ON gm.group_id = gu.group_id AND gu.deleted_at = 0
+│    WHERE gu.user_id = ? AND gm.deleted_at = 0
+│
+│  For each group:
+│    groupRoleCache HIT and updated_at matches:
+│      → use cached groupId and roleIds                    [group C1] avoided
+│    MISS or updated_at mismatch:
+│      → SELECT role_id FROM group_role_rel WHERE group_id = ? AND deleted_at = 0
+│      → addRoleForUser(userId, roleId) in JCasbin enforcers
+│      → groupRoleCache.put(groupKey, CachedGroupRoles(groupId, updatedAt, roleIds))
+│
+│  Note: current code only loads user-direct roles (ROLE_USER_REL). Loading group roles
+│  via [1b] is a NEW capability introduced in Phase 2 alongside groupRoleCache.
+│
+├─ STEP 2 — Resolve name → integer id (via metadataIdCache, eventual consistency):
+│
+│  key = buildCacheKey(metalakeName, entityType, fullName)
+│    → non-leaf (CATALOG, SCHEMA): "metalake::cat::schema::" (trailing :: for prefix cascade)
+│    → leaf (TABLE, FILESET…):     "metalake::cat::schema::t::TABLE"
+│  metadataIdCache.getIfPresent(key)
+│    HIT  → use cached id, 0 DB                            [B] avoided on warm path
+│    MISS → entityStore.get() → 1 indexed DB query
+│         → metadataIdCache.put(key, id)
+│
+│  Called for every MetadataObject referenced in authorize()/isOwner() within the OGNL
+│  expression (METALAKE, CATALOG, SCHEMA, TABLE, …). Without this cache, each unique
+│  (obj, privilege) combination in the expression chain triggers a separate DB lookup
+│  for the same entity id. See §4.4 for key structure and invalidation details.
+│
+├─ STEP 2.5 — Ownership check per isOwner() call (two-tier cache):
+│  (Triggered by ANY(OWNER, …) in the OGNL expression — 2–4 calls per request.
+│   Note: OGNL evaluates lazily with short-circuit; ancestor IDs are not pre-collected.)
+│
+│  Each isOwner(principal, metalake, metadataObject) call:
+│    1. metadataId ← from metadataIdCache (Step 2, already resolved)
+│    2. requestContext.ownerCache HIT for metadataId → return immediately (per-request dedup)
+│    3. MISS → ownerRelCache.getIfPresent(metadataId)
+│         HIT  → put into requestContext.ownerCache; compare ownerId; return, 0 DB
+│         MISS → SELECT owner_id FROM owner_meta
+│                WHERE metadata_object_id = ? AND deleted_at = 0   ← 1 indexed query
+│                ownerRelCache.put(metadataId, ownerId)
+│                requestContext.ownerCache.put(metadataId, ownerId)
+│                compare ownerId with userId → return result
+│
+│  requestContext.ownerCache: per-HTTP-request Map<Long, Optional<Long>>.
+│  Within one request the same metadataId is evaluated at most once.
+│  ownerRelCache: persistent Caffeine cache (long TTL safety-net). Local node: hook
+│  invalidates the specific metadataId immediately. HA peer nodes: owner poller queries
+│  owner_meta WHERE updated_at > maxSeen, invalidates only the changed entries (≤ 1 s).
+│  See §4.3 for rationale.
+│
+│  Also fixes existing bug: isOwner() currently calls MetadataIdConverter.getID() twice
+│  for the same object (JcasbinAuthorizer lines 224, 228). Phase 2 consolidates to 1 call.
+│
+├─ STEP 3 — Role batch version check (1 query):
+│
+│  SELECT role_id, updated_at
+│  FROM role_meta WHERE role_id IN (?, ?, ...) AND deleted_at = 0
+│    ↑ one query validates all of the user's roles simultaneously
+│
+│  For each role where loadedRoles.get(roleId) == db.updated_at:
+│    → policy current; skip                                [C2][C3] avoided
+│
+│  For stale/cold roles (db.updated_at != cached || not in cache):
+│    → allowEnforcer.deleteRole(roleId); denyEnforcer.deleteRole(roleId)
+│    → batchListSecurableObjectsByRoleIds(staleRoleIds)    (1 query for all stale roles)
+│    → loadPoliciesForRoles(staleObjects)
+│    → loadedRoles.put(roleId, db.updated_at)
+│
+└─ STEP 4 — enforce() (in-memory, O(1))
+   allowEnforcer.enforce(userId, objectType, metadataId, privilege)
+   denyEnforcer.enforce(userId, objectType, metadataId, privilege)
+```
+
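+A condensed sketch of the Step 1a + Step 3 logic above, assuming the §4.6 structures;
+`queryUserVersion`, `queryUserRoleIds`, `queryRoleVersions`, and `reloadRolePolicies` are
+hypothetical stand-ins for the real DAO calls whose SQL is spelled out in the flow:
+
+```java
+import java.util.List;
+import java.util.Map;
+
+class VersionCheckSketch {
+  record UserVersionRow(long userId, long updatedAt) {}
+  record CachedUserRoles(long userId, long updatedAt, List<Long> roleIds) {}
+  interface GravitinoCache<K, V> { V getIfPresent(K k); void put(K k, V v); }
+
+  GravitinoCache<String, CachedUserRoles> userRoleCache;
+  GravitinoCache<Long, Long> loadedRoles;
+
+  void validate(String metalakeName, String username) {
+    // STEP 1a — one lightweight query; a matching updated_at proves the cached role list is current.
+    UserVersionRow row = queryUserVersion(metalakeName, username);
+    String key = metalakeName + ":" + username;
+    CachedUserRoles cached = userRoleCache.getIfPresent(key);
+    if (cached == null || cached.updatedAt() != row.updatedAt()) {
+      cached = new CachedUserRoles(row.userId(), row.updatedAt(), queryUserRoleIds(row.userId()));
+      userRoleCache.put(key, cached);                  // targeted refresh, not a policy flush
+    }
+
+    // STEP 3 — one batch query validates all roles at once; only stale/cold roles reload.
+    Map<Long, Long> dbVersions = queryRoleVersions(cached.roleIds());
+    dbVersions.forEach((roleId, dbUpdatedAt) -> {
+      if (!dbUpdatedAt.equals(loadedRoles.getIfPresent(roleId))) {
+        reloadRolePolicies(roleId);                    // deleteRole + batched addPolicy
+        loadedRoles.put(roleId, dbUpdatedAt);
+      }
+    });
+  }
+
+  // DAO stand-ins — the SQL is spelled out in the flow above.
+  UserVersionRow queryUserVersion(String metalake, String user) { throw new UnsupportedOperationException(); }
+  List<Long> queryUserRoleIds(long userId) { throw new UnsupportedOperationException(); }
+  Map<Long, Long> queryRoleVersions(List<Long> roleIds) { throw new UnsupportedOperationException(); }
+  void reloadRolePolicies(long roleId) { throw new UnsupportedOperationException(); }
+}
+```
+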
+---
+
+### 4.8 Properties
+
+| Dimension | Value |
+|-----------|-------|
+| Staleness — privilege / role changes | **0** — version-validated on every request (Steps 1 + 3) |
+| Staleness — name→id, ownership (local) | **Immediate** — write-path hook fires after transaction commit |
+| Staleness — name→id, ownership (HA) | **≤ poll interval** (default **1 s**) — mutation poller detects version advance, invalidates cache |
+| Hot path DB queries | **3** (Step 1a user + Step 1b groups + Step 3 role versions; Steps 2 and 2.5 served from cache) |
+| name→id warm | **0** — metadataIdCache hit |
+| Owner check warm | **0** — ownerRelCache hit + requestContext.ownerCache dedup within request |
+| Cold / stale path | **3 + k** (k = unique MetadataObjects not yet in metadataIdCache or ownerRelCache) |
+| Background threads | **1** lightweight mutation poller (single scheduled thread, 1 s interval, one tiny DB query — distinct from the removed executor) |
+| Failure mode | DB unavailable → auth blocked (same as today); poller retries silently |
+| HA correctness — privilege / role | **Fixed** — Step 1 version check detects any GRANT/REVOKE on all nodes immediately |
+| HA correctness — name→id, ownership | **Near real-time** — mutation poller bounds staleness to ≤ poll interval (~1 s) |
+
+---
+
+### 4.9 Correctness Under Mutation
+
+The caches in Phase 2 fall into two consistency tiers. The analysis below covers both:
+
+- **Strong consistency** (Steps 1 + 3): `userRoleCache`, `groupRoleCache`, `loadedRoles` —
+  version-validated on every request.
+- **Eventual consistency** (hook + change poller): `metadataIdCache`, `ownerRelCache` —
+  immediate on local node via hook; HA peer nodes converge within the change poll interval
+  (default 1 s) via **targeted** per-entry invalidation.
+
+##### Strong-Consistency Scenarios (user/group/role)
+
+| Scenario | Analysis |
+|----------|----------|
+| **User / Group rename** | `userRoleCache` keyed on `metalakeName:userName`. Rename → cache miss → Step 1 queries DB → correct result. Old key expires via TTL. **Safe.** |
+| **User / Group drop** | Step 1 returns zero rows → auth denied immediately. Old cache entry expires harmlessly. **Safe.** |
+| **User / Group drop + same-name recreate** | New entity gets new `user_id` and `updated_at = createdTime` (distinct from old entity's value). Cached entry holds old `user_id` and old `updated_at` → **mismatch on next Step 1 forces cache refresh.** ✅ |
+
+##### Eventual-Consistency Scenarios — `metadataIdCache` (name → id)
+
+| Scenario | Local node | HA peer node (entity change poller, ≤ 1 s) |
+|----------|------------|---------------------------------------------|
+| **SecurableObject rename** (table1 → table2) | `handleEntityStructuralChange(table1)` → `metadataIdCache.invalidate("..::TABLE::table1")` immediately. `table2` is a cold miss → DB → same id=100. **Immediate, safe.** | Write path inserted a row `(TABLE, table1, updatedTime)` into `entity_change_log`. Poller reads it → `metadataIdCache.invalidate("..::TABLE::table1")`. `table2` is then a cold miss → DB → id=100. **Targeted, no security impact.** |
+| **SecurableObject drop** | `handleEntityStructuralChange` → `metadataIdCache.invalidate(key)` immediately → next request: cache miss → DB "not found" → auth denied. **Immediate.** | Write path inserted `(TABLE, table1, updatedTime)`. Poller fires within ≤ 1 s → targeted `invalidate(key)` → cache miss → DB "not found" → DENY. **⚠️ ≤ 1 s window** before poller fires; JCasbin may still hold policy for the old id. Acceptable under the agreed eventual-consistency model. |
+| **SecurableObject drop + same-name recreate** | Hook invalidates old name on drop. New entity gets new id=200. Next request: cold miss → DB → id=200. No policy for id=200 → DENY. **Correct.** | Poller fires within ≤ 1 s on drop row → targeted invalidate. Cold miss → DB → id=200 → no policy → DENY. **Same ⚠️ ~1 s window as drop.** |
+| **SecurableObject rename + privilege check** | Rename does not change id. Old name invalidated immediately by hook. New name cold miss → DB → same id. **Safe.** | Old name row logged → poller invalidates it within ≤ 1 s. New name cold miss → DB → same id. No policy change; Step 3 still correct. **Safe.** |
+
+> **Note on the drop ⚠️ window**: The risk is bounded to ≤ the change poll interval (default
+> 1 s). Only the specific dropped entity's cache key is invalidated (not `invalidateAll()`),
+> so unaffected entries remain hot. The dropped entity's JCasbin policy persists until the
+> role's `updated_at` advances (next GRANT/REVOKE on that role → Step 3 detects mismatch →
+> reload → policy cleared).
+
+##### Eventual-Consistency Scenarios — `ownerRelCache` (metadataId → ownerId)
+
+| Scenario | Local node | HA peer node (owner change poller, ≤ 1 s) |
+|----------|------------|--------------------------------------------|
+| **Ownership transferred** (owner1 → owner2) | `handleMetadataOwnerChange` → `ownerRelCache.invalidate(metadataId)` immediately. Next `isOwner(owner1)` → cache miss → DB → ownerId=owner2 → false. `isOwner(owner2)` → DB → true. **Immediate.** | `owner_meta.updated_at` set in same transaction. Poller reads `WHERE updated_at > maxSeen` → targeted `ownerRelCache.invalidate(metadataObjectId)` within ≤ 1 s. `isOwner(owner1)` → cache miss → false. `isOwner(owner2)` → DB → true. **⚠️ ≤ 1 s window.** |
+| **Owner user dropped** | Hook fires → ownerRelCache invalidated → DB → empty owner. | Stale ownerId may point to deleted user for ≤ 1 s. Deleted user can no longer authenticate at the API layer → no real risk. |
+| **Owned object dropped** | `handleEntityStructuralChange` → `metadataIdCache.invalidate(key)` → next request: `getID()` → DB entity not found → auth denied before `ownerRelCache` is consulted. **Immediate.** | **⚠️ ≤ 1 s window** before entity change poller fires: `metadataIdCache` still has stale `id=100` → `isOwner()` resolves the old id → `ownerRelCache` may return stale ownerId → `enforce(id=100)` may ALLOW via orphan JCasbin policy. After the entity change poller fires, the stale entry is evicted → cache miss → DENY. |
+
+> **Note on ownership ⚠️ window**: Ownership transfer is an administrative operation (not an
+> emergency access revocation). A ≤ 1 s window on HA peer nodes is operationally acceptable in
+> all deployments. Invalidation is **targeted** — only the transferred entity's cache entry is
+> evicted, leaving unrelated entries hot.
+
+---
+
+### 4.10 Concurrent Mutation During Auth (TOCTOU)
+
+The version check in Step 1 and the policy reload in Step 3 are not atomic with the
+`enforce()` call in Step 4. A concurrent write on another thread or node can advance a
+version counter between these steps. This section analyses the bounded impact.
+
+**Scenario A — Role revoked between Step 1 read and Step 3 policy check**
+
+```
+Thread A Step 1: reads updated_at = T5 → matches cache → roleIds = [R1, R2]
+Thread B commits: revokes R2 from user → user_meta.updated_at set to T6
+Thread A Step 3: checks R1, R2 versions → both current (policies loaded) → skips reload
+Thread A Step 4: enforce() sees R2 policy → may ALLOW using revoked role
+```
+
+Thread A's check reflects the DB state at the moment of Step 1. The revoke is visible
+on Thread A's **next** request (Step 1 reads T6 → mismatch → reloads role list →
+R2 absent → JCasbin user-role mapping updated → DENY). The inconsistency window is
+bounded to **the duration of one auth request** (typically single-digit milliseconds).
+
+Strictly eliminating this window would require serializable isolation across the entire
+auth flow, which is impractical at auth-request frequency. The window is acceptable.
+
+**Scenario B — Privilege revoked during policy reload (Step 3)**
+
+```
+Thread A Step 3: detects R1 stale (version N → N+1) → calls deleteRole(R1), reloads
+Thread B commits: another revoke on R1 → version bumped to N+2
+Thread A reads R1 securable objects from DB → may get version-N+1 snapshot
+Thread A stores loadedRoles(R1) = N+1
+Next req Step 3: db version = N+2, cache = N+1 → mismatch → reload again ✅
+```
+
+Thread A might serve one request using N+1 policies (missing the N+2 revoke). The
+**next** request detects the mismatch and reloads. Bounded to one request window.
+
+**Scenario C — Concurrent reload of the same role by two threads**
+
+Both threads call `deleteRole(R1)` then reload. `SyncedEnforcer` serialises JCasbin
+mutations; the second reload overwrites the first with the same (or newer) DB data.
+Final state is correct. No correctness issue.
+
+**Summary**
+
+| Race | Worst-case window | Acceptability |
+|------|-------------------|---------------|
+| Revoke committed after Step 1 version read | Duration of 1 auth request | Acceptable — next request is correct |
+| Revoke committed during Step 3 reload | Duration of 1 auth request | Acceptable — next request is correct |
+| Concurrent reload of same role | None (serialised by SyncedEnforcer) | No issue |
+
+The TOCTOU window is an inherent property of non-serializable reads in distributed systems.
+It applies only during concurrent admin mutations (which are rare in practice), not on
+the steady-state auth path.
+
+---
+
+### 4.11 Minimal Acceptance Test Matrix
+
+Before merge, validate the following scenarios (single-node + HA two-node setup):
+
+| Case | Setup | Expected result |
+|------|-------|-----------------|
+| Revoke user role | User has role R, then R revoked | Next auth request denied on all nodes (no TTL wait) |
+| Revoke role privilege | Role R loses privilege P | Next auth request for P denied on all nodes |
+| Group role grant/revoke | User inherits role only via group | Auth reflects group mutation on next request |
+| Owner transfer | owner1 -> owner2 | Local node immediate switch; HA peer converges within poll interval |
+| Entity rename | table old -> new | Old name denied after invalidation; new name resolves same ID |
+| Entity drop | object dropped | Auth denied after invalidation (documented bounded HA window) |
+| Drop + recreate same name | drop old, create new | New ID used; stale policy does not authorize new entity |
+| Poller transient failure | pause DB access for poller then recover | No privilege escalation; convergence resumes after recovery |
+
+Tests should include both functional assertions and query-count assertions on the warm path
+(target: Step 1 + Step 3 lightweight checks, caches hot for Steps 2/2.5).
+
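+As one example of the functional half of this matrix, a JUnit-style sketch of the first case;
+`TestNode` and its methods are placeholders for a two-node harness sharing one DB, not
+existing test utilities:
+
+```java
+import static org.junit.jupiter.api.Assertions.assertFalse;
+import static org.junit.jupiter.api.Assertions.assertTrue;
+import org.junit.jupiter.api.Test;
+
+class RevokeVisibilityTest {
+  // Placeholder harness: a real test would boot two server instances on an embedded DB.
+  interface TestNode {
+    static TestNode start() { throw new UnsupportedOperationException(); }
+    void grantRole(String user, String role);      // roleR holds SELECT on table "t"
+    void revokeRole(String user, String role);     // bumps user_meta.updated_at in-transaction
+    boolean authorize(String user, String object, String privilege);
+  }
+
+  @Test
+  void revokedRoleIsDeniedOnNextRequestOnBothNodes() {
+    TestNode node1 = TestNode.start();
+    TestNode node2 = TestNode.start();             // no cross-node messaging, shared DB only
+
+    node1.grantRole("user1", "roleR");
+    assertTrue(node1.authorize("user1", "t", "SELECT"));
+    assertTrue(node2.authorize("user1", "t", "SELECT")); // both nodes warm
+
+    node1.revokeRole("user1", "roleR");
+
+    // No TTL wait, no sleep: the Step 1 version check must catch the revoke immediately.
+    assertFalse(node1.authorize("user1", "t", "SELECT"));
+    assertFalse(node2.authorize("user1", "t", "SELECT"));
+  }
+}
+```
+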
+---
+
+## 5. Summary
+
+### 5.1 Query Count Comparison
+
+With entity cache **enabled** and all in-process caches warm, the current system serves auth
+from pure in-process Caffeine with **0 DB queries**. Phase 2 trades the old TTL-only model
+for a mixed design: strong consistency for privilege/role checks, near-real-time eventual
+consistency (≤ 1 s, targeted per-entry invalidation) for name→id and ownership lookups.
+
+| Scenario | Entity cache ON (current) | Phase 1 | Phase 2 (warm) |
+|----------|---------------------------|---------|----------------|
+| Hot path — privilege check | **0** (fully in-memory) | 3+ heavy full-row queries | **3 lightweight** |
+| Hot path — owner check | 0 (ownerRel warm) | 1 heavy JOIN per call | **0** (ownerRelCache hit) |
+| Hot path — name→id | 0 (entity cache warm) | 1 full-row per call | **0** (metadataIdCache hit) |
+| After privilege/role mutation (stale) | 0 (**stale — TTL hides it**) | 3+ heavy | **3 on next request** ✅ |
+| After ownership/rename/drop (stale) | 0 (**stale — TTL hides it**) | next request or TTL | local: **immediate** (hook); HA: **≤ 1 s** (targeted poller) |
+| Cold start | ~3+T | ~3+T heavy | **3+k lightweight** (k = cold cache misses) |
+| HA staleness — privilege/role | Up to 1 hour | Up to 1 hour | **0** (version-validated) |
+| HA staleness — name→id, ownership | Up to 1 hour | Up to 1 hour | **≤ 1 s** (targeted per-entry invalidation via change pollers) |
+
+### 5.2 Change Surface
+
+| Dimension                        | Phase 1      | Phase 2                                                                                                                                                         |
+|----------------------------------|--------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Entity cache                     | **Disabled** | Disabled                                                                                                                                                          |
+| Schema — existing tables         | None         | **+4 `updated_at` columns**: role_meta, user_meta, group_meta (strong consistency sentinels), owner_meta (eventual consistency, owner change poller)              |
+| Schema — new tables              | None         | **1** (`entity_change_log` — append-only log of entity name changes; pruned hourly)                                                                               |
+| New caches (persistent)          | None         | **4** (`userRoleCache`, `groupRoleCache`, `metadataIdCache`, `ownerRelCache`)                                                                                     |
+| Cache type changes               | None         | **1** (`loadedRoles` Boolean→Long — stores `updated_at` timestamp)                                                                                                |
+| Removed caches                   | None         | None (old TTL-only `ownerRel` replaced by `ownerRelCache` with hook + poller)                                                                                     |
+| New per-request context fields   | None         | **2** (`metadataIdCache`, `ownerCache` in `AuthorizationRequestContext`)                                                                                          |
+| New authorizer hook              | None         | **1** (`handleEntityStructuralChange` — entity drop/rename invalidation)                                                                                          |
+| Write-path additions             | None         | `updated_at` set (role/user/group/owner) + `entity_change_log` INSERT on rename/drop                                                                              |
+| Background threads               | None         | **1** change poller (single scheduled thread, 1 s interval; runs `pollOwnerChanges` + `pollEntityChanges`)                                                        |
+| Invalidation granularity         | —            | Privilege/role: per-request `updated_at` check; name→id: targeted per-name via `entity_change_log`; ownership: targeted per-entity via `owner_meta.updated_at`    |
+| Cache stampede on mutation       | —            | None — targeted invalidation; only the changed entries are evicted                                                                                                |
+| External dependencies            | None         | None                                                                                                                                                              |
+
+---
+
+## 6. Alternative Approaches Considered
+
+### 6.1 Can Phase 1 and Phase 2 Be Merged?
+
+Yes. If the team has capacity, Phase 1 and Phase 2 can ship together. The separation exists
+only in case there is pressure to disable the entity cache before the version infrastructure
+is ready.
+
+### 6.2 Alternative: JcasbinAuthorizer as Distributed Cache
+
+During design review, the question was raised: since auth ultimately loads policies into the
+`SyncedEnforcer` on each node, could we treat each `JcasbinAuthorizer` instance as a
+distributed cache and maintain cross-node consistency by propagating policy changes?
+
+**What this means in practice:** JCasbin provides a `Watcher` extension interface for exactly
+this purpose. When one node calls `addPolicy()` / `removePolicy()`, the watcher broadcasts
+the change; peer nodes receive the notification and call `loadPolicy()` to refresh their local
+enforcer. Mature implementations exist (`casbin-redis-watcher`, `casbin-kafka-watcher`, etc.).
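+
+For concreteness, a sketch of what the rejected push model would look like, assuming
+jcasbin's `Watcher` extension point; `PubSubClient` is a hypothetical stand-in for a
+Redis/Kafka client, not a real library type:
+
+```java
+import org.casbin.jcasbin.persist.Watcher;
+
+/** Hypothetical broker abstraction standing in for Redis/Kafka watcher internals. */
+interface PubSubClient {
+  void publish(String channel, String message);
+  void subscribe(String channel, java.util.function.Consumer<String> handler);
+}
+
+public class PubSubPolicyWatcher implements Watcher {
+  private final PubSubClient client;
+  private Runnable onPolicyChange; // set by the enforcer; triggers loadPolicy()
+
+  public PubSubPolicyWatcher(PubSubClient client) {
+    this.client = client;
+    // Every received broadcast forces a FULL loadPolicy() on this node; this is the core cost.
+    client.subscribe("casbin-policy-updated", msg -> {
+      if (onPolicyChange != null) onPolicyChange.run();
+    });
+  }
+
+  @Override
+  public void setUpdateCallback(Runnable runnable) {
+    this.onPolicyChange = runnable;
+  }
+
+  @Override
+  public void update() {
+    // Called after a local addPolicy()/removePolicy(); notifies all peers.
+    // If this publish is lost, peers stay stale until TTL expiry (the unbounded failure window).
+    client.publish("casbin-policy-updated", "reload");
+  }
+}
+```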
+
+**Why it was not adopted:**
+
+| Dimension                        | JCasbin Watcher                                         | Per-Request Version Check (chosen)            |
+|----------------------------------|---------------------------------------------------------|-----------------------------------------------|
+| Consistency                      | **Eventual** — broadcast can fail or be lost            | **Strong** — DB validated on every request    |
+| Infrastructure                   | Requires Redis / Kafka / etc.                           | Existing DB only                              |
+| Cold start / node restart        | Must reload full policy from DB regardless              | Handled naturally; load on first access       |
+| Write-path cost                  | Each write triggers full `loadPolicy()` on all peers    | No cross-node cost; each node updates lazily  |
+| Broadcast failure window         | Unbounded until TTL expiry                              | Not applicable — no broadcast                 |
+| `UpdatableWatcher` (incremental) | Reduces reload cost but adds implementation complexity  | N/A                                           |
+
+**Core problem with the push model:** JCasbin's default `loadPolicy()` is a full reload — every
+privilege change causes every peer node to re-fetch all policies from the DB. At scale
+(many roles × many securable objects) this is prohibitively expensive. `UpdatableWatcher`
+supports incremental updates, but its implementation complexity converges toward reinventing
+the per-request version check while still requiring an external message broker.
+
+**Key insight:** The current design already treats `SyncedEnforcer` as a local cache. The
+`loadedRoles`, `userRoleCache`, and `groupRoleCache` caches manage its policy-loading
+lifecycle; version numbers decide when to invalidate. The difference from the Watcher approach
+is **push vs. pull** — and pulling against the existing DB achieves strong consistency without
+any additional infrastructure.
+
+**Potential future hybrid:** If the 3 per-request DB queries become a bottleneck, a Watcher
+could be added as an **optimistic hint** layer (early notification → skip the version-check
+queries on likely-clean requests). The per-request version check must be retained as the
+correctness guarantee. This is out of scope for the current phases.
+
+### 6.3 Possible Future Direction: Auth Decision Cache
+
+Not on the current roadmap. Once Phase 2 is stable, caching the final auth decision
+`(userId, objectType, metadataId, privilege) → ALLOW|DENY` would reduce the hot path to
+zero DB queries. The prerequisite is Phase 2's version-check infrastructure.
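+
+For illustration only, the shape such a decision cache could take (names are hypothetical;
+invalidation would have to piggyback on the Phase 2 version checks):
+
+```java
+import com.github.benmanes.caffeine.cache.Cache;
+import com.github.benmanes.caffeine.cache.Caffeine;
+import java.time.Duration;
+
+/** Composite key for one fully-resolved auth decision. */
+record AuthDecisionKey(long userId, String objectType, long metadataId, String privilege) {}
+
+Cache<AuthDecisionKey, Boolean> authDecisionCache =
+    Caffeine.newBuilder()
+        .maximumSize(100_000)                     // bound memory
+        .expireAfterWrite(Duration.ofMinutes(10)) // safety net only
+        .build();
+// A hit would skip Steps 1-3 entirely; any updated_at advance detected by the
+// Phase 2 version check would evict the affected entries.
+```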
+
+---
+
+## 7. Implementation Details
+
+### 7.1 Mapper Additions
+
+```java
+// ── Result types (Java records) ────────────────────────────────────────────
+
+/** Step 1a: user identity + role-list staleness sentinel. */
+record UserAuthInfo(long userId, long updatedAt) {}
+
+/** Step 1b: one row per group the user belongs to. */
+record GroupAuthInfo(long groupId, long updatedAt) {}
+
+/** Step 2.5: owner identity for a single metadata object. */
+record OwnerInfo(long ownerId, String ownerType) {}
+
+/** Step 3: role version sentinel returned by batch query. */
+record RoleUpdatedAt(long roleId, long updatedAt) {}
+
+/** Owner change poller result — one row per changed owner_meta entry. */
+record ChangedOwnerInfo(long metadataObjectId, long updatedAt) {}
+
+/** Entity change poller result — one row per entity_change_log entry. */
+record EntityChangeRecord(
+ String metalakeName,
+ String entityType,
+ String fullName,
+ String operateType, // "DROP" | "CREATE" | "ALTER"
+ long createdAt
+) {}
+
+// ── Mapper interfaces ───────────────────────────────────────────────────────
+
+// RoleMetaMapper.java
+void touchUpdatedAt(@Param("roleId") long roleId, @Param("now") long now);
+List<RoleUpdatedAt> batchGetUpdatedAt(@Param("roleIds") List<Long> roleIds);
+
+// UserMetaMapper.java
+void touchUpdatedAt(@Param("userId") long userId, @Param("now") long now);
+UserAuthInfo getUserInfo(
+    @Param("metalakeName") String metalakeName, @Param("userName") String userName);
+
+// GroupMetaMapper.java
+void touchUpdatedAt(@Param("groupId") long groupId, @Param("now") long now);
+List<GroupAuthInfo> getGroupInfoByUserId(@Param("userId") long userId);
+
+// OwnerMetaMapper.java
+// Step 2.5 — single ownership lookup, deduped by requestContext
+OwnerInfo selectOwnerByMetadataObjectId(@Param("metadataObjectId") long metadataObjectId);
+// Owner change poller — returns entries changed since the last poll
+List<ChangedOwnerInfo> selectChangedOwners(@Param("updatedAtAfter") long updatedAtAfter);
+
+// EntityChangeLogMapper.java
+List<EntityChangeRecord> selectChanges(
+    @Param("createdAtAfter") long createdAtAfter,
+    @Param("maxRows") int maxRows);
+void insertChange(
+    @Param("metalakeName") String metalakeName,
+    @Param("entityType") String entityType,
+    @Param("fullName") String fullName,
+    @Param("operateType") String operateType,
+    @Param("createdAt") long createdAt);
+void pruneOldEntries(@Param("before") long before);
+```
+
+```xml
+<!-- resultType values assume MyBatis type aliases registered for the record types above -->
+
+<!-- Step 1a: user staleness check, resolves metalake inline -->
+<select id="getUserInfo" resultType="UserAuthInfo">
+  SELECT um.user_id, um.updated_at
+  FROM user_meta um
+  JOIN metalake_meta mm ON um.metalake_id = mm.metalake_id AND mm.deleted_at = 0
+  WHERE mm.metalake_name = #{metalakeName} AND um.user_name = #{userName}
+    AND um.deleted_at = 0
+</select>
+
+<!-- Step 1b: group staleness check, returns all groups the user belongs to -->
+<select id="getGroupInfoByUserId" resultType="GroupAuthInfo">
+  SELECT gm.group_id, gm.updated_at
+  FROM group_meta gm
+  JOIN group_user_rel gu ON gm.group_id = gu.group_id AND gu.deleted_at = 0
+  WHERE gu.user_id = #{userId} AND gm.deleted_at = 0
+</select>
+
+<!-- Step 2.5: single ownership query, called per isOwner(); deduped by requestContext cache -->
+<select id="selectOwnerByMetadataObjectId" resultType="OwnerInfo">
+  SELECT owner_id, owner_type FROM owner_meta
+  WHERE metadata_object_id = #{metadataObjectId} AND deleted_at = 0
+</select>
+
+<!-- Step 3: batch staleness check for roles -->
+<select id="batchGetUpdatedAt" resultType="RoleUpdatedAt">
+  SELECT role_id, updated_at FROM role_meta
+  WHERE role_id IN
+  <foreach item="id" collection="roleIds" open="(" separator="," close=")">#{id}</foreach>
+  AND deleted_at = 0
+</select>
+
+<update id="touchUpdatedAt">
+  UPDATE role_meta SET updated_at = #{now}
+  WHERE role_id = #{roleId}
+</update>
+
+<!-- Owner change poller: returns metadata_object_ids changed after a given time -->
+<select id="selectChangedOwners" resultType="ChangedOwnerInfo">
+  SELECT metadata_object_id, updated_at
+  FROM owner_meta
+  WHERE updated_at > #{updatedAtAfter}
+  ORDER BY updated_at
+</select>
+
+<!-- Entity change log: returns affected names (for metadataIdCache targeted invalidation) -->
+<select id="selectChanges" resultType="EntityChangeRecord">
+  SELECT metalake_name, entity_type, full_name, operate_type, created_at
+  FROM entity_change_log
+  WHERE created_at > #{createdAtAfter}
+  ORDER BY created_at
+  LIMIT #{maxRows}
+</select>
+
+<insert id="insertChange">
+  INSERT INTO entity_change_log
+    (metalake_name, entity_type, full_name, operate_type, created_at)
+  VALUES
+    (#{metalakeName}, #{entityType}, #{fullName}, #{operateType}, #{createdAt})
+</insert>
+
+<!-- Prune old entries to bound table size; safe to run even on active nodes -->
+<delete id="pruneOldEntries">
+  DELETE FROM entity_change_log WHERE created_at < #{before} LIMIT 1000
+</delete>
+```
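+
+To show how these mappers would be consumed, a condensed sketch of the Step 1 and Step 3
+staleness checks (the flow follows §4.7; `CachedUserRoles`, `reloadRolePolicies`, and the
+key variables are illustrative, and `loadedRoles` is assumed to be an Optional-returning
+`GravitinoCache`):
+
+```java
+// Step 1a/1b — fetch the user and group sentinels (two lightweight queries).
+UserAuthInfo user = userMetaMapper.getUserInfo(metalakeName, userName);
+List<GroupAuthInfo> groups = groupMetaMapper.getGroupInfoByUserId(user.userId());
+
+// A newer updated_at than the cached sentinel means the role list changed.
+CachedUserRoles cached = userRoleCache.getIfPresent(userKey).orElse(null);
+if (cached == null || cached.updatedAt() < user.updatedAt()) {
+  // reload user→role relations, then re-cache with the new sentinel
+}
+
+// Step 3 — one batch round trip validates every loaded role.
+for (RoleUpdatedAt v : roleMetaMapper.batchGetUpdatedAt(roleIds)) {
+  Long loadedAt = loadedRoles.getIfPresent(v.roleId()).orElse(null);
+  if (loadedAt == null || loadedAt < v.updatedAt()) {
+    reloadRolePolicies(v.roleId());           // hypothetical helper: deleteRole + reload
+    loadedRoles.put(v.roleId(), v.updatedAt());
+  }
+}
+```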
+
+### 7.2 Write Path Changes
+
+**`RoleMetaService` — privilege change (grant or revoke):**
+```java
+long now = System.currentTimeMillis();
+SessionUtils.doMultipleWithCommit(
+    () -> securableObjectMapper.softDeleteSecurableObjects(roleId, ...), // existing
+    () -> securableObjectMapper.insertSecurableObjects(newObjects),      // existing
+    () -> roleMetaMapper.touchUpdatedAt(roleId, now)                     // NEW, same tx
+);
+```
+
+**`UserMetaService` — role assignment change:**
+```java
+long now = System.currentTimeMillis();
+SessionUtils.doMultipleWithCommit(
+    () -> userRoleRelMapper.softDeleteUserRoleRel(userId, roleIds), // existing
+    () -> userRoleRelMapper.insertUserRoleRels(newRelations),       // existing
+    () -> userMetaMapper.touchUpdatedAt(userId, now)                // NEW, same tx
+);
+```
+
+**`OwnerMetaService` — ownership transfer:**
+
+The transaction sets `owner_meta.updated_at = currentTimeMillis()` on the affected row
+(same transaction as the owner change). The existing `handleMetadataOwnerChange()` hook
+fires after commit for immediate local-node invalidation:
+
+```java
+long now = System.currentTimeMillis();
+SessionUtils.doMultipleWithCommit(
+    () -> ownerMetaMapper.softDeleteOwner(metadataObjectId),                          // existing
+    () -> ownerMetaMapper.insertOwnerWithUpdatedAt(metadataObjectId, newOwnerId, now) // NEW, same tx
+    // The new row carries updated_at = now; the owner poller picks it up on HA peers.
+);
+// After commit — immediate local invalidation
+authorizer.handleMetadataOwnerChange(metalake, oldOwnerId, nameIdentifier, type);
+
+// JcasbinAuthorizer — wire it to ownerRelCache
+@Override
+public void handleMetadataOwnerChange(String metalake, Long oldOwnerId,
+                                      NameIdentifier nameIdentifier, Entity.EntityType type) {
+  Long metadataId = MetadataIdConverter.getID( // metadataIdCache hit, 0 DB
+      NameIdentifierUtil.toMetadataObject(nameIdentifier, type), metalake);
+  ownerRelCache.invalidate(metadataId); // immediate on local node
+  // HA peer nodes: owner poller detects updated_at advance → targeted invalidate() ≤ 1 s
+}
+```
+
+**Entity rename/drop paths — new `handleEntityStructuralChange()` hook:**
+
+The entity MetaService classes (`CatalogMetaService`, `SchemaMetaService`, `TableMetaService`,
+and the analogous services for FILESET, TOPIC, VIEW, MODEL) must:
+1. INSERT a row into `entity_change_log` with the **old name** (rename) or the name (drop)
+   **in the same transaction** as the data change.
+2. Call `handleEntityStructuralChange()` **after commit** for immediate local-node invalidation.
+
+```java
+// In entity rename write path (example: CatalogMetaService)
+String oldFullName = catalog.name(); // captured BEFORE the rename executes
+long now = System.currentTimeMillis();
+SessionUtils.doMultipleWithCommit(
+    () -> catalogMapper.updateCatalog(catalogId, newName, ...), // existing
+    () -> entityChangeLogMapper.insertChange(                   // NEW, same tx
+        metalakeName, "CATALOG", oldFullName, "ALTER", now)
+    // Stores the OLD name + operate_type=ALTER. Child records in SCHEMA/TABLE etc.
+    // do NOT need separate rows — prefix invalidation covers them.
+);
+// After commit — immediate local prefix invalidation of old key + all children
+authorizer.handleEntityStructuralChange(metalake, oldNameIdentifier, EntityType.CATALOG);
+
+// In entity drop write path (example: CatalogMetaService)
+long now2 = System.currentTimeMillis();
+SessionUtils.doMultipleWithCommit(
+    () -> catalogMapper.softDeleteCatalog(catalogId), // existing
+    () -> entityChangeLogMapper.insertChange(         // NEW, same tx
+        metalakeName, "CATALOG", catalogFullName, "DROP", now2)
+    // One row for the catalog; prefix invalidation on HA peers cascades to all children.
+);
+authorizer.handleEntityStructuralChange(metalake, nameIdentifier, EntityType.CATALOG);
+
+// GravitinoAuthorizer (new method)
+void handleEntityStructuralChange(String metalake, NameIdentifier nameIdent, Entity.EntityType type);
+
+// JcasbinAuthorizer implementation
+@Override
+public void handleEntityStructuralChange(String metalake, NameIdentifier nameIdent,
+                                         Entity.EntityType type) {
+  // Build hierarchical cache key; non-leaf types get trailing "::" for prefix cascade.
+  String cacheKey = buildCacheKey(metalake, type, nameIdent.toString());
+  metadataIdCache.invalidateByPrefix(cacheKey); // immediate on local node;
+  // works for both leaf (exact match) and non-leaf (cascades all children under prefix)
+  // HA peer nodes: entity change poller reads entity_change_log WHERE created_at > maxSeen
+  // → buildCacheKey + invalidateByPrefix for each changed entry ≤ 1 s
+}
+
+/** Build a hierarchical metadataIdCache key from its components.
+ * Non-leaf entities (METALAKE, CATALOG, SCHEMA) end with "::" so that
+ * invalidateByPrefix cascades to all child entities in one call.
+ * Leaf entities (TABLE, FILESET, TOPIC, VIEW, MODEL) append "::TYPE" for disambiguation.
+ *
+ * Examples:
+ *   (lake1, CATALOG, "cat1")          → "lake1::cat1::"
+ *   (lake1, SCHEMA, "cat1.schema1")   → "lake1::cat1::schema1::"
+ *   (lake1, TABLE, "cat1.schema1.t1") → "lake1::cat1::schema1::t1::TABLE"
+ */
+static String buildCacheKey(String metalake, Entity.EntityType type, String fullName) {
+  StringBuilder sb = new StringBuilder(metalake);
+  for (String part : fullName.split("\\.")) {
+    sb.append("::").append(part);
+  }
+  if (isNonLeaf(type)) {
+    sb.append("::"); // trailing :: enables prefix cascade for all children
+  } else {
+    sb.append("::").append(type.name()); // ::TABLE, ::FILESET, etc.
+  }
+  return sb.toString();
+}
+
+private static boolean isNonLeaf(Entity.EntityType type) {
+  return type == Entity.EntityType.METALAKE
+      || type == Entity.EntityType.CATALOG
+      || type == Entity.EntityType.SCHEMA;
+}
+```
+
+**Change Poller Implementation (JcasbinAuthorizer):**
+
+```java
+private void startChangePoller(long intervalMs) {
+  changePoller = Executors.newSingleThreadScheduledExecutor(
+      r -> new Thread(r, "gravitino-change-poller"));
+  changePoller.scheduleAtFixedRate(
+      this::pollChanges, intervalMs, intervalMs, TimeUnit.MILLISECONDS);
+}
+
+/** Two targeted poll queries per interval — owner changes and entity name changes. */
+private void pollChanges() {
+  pollOwnerChanges();
+  pollEntityChanges();
+}
+
+private void pollOwnerChanges() {
+  try {
+    long since = maxOwnerUpdatedAt.get();
+    List<ChangedOwnerInfo> rows = ownerMetaMapper.selectChangedOwners(since);
+    long maxAt = since;
+    for (ChangedOwnerInfo row : rows) {
+      ownerRelCache.invalidate(row.metadataObjectId()); // targeted: only this entry
+      if (row.updatedAt() > maxAt) maxAt = row.updatedAt();
+    }
+    maxOwnerUpdatedAt.set(maxAt);
+  } catch (Exception e) {
+    LOG.warn("Owner change poller failed, will retry", e);
+  }
+}
+
+private void pollEntityChanges() {
+  try {
+    long since = maxEntityCreatedAt.get();
+    List<EntityChangeRecord> rows = entityChangeLogMapper.selectChanges(since, 1000);
+    long maxAt = since;
+    for (EntityChangeRecord row : rows) {
+      Entity.EntityType type = Entity.EntityType.valueOf(row.entityType());
+      String cacheKey = buildCacheKey(row.metalakeName(), type, row.fullName());
+      // invalidateByPrefix handles both leaf (exact match) and non-leaf (cascade):
+      //   DROP/ALTER on CATALOG "cat1" → prefix "lake::cat1::" evicts catalog + all children
+      //   DROP on TABLE "cat1.s1.t1"   → prefix "lake::cat1::s1::t1::TABLE" evicts exactly one entry
+      metadataIdCache.invalidateByPrefix(cacheKey);
+      if (row.createdAt() > maxAt) maxAt = row.createdAt();
+    }
+    maxEntityCreatedAt.set(maxAt);
+  } catch (Exception e) {
+    LOG.warn("Entity change poller failed, will retry", e);
+  }
+}
+
+@Override
+public void close() {
+  if (changePoller != null) {
+    changePoller.shutdownNow();
+  }
+  // ... other cleanup
+}
+```
+
+**entity_change_log pruning:** A low-priority task (e.g. once per hour) calls
+`entityChangeLogMapper.pruneOldEntries(System.currentTimeMillis() - RETENTION_MS)` to keep
+the table small. The retention window must be longer than the poll interval by a safe margin
+(default: 1 hour retention, 1 s poll interval). Pruning is idempotent and can run on any node.
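+
+A minimal sketch of the pruning task, assuming it reuses the change-poller executor
+(`RETENTION_MS` and the wiring are illustrative):
+
+```java
+private static final long RETENTION_MS = TimeUnit.HOURS.toMillis(1);
+
+private void startPruneTask() {
+  // Hourly, low-priority; idempotent and LIMIT-bounded, so safe on any node.
+  changePoller.scheduleAtFixedRate(
+      () -> {
+        try {
+          entityChangeLogMapper.pruneOldEntries(System.currentTimeMillis() - RETENTION_MS);
+        } catch (Exception e) {
+          LOG.warn("entity_change_log pruning failed, will retry next hour", e);
+        }
+      },
+      1, 1, TimeUnit.HOURS);
+}
+```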
+
+All schema writes (`updated_at` set, `entity_change_log` INSERT) happen in
+**the same DB transaction** as the data change. If the transaction rolls back, none of the
+tracking writes are committed — no spurious cache invalidations on HA peers.
+
+### 7.3 GravitinoCache Interface
+
+```java
+public interface GravitinoCache<K, V> extends Closeable {
+ Optional<V> getIfPresent(K key);
+ void put(K key, V value);
+ void invalidate(K key);
+ void invalidateAll();
+ /** Evict all entries whose key starts with the given prefix string.
+ * Only applicable when K = String. Used by metadataIdCache for cascade
+ * invalidation: dropping a catalog evicts the catalog entry plus all
+ * schema/table/fileset/... entries beneath it in one call. */
+ void invalidateByPrefix(String prefix);
+ long size();
+}
+```
+
+`CaffeineGravitinoCache<K,V>` — wraps Caffeine with configurable TTL and max size.
+`invalidateByPrefix` iterates `cache.asMap().keySet()` and evicts matching entries
+(`O(n)` over the cache size, which is bounded; DDL is rare):
+```java
+@Override
+public void invalidateByPrefix(String prefix) {
+ cache.asMap().keySet().stream()
+ .filter(k -> ((String) k).startsWith(prefix))
+ .forEach(cache::invalidate);
+}
+```
+`NoOpsGravitinoCache<K,V>` — no-op implementation for tests.
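+
+A minimal sketch of the Caffeine-backed implementation (constructor parameters and
+defaults are illustrative, not decided config):
+
+```java
+import com.github.benmanes.caffeine.cache.Cache;
+import com.github.benmanes.caffeine.cache.Caffeine;
+import java.time.Duration;
+import java.util.Optional;
+
+public class CaffeineGravitinoCache<K, V> implements GravitinoCache<K, V> {
+  private final Cache<K, V> cache;
+
+  public CaffeineGravitinoCache(Duration ttl, long maxSize) {
+    this.cache = Caffeine.newBuilder()
+        .expireAfterWrite(ttl) // TTL safety net behind the hook/poller invalidation
+        .maximumSize(maxSize)
+        .build();
+  }
+
+  @Override public Optional<V> getIfPresent(K key) { return Optional.ofNullable(cache.getIfPresent(key)); }
+  @Override public void put(K key, V value) { cache.put(key, value); }
+  @Override public void invalidate(K key) { cache.invalidate(key); }
+  @Override public void invalidateAll() { cache.invalidateAll(); }
+  @Override public long size() { return cache.estimatedSize(); }
+  @Override public void close() { cache.invalidateAll(); }
+
+  @Override
+  public void invalidateByPrefix(String prefix) {
+    // O(n) scan over a bounded key set; acceptable because DDL is rare.
+    cache.asMap().keySet().stream()
+        .filter(k -> ((String) k).startsWith(prefix))
+        .forEach(cache::invalidate);
+  }
+}
+```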
+
+---
+
+## 8. Phased Implementation Plan
+
+### Phase 1 — Foundation (common to both approaches, no schema changes)
+
+| Step | Change                                                                                | Module                                                |
+|------|----------------------------------------------------------------------------------------|-------------------------------------------------------|
+| 1.1  | Fix auth N+1: `batchListSecurableObjectsByRoleIds()` + rewrite `loadRolePrivilege()`    | `RoleMetaService`, `JcasbinAuthorizer`                |
+| 1.2  | Introduce `GravitinoCache<K,V>` interface; wrap existing Caffeine caches                | `GravitinoCache.java`, `CaffeineGravitinoCache.java`  |
+| 1.3  | Disable entity store cache: `CACHE_ENABLED` default → `false`                           | `Configs.java`                                        |
+| 1.4  | Fix DBCP2: `minEvictableIdleTimeMillis` 1 s → 30 s; `minIdle` 0 → 5                     | `SqlSessionFactoryHelper.java`                        |
+
+**Outcome:** Auth cold path costs `3+T` queries. Calls [A][C1] hit the DB on every request
+(acceptable as a stepping stone — Phase 2 closes this). Consistency is still TTL-bounded.
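+
+For step 1.4, a minimal sketch of the pool-settings change, assuming the helper builds a
+DBCP2 `BasicDataSource` (only the two changed properties are shown; the surrounding wiring
+is illustrative):
+
+```java
+import org.apache.commons.dbcp2.BasicDataSource;
+
+BasicDataSource dataSource = new BasicDataSource();
+// Before: idle connections became evictable after 1 s, so the pool thrashed
+// (constant close/reopen). 30 s keeps warm connections across request bursts.
+dataSource.setMinEvictableIdleTimeMillis(30_000L); // was 1_000L
+// Before: minIdle = 0 meant the pool could drain to zero between requests.
+dataSource.setMinIdle(5); // was 0
+```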
+
+---
+
+### Phase 2 — Version-Validated Auth Cache Implementation
+
+| Step | Change | Module |
+|------|--------|--------|
+| 2.1  | `ADD COLUMN updated_at` on `role_meta` | `schema-x.y.z-*.sql` |
+| 2.2  | `ADD COLUMN updated_at` on `user_meta` | `schema-x.y.z-*.sql` |
+| 2.3  | `ADD COLUMN updated_at` on `group_meta` | `schema-x.y.z-*.sql` |
+| 2.4  | Set `role_meta.updated_at = now()` in privilege grant/revoke transaction | `RoleMetaService` |
+| 2.5  | Set `user_meta.updated_at = now()` and `group_meta.updated_at = now()` in role assign/revoke transaction | `UserMetaService`, `GroupMetaService` |
+| 2.6  | Add `userRoleCache: GravitinoCache<String, CachedUserRoles>` | `JcasbinAuthorizer` |
+| 2.7  | Add `groupRoleCache: GravitinoCache<String, CachedGroupRoles>`; implement group role loading in `loadRolePrivilege()` (currently missing entirely) | `JcasbinAuthorizer` |
+| 2.8  | Change `loadedRoles` type: `Boolean` → `Long` (stores `updated_at` timestamp) | `JcasbinAuthorizer` |
+| 2.9  | Rewrite `loadRolePrivilege()` + `authorize()` with 4-step flow (§4.7); remove `executor` thread pool (replaced by batch query) | `JcasbinAuthorizer` |
+| 2.10 | Add mapper methods (see §7.1) | mapper + SQL |
+| 2.11 | Replace `ownerRel` (TTL-only) with `ownerRelCache` (TTL safety-net + `handleMetadataOwnerChange()` hook); add `ownerCache: Map<Long, Optional<Long>>` to `AuthorizationRequestContext` for per-request dedup; fix double `getID()` call in `isOwner()` | `JcasbinAuthorizer`, `AuthorizationRequestContext` |
+| 2.12 | Add `metadataIdCache: GravitinoCache<String, Long>` (TTL safety-net + `handleEntityStructuralChange()` hook); add `metadataIdCache: Map<String, Long>` to `AuthorizationRequestContext` for per-request dedup | `JcasbinAuthorizer`, `AuthorizationRequestContext`, `GravitinoAuthorizer` |
+| 2.13 | Wire `handleEntityStructuralChange()` into entity drop/rename write paths; INSERT into `entity_change_log` (old name on rename, name on drop) in the same transaction | `CatalogMetaService`, `SchemaMetaService`, `TableMetaService`, (FILESET / TOPIC / VIEW / MODEL) |
+| 2.14 | Extend `AuthorizationRequestContext.loadRole()` guard to cover both user and group role loading in one pass | `AuthorizationRequestContext` |
+| 2.15 | Add `updated_at` column to `owner_meta`; add `OwnerMetaMapper.selectChangedOwners(updatedAtAfter)` | `schema-x.y.z-*.sql`, mapper + SQL |
+| 2.16 | Create `entity_change_log` table; add `EntityChangeLogMapper` (`selectChanges(createdAtAfter)`, `insertChange(metalakeName, entityType, fullName, operateType, createdAt)`, `pruneOldEntries(before)`); add `buildCacheKey()` + `isNonLeaf()` helpers; add `invalidateByPrefix()` to `GravitinoCache` | `schema-x.y.z-*.sql`, mapper + SQL, `JcasbinAuthorizer`, `GravitinoCache` |
+| 2.17 | Wire `entity_change_log` INSERT into entity rename/drop write paths (old name + `operate_type=ALTER/DROP`); one row per operation regardless of subtree size | All entity MetaService classes |
+| 2.18 | Implement change poller in `JcasbinAuthorizer`: `maxOwnerUpdatedAt`, `maxEntityCreatedAt`, `startChangePoller()` / `pollOwnerChanges()` / `pollEntityChanges()` / `close()` | `JcasbinAuthorizer` |
+| 2.19 | **Performance validation**: run auth hot-path benchmark before and after Phase 2; confirm ≤ 3 DB queries per `authorize()` call under cache-hit conditions; measure p50/p99 latency with concurrent auth load (≥ 100 threads); verify `metadataIdCache` and `ownerRelCache` hit rates ≥ 95% after warm-up; confirm poller CPU overhead < 1% under steady state | `JcasbinAuthorizer` benchmark / JMH or load test |
+
+**Outcome:** Privilege/role changes: 0 staleness (version-validated). Name→id and ownership:
+eventual consistency, ≤ poll interval (~1 s) on HA nodes via targeted invalidation, immediate
+on the local node via hooks. Hot path: **3 DB queries** (Steps 1a + 1b + 3; Steps 2 and 2.5
+served from Caffeine cache).