Hi all,

  As Gravitino evolves toward distributed multi-node deployment, I'd like
to open a discussion on how to handle cache consistency across nodes.

  *Background*

  Gravitino currently uses local Caffeine caches for metadata. In a
single-node setup this works well, but in a multi-node active-active
deployment, a write on Node A leaves Node B's cache stale.
  With no invalidation mechanism, the only bound on staleness is the cache
TTL — which can be tens of seconds or more.

  *Goals*

  - Convergence within ≤ 2 seconds for metadata caches under normal
conditions.
  - Stricter handling for authorization caches (roles, privileges) where
staleness is a security concern.
  - No mandatory new external dependencies (H2/MySQL/PostgreSQL all
supported; Apache project principle).


  *Proposed Approach*
  After surveying how similar systems handle this (Keycloak, Project
Nessie, Trino, Etcd, HMS, and others), I am proposing a three-layer design:


  ┌────────────┬───────────────────────────────────┬─────────────────────────────────────────────────────┐
  │ Layer      │ Mechanism                         │ Role                                                │
  ├────────────┼───────────────────────────────────┼─────────────────────────────────────────────────────┤
  │ Fast path  │ HTTP fan-out on write             │ Near-instant invalidation for online nodes          │
  ├────────────┼───────────────────────────────────┼─────────────────────────────────────────────────────┤
  │ Safety net │ DB global version polling (~1 s)  │ Catch-up for restarted or temporarily offline nodes │
  ├────────────┼───────────────────────────────────┼─────────────────────────────────────────────────────┤
  │ Hard bound │ Short TTL on all cache entries    │ Unconditional worst-case staleness cap              │
  └────────────┴───────────────────────────────────┴─────────────────────────────────────────────────────┘
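  To make the interaction between the three layers concrete, here is a minimal
sketch (names and structure are illustrative, not from the proposal): each cache
entry records the global metadata version and a load time; a ~1 s poller feeds
the observed DB version into the cache, which drops entries loaded under an
older version, while the TTL check caps worst-case staleness and the HTTP
fast path just calls invalidate(key) directly.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the three-layer consistency check; not Gravitino code.
final class VersionedCache<K, V> {
  private record Entry<V>(V value, long version, long loadedAtMillis) {}

  private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
  private final long ttlMillis;                 // hard bound
  private volatile long observedGlobalVersion;  // refreshed by a ~1 s DB poller

  VersionedCache(long ttlMillis) { this.ttlMillis = ttlMillis; }

  // Safety net: called by the polling loop with the latest DB global version.
  void onGlobalVersion(long v) {
    if (v > observedGlobalVersion) {
      observedGlobalVersion = v;
      entries.values().removeIf(e -> e.version() < v); // drop stale entries
    }
  }

  void put(K key, V value) {
    entries.put(key, new Entry<>(value, observedGlobalVersion, System.currentTimeMillis()));
  }

  // Returns null when the entry is missing, TTL-expired, or version-stale.
  V getIfFresh(K key) {
    Entry<V> e = entries.get(key);
    if (e == null) return null;
    boolean expired = System.currentTimeMillis() - e.loadedAtMillis() > ttlMillis;
    boolean stale = e.version() < observedGlobalVersion;
    if (expired || stale) {
      entries.remove(key);
      return null;
    }
    return e.value();
  }

  // Fast path: invoked when an HTTP fan-out invalidation arrives from a peer.
  void invalidate(K key) { entries.remove(key); }
}
```

In a real implementation the version/TTL checks would hang off the existing
Caffeine cache rather than a bare map, but the layering is the same.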

  This requires zero new infrastructure. The design also defines a
CacheInvalidationTransport SPI for deployments that want stronger delivery
guarantees, with JGroups (embedded, AP model, no external service) as the
recommended optional implementation and Redis Pub/Sub/Streams as an
alternative for deployments already running Redis.
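  As a rough illustration of how small the transport surface could be (the
interface and method names below are my own sketch, not the proposed API): a
transport only needs to broadcast invalidation events to peers and hand remote
events to a local listener, so HTTP fan-out, JGroups, and Redis Pub/Sub would
each be one implementation of the same contract.

```java
import java.util.function.Consumer;

// Hypothetical shape of the CacheInvalidationTransport SPI; names illustrative.
interface CacheInvalidationTransport extends AutoCloseable {
  /** An invalidation event: which entity changed, at which global version. */
  record InvalidationEvent(String entityKey, long version) {}

  /** Broadcast an event to all peer nodes (best effort on the fast path). */
  void publish(InvalidationEvent event);

  /** Register the callback that applies remote events to the local cache. */
  void subscribe(Consumer<InvalidationEvent> listener);
}

// Trivial in-process implementation, useful for single-node setups and tests.
final class LocalOnlyTransport implements CacheInvalidationTransport {
  private volatile Consumer<CacheInvalidationTransport.InvalidationEvent> listener = e -> {};

  @Override public void publish(InvalidationEvent event) { listener.accept(event); }
  @Override public void subscribe(Consumer<InvalidationEvent> l) { this.listener = l; }
  @Override public void close() {}
}
```

Operators could then select an implementation via configuration (or Java's
ServiceLoader) without the core depending on any particular middleware.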

  The full analysis — including option comparisons, architecture diagrams,
industry references, and a phased implementation plan — is available here:


https://docs.google.com/document/d/18BI8dlcrs9nIHF1l7zH17L1YACA1erziBGkzWgWcK5k/edit?tab=t.0

I'd appreciate the community's input on the following open questions:

  1. *Overall approach* — Does the three-layer design (fast-path
invalidation + DB version polling as safety net + short TTL as hard bound)
look like the right direction? Are there failure modes or
  deployment scenarios we have not considered?
  2. *External middleware policy* — The current proposal avoids mandatory
external dependencies (no Redis, ZooKeeper, Etcd, etc.) to keep Gravitino
self-contained. Should we reconsider this? Some
  production deployments likely already run Redis or an Etcd cluster;
allowing (or recommending) an external message bus could simplify the
implementation considerably.
  3. *Strong consistency requirement* — The proposal targets eventual
consistency with a short convergence window (≤ 2 s). Is that acceptable for
all cache classes? In particular, should
  authorization caches (roles, privileges, ownership) be held to a stricter
standard — e.g., read-through on every request with no local caching — even
at the cost of higher DB load?
  4. *Pluggable strategy* — Should Gravitino expose a
CacheInvalidationTransport SPI so that operators can choose their own
consistency mechanism (HTTP fan-out, JGroups, Redis, etc.) rather than the
project committing to a single built-in solution? Or does that add too much
complexity for users?
  5. *Anything we missed* — Are there constraints, use cases, or prior art
in the Gravitino ecosystem that should influence this design?

  Looking forward to the discussion.

  Thanks,
  Qi Yu
