924060929 opened a new pull request, #63612:
URL: https://github.com/apache/doris/pull/63612

   ## Proposed changes
   
   Follower FEs replay metadata changes (e.g. `ALTER TABLE ... RENAME COLUMN`,
   alter table properties), but the replay paths never called
   `NereidsSqlCacheManager.invalidateAboutTable()`, so stale sql-cache entries
   survived and queries on follower FEs returned wrong results, such as old
   column names or an outdated schema.
   
   Because `enable_sql_cache` defaults to `true` as of 4.0, every multi-FE
   deployment is affected.
   
   ### Root cause
   
   On the master FE, DDL operations invalidate the sql cache directly (e.g.
   `Alter.java` calls `sqlCacheManager.invalidateAboutTable()`). But on
   follower FEs, the replay paths (`Env.setTableStatusInternal`,
   `Env.setReplicaVersionInternal`, `Env.setPartitionVersionInternal`, and
   various `replay*` methods in `EditLog`) only mutate metadata; they never
   fan out cache invalidation signals.
   
   ### Fix
   
   Introduce `OP_TABLE_META_CHANGE`: the master emits this journal entry once
   per table metadata mutation via `Env.notifyTableMetaChange(table)`, and
   every follower fans it out to `NereidsSqlCacheManager` and
   `NereidsSortedPartitionsCacheManager` on replay. This decouples cache
   invalidation from the dozens of individual replay paths: adding a new
   per-table cache only requires one line in `Env.fanOutTableMetaChange()`.
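
   A minimal, self-contained sketch of the fan-out pattern described above,
   assuming a drastically simplified FE. Only the opcode value `1102` and the
   method names `notifyTableMetaChange`, `fanOutTableMetaChange`, and
   `setTableStatusInternal` come from this PR; their signatures here are
   simplified, and `Node`, `JournalEntry`, `TableCacheListener`, and the shared
   in-memory list standing in for the replicated journal are hypothetical, not
   Doris classes.

   ```java
   import java.util.ArrayList;
   import java.util.List;

   // Hypothetical model of the OP_TABLE_META_CHANGE flow. Only the opcode value is
   // taken from the PR; every type below is an illustrative simplification.
   class TableMetaChangeFlow {
       static final short OP_TABLE_META_CHANGE = 1102;

       // One replicated journal record (stand-in for a real edit-log entry).
       record JournalEntry(short opCode, long tableId, String tableName) {}

       // Any per-table cache that must drop entries when a table's metadata changes.
       interface TableCacheListener {
           void invalidateAboutTable(long tableId, String tableName);
       }

       static class Node {
           final List<TableCacheListener> listeners = new ArrayList<>();
           final List<JournalEntry> journal; // shared list simulating journal replication

           Node(List<JournalEntry> journal) {
               this.journal = journal;
           }

           // Single fan-out point: a new cache only needs to register one more listener.
           void fanOutTableMetaChange(long tableId, String tableName) {
               for (TableCacheListener listener : listeners) {
                   listener.invalidateAboutTable(tableId, tableName);
               }
           }

           // Master side: persist one OP_TABLE_META_CHANGE entry, then invalidate locally.
           void notifyTableMetaChange(long tableId, String tableName) {
               journal.add(new JournalEntry(OP_TABLE_META_CHANGE, tableId, tableName));
               fanOutTableMetaChange(tableId, tableName);
           }

           // A mutation shared by the master and replay paths. With the !isReplay guard,
           // only the master announces the change; a replaying follower learns about it
           // from the separate OP_TABLE_META_CHANGE entry instead.
           void setTableStatusInternal(long tableId, String tableName, boolean isReplay) {
               // ... mutate in-memory table metadata here ...
               if (!isReplay) {
                   notifyTableMetaChange(tableId, tableName);
               }
           }

           // Follower side: dispatch replayed journal entries by opcode.
           void replay(JournalEntry entry) {
               if (entry.opCode() == OP_TABLE_META_CHANGE) {
                   fanOutTableMetaChange(entry.tableId(), entry.tableName());
               }
           }
       }

       public static void main(String[] args) {
           List<JournalEntry> journal = new ArrayList<>();
           Node master = new Node(journal);
           Node follower = new Node(journal);
           follower.listeners.add((id, name) -> System.out.println("follower invalidates " + name));

           master.setTableStatusInternal(42L, "t1", false); // appends one journal entry
           for (JournalEntry entry : List.copyOf(journal)) {
               follower.replay(entry); // prints "follower invalidates t1"
           }
       }
   }
   ```

   Because followers react only to the replayed entry, each replay path that
   mutates table metadata is covered without wiring cache calls into it
   individually.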
   
   ### Changes
   
   | File | Change |
   |------|--------|
   | `TableMetaChange.java` | **New**: persist payload carrying catalog/db/table id+name |
   | `OperationType.java` | Add `OP_TABLE_META_CHANGE = 1102` |
   | `JournalEntity.java` | Deserialize `TableMetaChange` |
   | `EditLog.java` | Dispatch replay + `logTableMetaChange()` helper |
   | `Env.java` | `notifyTableMetaChange` / `replayTableMetaChange` / `fanOutTableMetaChange`; 3 call sites add `!isReplay` guard |
   | `Alter.java` | Route through `Env.notifyTableMetaChange()` instead of direct cache call |
   | `NereidsSqlCacheManager.java` | Accept `TableMetaChange` payload; null-safe iteration over softValues |
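
   The table describes the new payload only as carrying catalog/db/table
   id+name. The sketch below illustrates the write/read symmetry such a journal
   payload needs, assuming a plain `java.io` encoding; the record name
   `TableMetaChangePayload`, its field order, and the byte layout are
   hypothetical and are not the actual `TableMetaChange` wire format.

   ```java
   import java.io.ByteArrayInputStream;
   import java.io.ByteArrayOutputStream;
   import java.io.DataInputStream;
   import java.io.DataOutputStream;
   import java.io.IOException;
   import java.io.UncheckedIOException;

   // Hypothetical stand-in for the TableMetaChange payload (catalog/db/table id + name).
   // Field order and the DataOutput encoding are assumptions, not the real format.
   record TableMetaChangePayload(long catalogId, String catalogName,
                                 long dbId, String dbName,
                                 long tableId, String tableName) {

       // Serialize to bytes, as the master would before appending to the journal.
       byte[] toBytes() {
           ByteArrayOutputStream bytes = new ByteArrayOutputStream();
           try (DataOutputStream out = new DataOutputStream(bytes)) {
               out.writeLong(catalogId);
               out.writeUTF(catalogName);
               out.writeLong(dbId);
               out.writeUTF(dbName);
               out.writeLong(tableId);
               out.writeUTF(tableName);
           } catch (IOException e) {
               throw new UncheckedIOException(e);
           }
           return bytes.toByteArray();
       }

       // Deserialize on a follower during replay; fields are read in write order
       // (Java evaluates constructor arguments left to right).
       static TableMetaChangePayload fromBytes(byte[] data) {
           try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
               return new TableMetaChangePayload(
                       in.readLong(), in.readUTF(),
                       in.readLong(), in.readUTF(),
                       in.readLong(), in.readUTF());
           } catch (IOException e) {
               throw new UncheckedIOException(e);
           }
       }

       public static void main(String[] args) {
           TableMetaChangePayload sent = new TableMetaChangePayload(0L, "internal", 10L, "db1", 42L, "t1");
           TableMetaChangePayload received = fromBytes(sent.toBytes());
           System.out.println(sent.equals(received)); // prints "true"
       }
   }
   ```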
   
   ## Further comments
   
   Verified on both a local 3-FE cluster and a cloud 3-FE + MetaService + FDB
   cluster:
   - `ALTER TABLE RENAME COLUMN` on master → follower query with old column name
     correctly fails (`Unknown column`) instead of returning a stale cached
     result.
   
   Performance note: `invalidateAboutTable` does an O(N) scan of the sql cache.
   With the default `sql_cache_manage_num=100000`, this takes ~10ms per DDL,
   which is negligible given DDL frequency. A reverse-index optimization was
   explored but deferred: maintaining a secondary index on a soft-referenced
   Caffeine cache proved impractical, since doing so without global locking
   introduced race conditions.
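
   To make the O(N) cost concrete, here is a hypothetical model of a full-scan
   `invalidateAboutTable` over soft-referenced values, including a null check
   for values the GC has already cleared. A `ConcurrentHashMap` of
   `SoftReference` stands in for the Caffeine cache so the sketch has no
   dependencies; `SqlCacheSketch`, `CachedQuery`, and its table-id set are
   assumed shapes, not the real cache entry.

   ```java
   import java.lang.ref.SoftReference;
   import java.util.Iterator;
   import java.util.Map;
   import java.util.Set;
   import java.util.concurrent.ConcurrentHashMap;

   // Hypothetical, dependency-free model of the O(N) invalidateAboutTable scan. The real
   // manager sits on a soft-valued Caffeine cache; SoftReference values stand in here.
   class SqlCacheSketch {

       // A cached statement remembers the ids of the tables it read (assumed shape).
       record CachedQuery(String sql, Set<Long> tableIds) {}

       private final Map<String, SoftReference<CachedQuery>> cache = new ConcurrentHashMap<>();

       void put(String key, CachedQuery query) {
           cache.put(key, new SoftReference<>(query));
       }

       boolean contains(String key) {
           return cache.containsKey(key);
       }

       // Visits every entry, so cost is linear in the number of cached statements.
       int invalidateAboutTable(long tableId) {
           int removed = 0;
           Iterator<Map.Entry<String, SoftReference<CachedQuery>>> it = cache.entrySet().iterator();
           while (it.hasNext()) {
               CachedQuery query = it.next().getValue().get();
               // Null-safe: the GC may already have cleared a soft value; drop such entries too.
               if (query == null || query.tableIds().contains(tableId)) {
                   it.remove();
                   removed++;
               }
           }
           return removed;
       }

       public static void main(String[] args) {
           SqlCacheSketch sqlCache = new SqlCacheSketch();
           sqlCache.put("q1", new CachedQuery("SELECT a FROM t1", Set.of(42L)));
           sqlCache.put("q2", new CachedQuery("SELECT * FROM t1 JOIN t2", Set.of(42L, 7L)));
           sqlCache.put("q3", new CachedQuery("SELECT b FROM t2", Set.of(7L)));
           System.out.println("removed " + sqlCache.invalidateAboutTable(42L));
           System.out.println("q3 still cached: " + sqlCache.contains("q3"));
       }
   }
   ```

   A reverse index from table id to cache keys would avoid the full scan, but
   it would likely also have to track values the GC clears at arbitrary times,
   which is where the race conditions mentioned in the note come in.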


-- 
This is an automated message from the Apache Git Service.