924060929 opened a new pull request, #63612:
URL: https://github.com/apache/doris/pull/63612
## Proposed changes
Follower FEs replay metadata changes (e.g. `ALTER TABLE ... RENAME COLUMN` or
table property changes), but the replay paths never called
`NereidsSqlCacheManager.invalidateAboutTable()`. Stale sql-cache entries
therefore survived, and queries on follower FEs returned wrong results (old
column names, old schema, and so on).
Because `enable_sql_cache` has defaulted to `true` since 4.0, every multi-FE
deployment is affected.
### Root cause
On the master FE, DDL operations invalidate the sql cache directly (e.g.
`Alter.java` calls `sqlCacheManager.invalidateAboutTable()`). But on
follower FEs, the replay paths (`Env.setTableStatusInternal`,
`Env.setReplicaVersionInternal`, `Env.setPartitionVersionInternal`, and
various `replay*` methods in `EditLog`) only mutate metadata — they never
fan out cache invalidation signals.
### Fix
Introduce `OP_TABLE_META_CHANGE` — the master emits this journal entry once
per table metadata mutation via `Env.notifyTableMetaChange(table)`, and
every follower fans it out to `NereidsSqlCacheManager` +
`NereidsSortedPartitionsCacheManager` on replay. This decouples cache
invalidation from the dozens of individual replay paths: adding a new
per-table cache only requires one line in `Env.fanOutTableMetaChange()`.
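
The notify / replay / fan-out split described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `TableMetaChangeSketch`, `FrontendSketch`, the in-memory `journal` list, and the `Consumer`-based cache registration are all hypothetical stand-ins for `TableMetaChange`, `Env`, the edit log, and the real cache managers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for the persisted payload (the real one carries
// catalog/db/table id+name; only the table id and name are kept here).
final class TableMetaChangeSketch {
    final long tableId;
    final String tableName;

    TableMetaChangeSketch(long tableId, String tableName) {
        this.tableId = tableId;
        this.tableName = tableName;
    }
}

// Hypothetical stand-in for one FE. The shared list plays the role of the journal.
final class FrontendSketch {
    private final List<TableMetaChangeSketch> journal;
    private final List<Consumer<TableMetaChangeSketch>> caches = new ArrayList<>();

    FrontendSketch(List<TableMetaChangeSketch> journal) {
        this.journal = journal;
    }

    // Each per-table cache registers one invalidation callback.
    void registerCache(Consumer<TableMetaChangeSketch> invalidator) {
        caches.add(invalidator);
    }

    // Master path: write one journal entry, then invalidate local caches.
    void notifyTableMetaChange(TableMetaChangeSketch change) {
        journal.add(change);
        fanOutTableMetaChange(change);
    }

    // Follower path: never writes to the journal, only invalidates local caches.
    void replayTableMetaChange(TableMetaChangeSketch change) {
        fanOutTableMetaChange(change);
    }

    // Single place that knows about every per-table cache.
    private void fanOutTableMetaChange(TableMetaChangeSketch change) {
        for (Consumer<TableMetaChangeSketch> cache : caches) {
            cache.accept(change);
        }
    }
}
```

In this sketch a follower would call `replayTableMetaChange` for each entry it reads from the journal, so the set of caches to invalidate is defined once in the fan-out method rather than in every replay path.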
### Changes
| File | Change |
|------|--------|
| `TableMetaChange.java` | **New**: persist payload carrying catalog/db/table id+name |
| `OperationType.java` | Add `OP_TABLE_META_CHANGE = 1102` |
| `JournalEntity.java` | Deserialize `TableMetaChange` |
| `EditLog.java` | Dispatch replay + `logTableMetaChange()` helper |
| `Env.java` | `notifyTableMetaChange` / `replayTableMetaChange` / `fanOutTableMetaChange`; 3 call sites add `!isReplay` guard |
| `Alter.java` | Route through `Env.notifyTableMetaChange()` instead of direct cache call |
| `NereidsSqlCacheManager.java` | Accept `TableMetaChange` payload; null-safe iteration over softValues |
## Further comments
Verified on both a local 3-FE cluster and a cloud cluster (3 FEs + MetaService
+ FDB):
- `ALTER TABLE RENAME COLUMN` on the master → a follower query using the old
column name now correctly fails (`Unknown column`) instead of returning a
stale cached result.
Performance note: `invalidateAboutTable` does an O(N) scan of the sql cache.
For the default `sql_cache_manage_num=100000` this takes ~10ms per DDL, which
is negligible given DDL frequency. A reverse-index optimization was explored
but deferred — maintaining a secondary index on a soft-referenced Caffeine
cache without global locking proved impractical without introducing race
conditions.
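
To illustrate why the scan is O(N), here is a minimal sketch of invalidate-by-table over a cache with no reverse index. It is an assumption-laden simplification: `SqlCacheScanSketch` is a hypothetical class, a `ConcurrentHashMap` stands in for the soft-valued Caffeine cache, and each entry is reduced to the set of table ids it reads.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical simplification of a sql cache: key is the SQL text, value is
// the set of table ids the cached result depends on.
final class SqlCacheScanSketch {
    private final Map<String, Set<Long>> entries = new ConcurrentHashMap<>();

    void put(String sql, Set<Long> usedTableIds) {
        entries.put(sql, usedTableIds);
    }

    boolean contains(String sql) {
        return entries.containsKey(sql);
    }

    // Visits every entry once, so the cost grows linearly with cache size.
    // Returns the number of entries removed.
    int invalidateAboutTable(long tableId) {
        int removed = 0;
        Iterator<Map.Entry<String, Set<Long>>> it = entries.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Set<Long>> entry = it.next();
            if (entry.getValue().contains(tableId)) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }
}
```

A reverse index (table id to cache keys) would avoid the full scan, but it would have to be kept consistent with evictions from the cache itself, which is the difficulty the note above refers to.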