morrySnow commented on code in PR #64160:
URL: https://github.com/apache/doris/pull/64160#discussion_r3425336282
##########
fe/fe-core/src/main/java/org/apache/doris/datasource/ExternalMetaCacheMgr.java:
##########
@@ -242,12 +245,34 @@ public void removeCatalogByEngine(long catalogId, String
engine) {
public void invalidateDb(long catalogId, String dbName) {
routeCatalogEngines(catalogId, cache -> safeInvalidate(
cache, catalogId, "invalidateDb", () ->
cache.invalidateDb(catalogId, dbName)));
+ CatalogIf<?> catalog = getCatalog(catalogId);
+ if (catalog != null) {
+ rowCountCache.invalidateDb(catalogId,
Util.genIdByName(catalog.getName(), dbName));
+ }
}
public void invalidateTable(long catalogId, String dbName, String
tableName) {
routeCatalogEngines(catalogId, cache -> safeInvalidate(
cache, catalogId, "invalidateTable",
() -> cache.invalidateTable(catalogId, dbName, tableName)));
+ CatalogIf<?> catalog = getCatalog(catalogId);
+ if (catalog != null) {
+ rowCountCache.invalidateTable(catalogId,
+ Util.genIdByName(catalog.getName(), dbName),
+ Util.genIdByName(catalog.getName(), dbName, tableName));
+ }
+ }
+
+ public void invalidateTable(ExternalTable dorisTable) {
+ invalidateTable(dorisTable.getCatalog().getId(),
dorisTable.getDbName(), dorisTable.getName());
+ if (LOG.isDebugEnabled()) {
+ LOG.debug("invalid table cache for {}.{} in catalog {}",
dorisTable.getRemoteDbName(),
+ dorisTable.getRemoteName(),
dorisTable.getCatalog().getName());
+ }
+ }
+
+ public void invalidateTableRowCountCache(ExternalTable dorisTable) {
+ rowCountCache.invalidateTable(dorisTable.getCatalog().getId(),
dorisTable.getDbId(), dorisTable.getId());
}
Review Comment:
The `LOG.debug` in `invalidateTable(ExternalTable)` uses
`dorisTable.getRemoteDbName()` and `dorisTable.getRemoteName()` for the log
message, while `invalidateTable(long catalogId, String dbName, String
tableName)` receives `dorisTable.getDbName()` and `dorisTable.getName()` (the
Doris-level names). The debug log describes the remote names but the
invalidation operates on Doris-level names — not a bug but potentially
confusing when debugging. Consider logging the Doris-level names too, or noting
the distinction.
##########
fe/fe-core/src/main/java/org/apache/doris/datasource/ExternalMetaCacheMgr.java:
##########
@@ -242,12 +245,34 @@ public void removeCatalogByEngine(long catalogId, String
engine) {
public void invalidateDb(long catalogId, String dbName) {
routeCatalogEngines(catalogId, cache -> safeInvalidate(
cache, catalogId, "invalidateDb", () ->
cache.invalidateDb(catalogId, dbName)));
+ CatalogIf<?> catalog = getCatalog(catalogId);
+ if (catalog != null) {
Review Comment:
Same ID-mismatch concern: `invalidateDb` uses
`Util.genIdByName(catalog.getName(), dbName)` to generate the dbId for row
count cache invalidation, but the cache is populated with `ExternalTable.dbId`
which is set to `db.getId()` — the ExternalDatabase's actual metadata ID, not a
name-hashed value. This means `invalidateDb` will also miss the row count cache
entries.
##########
fe/fe-core/src/main/java/org/apache/doris/datasource/ExternalMetaCacheMgr.java:
##########
@@ -242,12 +245,34 @@ public void removeCatalogByEngine(long catalogId, String
engine) {
public void invalidateDb(long catalogId, String dbName) {
routeCatalogEngines(catalogId, cache -> safeInvalidate(
cache, catalogId, "invalidateDb", () ->
cache.invalidateDb(catalogId, dbName)));
+ CatalogIf<?> catalog = getCatalog(catalogId);
+ if (catalog != null) {
+ rowCountCache.invalidateDb(catalogId,
Util.genIdByName(catalog.getName(), dbName));
+ }
}
public void invalidateTable(long catalogId, String dbName, String
tableName) {
routeCatalogEngines(catalogId, cache -> safeInvalidate(
cache, catalogId, "invalidateTable",
() -> cache.invalidateTable(catalogId, dbName, tableName)));
+ CatalogIf<?> catalog = getCatalog(catalogId);
+ if (catalog != null) {
+ rowCountCache.invalidateTable(catalogId,
+ Util.genIdByName(catalog.getName(), dbName),
+ Util.genIdByName(catalog.getName(), dbName, tableName));
+ }
+ }
+
+ public void invalidateTable(ExternalTable dorisTable) {
+ invalidateTable(dorisTable.getCatalog().getId(),
dorisTable.getDbName(), dorisTable.getName());
+ if (LOG.isDebugEnabled()) {
+ LOG.debug("invalid table cache for {}.{} in catalog {}",
dorisTable.getRemoteDbName(),
+ dorisTable.getRemoteName(),
dorisTable.getCatalog().getName());
+ }
+ }
+
+ public void invalidateTableRowCountCache(ExternalTable dorisTable) {
Review Comment:
**Bug: Row count cache invalidation uses name-generated IDs that do not
match the actual metadata IDs used to populate the cache.**
`invalidateTable(ExternalTable dorisTable)` delegates to
`invalidateTable(catalogId, dbName, tableName)` (line 247), which generates row
count cache keys via `Util.genIdByName(catalog.getName(), dbName)`. But
`ExternalTable.getRowCount()` (ExternalTable.java line 250) populates the cache
with `getCachedRowCount(catalog.getId(), dbId, id, ...)`, using the actual
metadata IDs `dbId` (set to `db.getId()` at line 149) and `id` (the table's
constructor-assigned ID).
`Util.genIdByName` computes `Math.abs(sha256(name1.name2...))` — a hash —
while `dbId` and `id` come from the metadata source (e.g., Hive Metastore IDs).
These are completely different numbers, so the invalidation will never match
and the row count cache entry will remain stale.
Compare this with `invalidateTableRowCountCache(ExternalTable)` which
correctly uses `dorisTable.getDbId()` and `dorisTable.getId()` directly.
**Suggested fix**: Change `invalidateTable(ExternalTable dorisTable)` to
invalidate the row count cache directly using the actual IDs:
```java
rowCountCache.invalidateTable(dorisTable.getCatalog().getId(),
dorisTable.getDbId(), dorisTable.getId());
```
rather than going through the name-based `invalidateTable(catalogId, dbName,
tableName)` for the row count cache portion.
##########
fe/fe-core/src/main/java/org/apache/doris/datasource/ExternalMetaCacheMgr.java:
##########
@@ -415,6 +430,11 @@ void replaceEngineCachesForTest(List<? extends
ExternalMetaCache> caches) {
cacheRegistry.resetForTest(caches);
}
+ // Only used by tests to replace the row count cache with a mock or spy.
+ void setRowCountCache(ExternalRowCountCache rowCountCache) {
+ this.rowCountCache = rowCountCache;
+ }
+
Review Comment:
`setRowCountCache` has default (package-private) visibility, intended for
unit tests only. Consider annotating with `@VisibleForTesting` to document the
intent and prevent accidental production use. (Same pattern used by
`replaceEngineCachesForTest` above at line 433 which is also package-private
for testing.)
##########
fe/fe-core/src/main/java/org/apache/doris/datasource/ExternalRowCountCache.java:
##########
@@ -162,4 +162,16 @@ public long getCachedRowCountIfPresent(long catalogId,
long dbId, long tableId)
return -1;
}
+ void invalidateCatalog(long catalogId) {
+ rowCountCache.asMap().keySet().removeIf(key -> key.catalogId ==
catalogId);
+ }
Review Comment:
`invalidateCatalog` and `invalidateDb` use `asMap().keySet().removeIf()`
which iterates over ALL cached entries to find matching keys. For catalogs with
many tables (e.g., thousands), this scans every entry. While catalog/db-level
invalidation is infrequent, this could become a latency concern. Caffeine
supports `invalidateAll(Iterable)` but there is no built-in range-invalidation
by key prefix. Consider at least adding a comment noting the O(N) scan behavior
for large caches.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]