github-actions[bot] commented on code in PR #63310:
URL: https://github.com/apache/doris/pull/63310#discussion_r3425103907
##########
fe/fe-core/src/main/java/org/apache/doris/catalog/CatalogRecycleBin.java:
##########
@@ -67,6 +69,14 @@ public class CatalogRecycleBin extends MasterDaemon
implements Writable {
private final ReentrantReadWriteLock lock = new
ReentrantReadWriteLock(true);
+ // Injectable clock source, defaults to system clock
+ private LongSupplier clock = System::currentTimeMillis;
Review Comment:
`CatalogRecycleBin.write()` still serializes `this` with
`GsonUtils.GSON.toJson(this)` and `readFieldsWithGson()` immediately calls
`GsonUtils.GSON.fromJson(..., CatalogRecycleBin.class)` on that JSON. Because
this new test hook is not `transient` or otherwise excluded, the image JSON now
contains a `LongSupplier` implementation. On image load Gson has to deserialize
that interface field even though the returned object is ignored, which can fail
and at minimum persists a non-deterministic test-only clock implementation.
Please keep the injected clock out of the persisted image.
```suggestion
private transient LongSupplier clock = System::currentTimeMillis;
```
##########
fe/fe-core/src/main/java/org/apache/doris/catalog/CatalogRecycleBin.java:
##########
@@ -1393,13 +1403,14 @@ public void addTabletToInvertedIndex() {
@Override
protected void runAfterCatalogReady() {
- long currentTimeMs = System.currentTimeMillis();
// should follow the partition/table/db order
// in case of partition(table) is still in recycle bin but table(db)
is missing
+ // Each erase method gets its own currentTimeMs to avoid using a stale
timestamp,
+ // since previous erase operations may take significant time.
int keepNum = Config.max_same_name_catalog_trash_num;
- erasePartition(currentTimeMs, keepNum);
- eraseTable(currentTimeMs, keepNum);
- eraseDatabase(currentTimeMs, keepNum);
+ erasePartition(clock.getAsLong(), keepNum);
+ eraseTable(clock.getAsLong(), keepNum);
+ eraseDatabase(clock.getAsLong(), keepNum);
Review Comment:
This fresh timestamp can remove a parent after its child phase already
skipped the child, which violates the invariant in the comment above. Concrete
drop-database case: tables are recycled before the db in
`InternalCatalog.unprotectDropDb()`/`recycleDatabase()`. If `eraseTable()` is
called at `tableRecycleTime + expire - 1` it leaves the table in `idToTable`;
if that phase then takes long enough, this later
`eraseDatabase(clock.getAsLong(), ...)` can see the db as expired and the
normal expired branch only removes `idToDatabase`/logs `OP_ERASE_DB` (it does
not call `eraseAllTables`). After replay or image rebuild the table remains in
the recycle bin with its database already erased, exactly what the
partition/table/db ordering is supposed to prevent. The same boundary exists
between partition and table. Please either use a consistent cutoff for
dependent child/parent phases, or make parent erasure check/erase remaining
children before logging the parent erase.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]