oneby-wang opened a new pull request, #25957:
URL: https://github.com/apache/pulsar/pull/25957
### Motivation
`AuditorLedgerCheckerTest.testDelayedAuditOfLostBookies` is flaky when
repeated with a high invocation count. The test configures
`lostBookieRecoveryDelay` to 5 seconds, shuts down a non-auditor bookie, and
then uses fixed waits that start immediately after the shutdown thread is
launched.
```
audit of lost bookie isn't delayed
java.lang.AssertionError: audit of lost bookie isn't delayed
    at org.testng.AssertJUnit.fail(AssertJUnit.java:65)
    at org.testng.AssertJUnit.assertTrue(AssertJUnit.java:23)
    at org.apache.bookkeeper.replication.AuditorLedgerCheckerTest.testInnerDelayedAuditOfLostBookies(AuditorLedgerCheckerTest.java:415)
    at org.apache.bookkeeper.replication.AuditorLedgerCheckerTest.testDelayedAuditOfLostBookies(AuditorLedgerCheckerTest.java:436)
    at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
    at java.base/java.lang.reflect.Method.invoke(Method.java:565)
    at org.testng.internal.invokers.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:141)
    at org.testng.internal.invokers.InvokeMethodRunnable.runOne(InvokeMethodRunnable.java:47)
    at org.testng.internal.invokers.InvokeMethodRunnable.call(InvokeMethodRunnable.java:76)
    at org.testng.internal.invokers.InvokeMethodRunnable.call(InvokeMethodRunnable.java:11)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:328)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1090)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:614)
    at java.base/java.lang.Thread.run(Thread.java:1474)
```
The test measured the delay from the moment it launched the thread that shuts
down the bookie. However, the `lostBookieRecoveryDelay` countdown starts only
after the auditor observes the lost bookie and schedules the delayed audit
task, so a fixed wait anchored at shutdown can be misaligned with the actual
delay window.
### Modifications
- Wait until the auditor has scheduled the delayed `auditTask` before
starting the delay assertions.
- Keep the negative assertion anchored to the configured delay window,
verifying that the ledger is not marked under-replicated before the delay
expires.
- Use a short grace period after the delay window for the scheduled audit to
run and for the under-replication watcher to observe the result.
- Add a helper to wait for the delayed audit task to be scheduled without
triggering an audit directly.
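The timing pattern described in the list above can be sketched as a standalone simulation. This is only an illustration under stated assumptions, not the code in this PR: it uses no BookKeeper classes, and `DelayedAuditSketch`, `run`, `detectionLagMs`, `graceMs`, and the choice to check only the first half of the delay window for the negative assertion are all hypothetical. A scheduled executor stands in for the auditor, one latch signals that the delayed audit task has been scheduled, and a second latch signals that the ledger was marked under-replicated.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical simulation of the wait pattern; it does not use BookKeeper classes.
public class DelayedAuditSketch {

    /** Outcome of one simulated run. */
    public static final class Result {
        public final boolean scheduled;       // delayed audit task was scheduled
        public final boolean firedEarly;      // audit ran inside the negative window
        public final boolean firedAfterDelay; // audit ran by delay + grace period

        Result(boolean scheduled, boolean firedEarly, boolean firedAfterDelay) {
            this.scheduled = scheduled;
            this.firedEarly = firedEarly;
            this.firedAfterDelay = firedAfterDelay;
        }
    }

    public static Result run(long detectionLagMs, long recoveryDelayMs, long graceMs) {
        ScheduledExecutorService auditor = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch auditScheduled = new CountDownLatch(1);
        CountDownLatch underReplicated = new CountDownLatch(1);

        // Stand-in for the audit that marks the ledger under-replicated.
        Runnable audit = underReplicated::countDown;
        // Stand-in for the auditor noticing the lost bookie: the recovery
        // delay starts here, not when the bookie shutdown was requested.
        Runnable onBookieLost = () -> {
            auditor.schedule(audit, recoveryDelayMs, TimeUnit.MILLISECONDS);
            auditScheduled.countDown();
        };

        try {
            // The "shutdown" happens now; detection lags behind by detectionLagMs.
            auditor.schedule(onBookieLost, detectionLagMs, TimeUnit.MILLISECONDS);

            // Step 1: anchor on the scheduling event instead of the shutdown.
            boolean scheduled =
                    auditScheduled.await(detectionLagMs + 5_000, TimeUnit.MILLISECONDS);

            // Step 2: negative check inside the delay window (assumed: first half).
            long negativeWindowMs = recoveryDelayMs / 2;
            boolean firedEarly =
                    underReplicated.await(negativeWindowMs, TimeUnit.MILLISECONDS);

            // Step 3: wait out the rest of the delay plus a grace period.
            boolean firedAfterDelay = underReplicated.await(
                    recoveryDelayMs - negativeWindowMs + graceMs, TimeUnit.MILLISECONDS);

            return new Result(scheduled, firedEarly, firedAfterDelay);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            auditor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        Result r = run(300, 1_000, 2_000);
        System.out.println("scheduled=" + r.scheduled
                + " firedEarly=" + r.firedEarly
                + " firedAfterDelay=" + r.firedAfterDelay);
    }
}
```

Because the waits in steps 2 and 3 begin only after the scheduling latch fires, the simulated detection lag no longer eats into either window; changing `detectionLagMs` should not change the outcome as long as step 1 is given enough time.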
### Verifying this change
- [x] Make sure that the change passes the CI checks.
### Does this pull request potentially affect one of the following parts:
- [ ] Dependencies (add or upgrade a dependency)
- [ ] The public API
- [ ] The schema
- [ ] The default values of configurations
- [ ] The threading model
- [ ] The binary protocol
- [ ] The REST endpoints
- [ ] The admin CLI options
- [ ] The metrics
- [ ] Anything that affects deployment