coderzc opened a new pull request, #26012: URL: https://github.com/apache/pulsar/pull/26012
Fixes #25996

### Motivation

With `isDelayedDeliveryDeliverAtTimeStrict=true`, delayed messages can remain undelivered indefinitely past their `deliverAt` time while a consumer is blocked in `receive()`. The stalled messages are only released when an unrelated dispatch event happens (e.g. a new publish or a consumer reconnect); on a quiet topic the delay is unbounded. With the default `strict=false` the same traffic is delivered on time.

The root cause is in `AbstractDelayedDeliveryTracker.updateTimer()`. Because delivery timestamps are trimmed for memory efficiency (up to ~511ms with the default `tickTimeMillis=1000`), `getScheduledMessages()` can pop a message slightly before its real `deliverAt`. In strict mode the dispatcher re-adds the not-yet-due message, which calls `updateTimer()`:

1. The existing timer (armed for the next message) is cancelled.
2. `delayMillis` for the re-added message is negative, so the method takes its early return, but it leaves `currentTimeoutTarget` pointing at the previous target and `timeout` non-null (now cancelled).
3. When the early message is finally delivered, the next `updateTimer()` sees `timestamp == currentTimeoutTarget`, concludes the timer is already correctly armed, and returns.

No live timer exists, so the remaining delayed messages are never delivered until an external dispatch round happens to find them.

`strict=false` is immune because its cutoff (`now + tickTimeMillis`) covers the trim window, so early-popped messages are delivered instead of being re-added.

Thanks to @glumia for the detailed report and root-cause analysis.

### Modifications

- In the `delayMillis < 0` early return of `AbstractDelayedDeliveryTracker.updateTimer()`, reset `currentTimeoutTarget = -1` and `timeout = null` so a later call cannot short-circuit on stale state and will correctly re-arm the timer.
- Add a deterministic unit test `testStrictModeTimerStallsAfterEarlyPopAndReAdd` in `InMemoryDeliveryTrackerTest` that reproduces the early-pop / re-add sequence and asserts a delivery timer remains armed for the still-pending message. It fails without the fix and passes with it.

### Verifying this change

- [x] Make sure that the change passes the CI checks.

This change is already covered by the added unit test `testStrictModeTimerStallsAfterEarlyPopAndReAdd`. Existing `InMemoryDeliveryTrackerTest` and `BucketDelayedDeliveryTrackerTest` continue to pass.

### Documentation

- [ ] `doc-required`
- [x] `doc-not-needed`

### Matching PR in forked repository

PR in forked repository: coderzc/pulsar (this change is a bug fix; the fix branch is `fix/delayed-delivery-strict-timer-stall-25996`).
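
### Illustrative sketch of the stale-timer sequence

For readers who want to trace the three steps from the Motivation section, here is a minimal, self-contained model of the bookkeeping. It is a sketch under assumptions, not the Pulsar implementation: `TimerSketch`, `replay`, `hasLiveTimer`, the `resetOnEarlyReturn` switch, and the timestamps are all hypothetical, and the real `updateTimer()` reads the next timestamp and the clock itself rather than taking them as parameters. Only the field names `currentTimeoutTarget` and `timeout` and the `delayMillis < 0` early return come from the description above.

```java
/** Simplified model of the timer state described in this PR; not the Pulsar class. */
class TimerSketch {
    // Hypothetical switch: true models the reset added by this change.
    private final boolean resetOnEarlyReturn;
    private long currentTimeoutTarget = -1;
    private Object timeout = null;  // stands in for the scheduled timer handle
    private boolean live = false;   // true while a non-cancelled timer exists

    TimerSketch(boolean resetOnEarlyReturn) {
        this.resetOnEarlyReturn = resetOnEarlyReturn;
    }

    /** Models one updateTimer() call for the earliest pending (trimmed) timestamp. */
    void updateTimer(long timestamp, long now) {
        if (timestamp == currentTimeoutTarget) {
            return; // assumes a timer is already armed for this target
        }
        if (timeout != null) {
            live = false; // models cancelling the existing timer
        }
        long delayMillis = timestamp - now;
        if (delayMillis < 0) {
            if (resetOnEarlyReturn) {
                // The fix: forget the cancelled timer so a later call re-arms.
                currentTimeoutTarget = -1;
                timeout = null;
            }
            return;
        }
        currentTimeoutTarget = timestamp;
        timeout = new Object();
        live = true;
    }

    boolean hasLiveTimer() {
        return live;
    }

    /** Replays the early-pop / re-add sequence with illustrative timestamps. */
    static TimerSketch replay(boolean resetOnEarlyReturn) {
        TimerSketch t = new TimerSketch(resetOnEarlyReturn);
        t.updateTimer(5000, 1100); // early message popped; timer armed for the next one
        t.updateTimer(1000, 1100); // early message re-added; its trimmed timestamp is in the past
        t.updateTimer(5000, 1400); // early message delivered; next message is earliest again
        return t;
    }
}
```

In this model, `TimerSketch.replay(false).hasLiveTimer()` is `false`: the third call short-circuits on the stale `currentTimeoutTarget` although the timer was cancelled in the second call. `TimerSketch.replay(true).hasLiveTimer()` is `true`, because the reset in the early return forces the third call to arm a fresh timer.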
