lhotari opened a new issue, #25627:
URL: https://github.com/apache/pulsar/issues/25627
## Summary
For Key_Shared subscriptions using the PIP-379 implementation
(`PersistentStickyKeyDispatcherMultipleConsumers`), the dispatcher's
`markDeletePositionMoveForward()` hook clears each consumer's `PendingAcksMap`
for any positions at or below the new mark-delete position. This pending-acks
cleanup is harmful for Key_Shared and can result in **out-of-order message
delivery** for a sticky key whose hash is currently being drained.
## Background
PIP-379 introduced the `DrainingHashesTracker` to prevent out-of-order
delivery during sticky-key hash range reassignment. While consumer A still has
unacked messages for hash *H* in its `PendingAcksMap`, the new owner of *H*
(consumer B) is **blocked** from receiving messages with *H*. The hash refCount
is decremented whenever an entry is removed from the `PendingAcksMap`, via
`PendingAcksRemoveHandler.handleRemoving` →
`DrainingHashesTracker.reduceRefCount`.
The handler does not distinguish removal-because-acked from
removal-because-mark-delete-advanced. Both paths look identical to the
draining-hashes tracker.
## What happens
`PersistentDispatcherMultipleConsumers.markDeletePositionMoveForward()` does
two things when the cursor mark-delete position advances (triggered by
skip-messages, clear-backlog, expiry, or backlog-quota eviction):
1. Drops entries from `redeliveryMessages` for positions ≤ new mark-delete
position.
2. Calls `Consumer.removePendingAcksUpToPositionAndDecrementUnacked(...)`
for every consumer in the dispatcher.
`PersistentStickyKeyDispatcherMultipleConsumers` invokes
`super.markDeletePositionMoveForward()` and inherits both behaviors.
Step 2 is correct for Shared subscriptions, but for Key_Shared (PIP-379) it
removes in-flight pending acks that the draining-hashes mechanism is using to
keep a hash blocked. The hash refCount can prematurely drop to 0, the block
lifts, and the new owner of that hash starts receiving messages with the same
key **before the original consumer has processed its already-on-the-wire
messages**.
### Failure scenario
1. Consumer A receives message M with sticky key K (hash H). M is added to
A's `PendingAcksMap`. M is on the wire.
2. Consumer B joins; the sticky-key range covering H reassigns from A to B.
Hash H is registered as **draining** for A.
3. Mark-delete advances past M's position — for example, because
backlog-quota eviction (`BacklogQuotaManager`) calls
`ManagedCursor.skipEntries`, or because message expiry advances the cursor, or
an admin issues `skip-messages`/`clear-backlog`.
4. `markDeletePositionMoveForward` fires. M is removed from A's
`PendingAcksMap`. The remove handler decrements H's refCount, which reaches 0,
and the draining block lifts.
5. A new message with key K is produced. It is delivered to consumer **B**.
6. Meanwhile, M is still in flight to consumer A and is processed by A.
7. The application has now processed two messages with the same key on two
different consumers — out-of-order delivery, violating the Key_Shared ordering
guarantee.
## Trigger surface
Any cursor mark-delete advance that does not go through the consumer-driven
ack path can hit this:
- Backlog-quota eviction (size policy `consumer_backlog_eviction` and
non-precise time policy) — see `BacklogQuotaManager`.
- Message expiry — `PersistentMessageExpiryMonitor`.
- Admin `skipMessages` / `clearBacklog` on a subscription.
## Pre-existing, not a regression
This is a latent bug, not introduced by recent PRs:
- The same `Consumer.removePendingAcksUpToPositionAndDecrementUnacked` call
previously lived inside
`PersistentDispatcherMultipleConsumers.readMoreEntries()`.
- PR **#25592** ("Move pending acks cleanup to selected mark-delete
callbacks") relocated it into the new `markDeletePositionMoveForward` hook.
- PR **#25624** ("Close pending acks cleanup gap in BacklogQuotaManager")
wired the hook into the `BacklogQuotaManager` paths that had been missing it.
Neither PR semantically changed the pending-acks cleanup itself; both are
correct for Shared subscriptions and faithfully preserve the prior behavior.
The bug surfaces because the prior behavior is wrong for PIP-379 Key_Shared —
and is now reachable from more paths.
The original review of PR #24096 retained step 2 over concern that draining
hashes could otherwise be blocked indefinitely if pending acks weren't pruned
on mark-delete advance. That concern needs revising: the pending acks will be
cleared naturally when the consumer eventually acks (delivery still completes
even if the broker has trimmed storage) or disconnects
(`PendingAcksMap.forEachAndClose` with `closing=true`).
## Impact
- **Severity:** Correctness regression for Key_Shared ordering — the central
invariant of the subscription type.
- **Trigger conditions:** Requires a draining sticky-key hash (consumer set
has changed) AND a mark-delete advance that bypasses the per-consumer ack path.
Backlog-quota eviction (`consumer_backlog_eviction` policy), message expiry,
and admin skip/clear operations can all reach it.
- **Detectability:** Silent. No log, no metric. The application sees
out-of-order processing.
- **Visibility:** No tests today exercise the interaction between
mark-delete advancement and `DrainingHashesTracker` for Key_Shared.
## Scope of fix
Should target the PIP-379 dispatcher
(`PersistentStickyKeyDispatcherMultipleConsumers`) and its parent
(`PersistentDispatcherMultipleConsumers`).
The deprecated `PersistentStickyKeyDispatcherMultipleConsumersClassic` is
out of scope.
The redelivery-tracker cleanup (step 1 above) must be preserved for
Key_Shared — it correctly prevents re-reading entries that storage has already
discarded.
## Related
- PR #25592 — relocated the cleanup into `markDeletePositionMoveForward`.
- PR #25624 — closed gaps in `BacklogQuotaManager`, increasing the bug's
reachability.
- PR #24096 — earlier review thread where the trade-off was first discussed.
- PIP-379 — the design document for the Key_Shared draining-hashes mechanism.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]