lhotari opened a new issue, #25627:
URL: https://github.com/apache/pulsar/issues/25627

   ## Summary
   
   For Key_Shared subscriptions using the PIP-379 implementation 
(`PersistentStickyKeyDispatcherMultipleConsumers`), the dispatcher's 
`markDeletePositionMoveForward()` hook clears each consumer's `PendingAcksMap` 
for any positions at or below the new mark-delete position. This pending-acks 
cleanup is harmful for Key_Shared and can result in **out-of-order message 
delivery** for a sticky key whose hash is currently being drained.
   
   ## Background
   
   PIP-379 introduced the `DrainingHashesTracker` to prevent out-of-order 
delivery during sticky-key hash range reassignment. While consumer A still has 
unacked messages for hash *H* in its `PendingAcksMap`, the new owner of *H* 
(consumer B) is **blocked** from receiving messages with *H*. The hash refCount 
is decremented whenever an entry is removed from the `PendingAcksMap`, via 
`PendingAcksRemoveHandler.handleRemoving` → 
`DrainingHashesTracker.reduceRefCount`.
   
   The handler does not distinguish removal-because-acked from 
removal-because-mark-delete-advanced. Both paths look identical to the 
draining-hashes tracker.
   
   ## What happens
   
   `PersistentDispatcherMultipleConsumers.markDeletePositionMoveForward()` does 
two things when the cursor mark-delete position advances (triggered by 
skip-messages, clear-backlog, expiry, or backlog-quota eviction):
   
   1. Drops entries from `redeliveryMessages` for positions ≤ new mark-delete 
position.
   2. Calls `Consumer.removePendingAcksUpToPositionAndDecrementUnacked(...)` 
for every consumer in the dispatcher.
   
   `PersistentStickyKeyDispatcherMultipleConsumers` invokes 
`super.markDeletePositionMoveForward()` and inherits both behaviors.
   
   Step 2 is correct for Shared subscriptions, but for Key_Shared (PIP-379) it 
removes in-flight pending acks that the draining-hashes mechanism is using to 
keep a hash blocked. The hash refCount can prematurely drop to 0, the block 
lifts, and the new owner of that hash starts receiving messages with the same 
key **before the original consumer has processed its already-on-the-wire 
messages**.
   
   ### Failure scenario
   
   1. Consumer A receives message M with sticky key K (hash H). M is added to 
A's `PendingAcksMap`. M is on the wire.
   2. Consumer B joins; the sticky-key range covering H reassigns from A to B. 
Hash H is registered as **draining** for A.
   3. Mark-delete advances past M's position — for example, because 
backlog-quota eviction (`BacklogQuotaManager`) calls 
`ManagedCursor.skipEntries`, or because message expiry advances the cursor, or 
an admin issues `skip-messages`/`clear-backlog`.
   4. `markDeletePositionMoveForward` fires. M is removed from A's 
`PendingAcksMap`. The remove handler decrements H's refCount, which reaches 0, 
and the draining block lifts.
   5. A new message with key K is produced. It is delivered to consumer **B**.
   6. Meanwhile, M is still in flight to consumer A and is processed by A.
   7. The application has now processed two messages with the same key on two 
different consumers — out-of-order delivery, violating the Key_Shared ordering 
guarantee.
   
   ## Trigger surface
   
   Any cursor mark-delete advance that does not go through the consumer-driven 
ack path can hit this:
   
   - Backlog-quota eviction (size policy `consumer_backlog_eviction` and 
non-precise time policy) — see `BacklogQuotaManager`.
   - Message expiry — `PersistentMessageExpiryMonitor`.
   - Admin `skipMessages` / `clearBacklog` on a subscription.
   
   ## Pre-existing, not a regression
   
   This is a latent bug, not introduced by recent PRs:
   
   - The same `Consumer.removePendingAcksUpToPositionAndDecrementUnacked` call 
previously lived inside 
`PersistentDispatcherMultipleConsumers.readMoreEntries()`.
   - PR **#25592** ("Move pending acks cleanup to selected mark-delete 
callbacks") relocated it into the new `markDeletePositionMoveForward` hook.
   - PR **#25624** ("Close pending acks cleanup gap in BacklogQuotaManager") 
wired the hook into the `BacklogQuotaManager` paths that had been missing it.
   
   Neither PR semantically changed the pending-acks cleanup itself; both are 
correct for Shared subscriptions and faithfully preserve the prior behavior. 
The bug surfaces because the prior behavior is wrong for PIP-379 Key_Shared — 
and is now reachable from more paths.
   
   The original review of PR #24096 retained step 2 over concern that draining 
hashes could otherwise be blocked indefinitely if pending acks weren't pruned 
on mark-delete advance. That concern needs revising: the pending acks will be 
cleared naturally when the consumer eventually acks (delivery still completes 
even if the broker has trimmed storage) or disconnects 
(`PendingAcksMap.forEachAndClose` with `closing=true`).
   
   ## Impact
   
   - **Severity:** Correctness regression for Key_Shared ordering — the central 
invariant of the subscription type.
   - **Trigger conditions:** Requires a draining sticky-key hash (consumer set 
has changed) AND a mark-delete advance that bypasses the per-consumer ack path. 
Backlog-quota eviction (`consumer_backlog_eviction` policy), message expiry, 
and admin skip/clear operations can all reach it.
   - **Detectability:** Silent. No log, no metric. The application sees 
out-of-order processing.
   - **Visibility:** No tests today exercise the interaction between 
mark-delete advancement and `DrainingHashesTracker` for Key_Shared.
   
   ## Scope of fix
   
   Should target the PIP-379 dispatcher 
(`PersistentStickyKeyDispatcherMultipleConsumers`) and its parent 
(`PersistentDispatcherMultipleConsumers`).
   
   The deprecated `PersistentStickyKeyDispatcherMultipleConsumersClassic` is 
out of scope.
   
   The redelivery-tracker cleanup (step 1 above) must be preserved for 
Key_Shared — it correctly prevents re-reading entries that storage has already 
discarded.
   
   ## Related
   
   - PR #25592 — relocated the cleanup into `markDeletePositionMoveForward`.
   - PR #25624 — closed gaps in `BacklogQuotaManager`, increasing the bug's 
reachability.
   - PR #24096 — earlier review thread where the trade-off was first discussed.
   - PIP-379 — the design document for the Key_Shared draining-hashes mechanism.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to