ng-galien opened a new issue, #25540:
URL: https://github.com/apache/pulsar/issues/25540
## Summary
When the number of individually deleted message ranges (ack holes) in a
subscription cursor exceeds `managedLedgerMaxUnackedRangesToPersist` (default
10,000), the excess ack ranges are silently dropped during cursor persistence.
No log message is emitted at any level.
On broker restart, the cursor is recovered from BookKeeper with only the
persisted ranges. Acknowledged messages whose ack information was truncated are
redelivered as if never acknowledged.
The `broker.conf` comment describes this behavior:
> "After the max number of ranges is reached, the information will only be
tracked in memory and messages will be redelivered in case of crashes."
However, the operator has no warning that this threshold has been crossed.
The only way to detect it today is to actively monitor
`totalNonContiguousDeletedMessagesRange` via the Admin API or Prometheus and
compare it manually to the configured limit.
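The manual comparison described above can be sketched as follows. This is only an illustration: `AckRangeLimitCheck` and `cursorsOverLimit` are hypothetical names, and the per-cursor counts are assumed to have already been read from `totalNonContiguousDeletedMessagesRange` by whatever monitoring path is in use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical monitoring helper, not part of Pulsar. */
final class AckRangeLimitCheck {
    /**
     * Returns the names of cursors whose ack-hole count exceeds the limit,
     * sorted by name so the result is stable.
     *
     * @param rangesByCursor cursor name -> totalNonContiguousDeletedMessagesRange (assumed input)
     * @param limit          configured managedLedgerMaxUnackedRangesToPersist
     */
    static List<String> cursorsOverLimit(Map<String, Integer> rangesByCursor, int limit) {
        List<String> over = new ArrayList<>();
        for (Map.Entry<String, Integer> e : new TreeMap<>(rangesByCursor).entrySet()) {
            if (e.getValue() > limit) {
                over.add(e.getKey());
            }
        }
        return over;
    }
}
```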
## Impact
When the threshold is silently exceeded and a broker restart occurs,
already-acknowledged messages are replayed in bulk. Depending on the
application, this can:
- saturate consumers with a sudden spike of messages that were already
processed
- cause unexpected side effects if message processing is not idempotent
- make the situation progressively worse, since replayed messages generate
new ack holes after being re-processed
Without a log at the threshold crossing, operators who are not already
watching that metric have no way to anticipate the replay before a restart
triggers it.
## Proposal
Add a rate-limited `log.warn` when the threshold is crossed. For example:
```
WARN - Cursor {cursor} on ledger {ledger}: individually deleted message ranges ({actual})
exceed managedLedgerMaxUnackedRangesToPersist ({limit}).
Ack state beyond this limit is not persisted and will be lost on broker restart.
```
Suggested locations in `ManagedCursorImpl.java`:
- `buildIndividualDeletedMessageRanges()`, where the truncation happens
- or the point where `isCursorDataFullyPersistable()` transitions from true to
false
The log should be rate-limited to avoid spam when the threshold is
permanently exceeded.
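A minimal sketch of the rate limiting, assuming a simple "at most one warning per interval" policy. `AckRangeLimitWarner` and its `check` method are hypothetical and not existing Pulsar code; the clock is injected only to make the behavior easy to test.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical helper, not part of Pulsar: allows at most one warning per interval. */
final class AckRangeLimitWarner {
    private final long minIntervalNanos;
    private final LongSupplier nanoClock;
    // Long.MIN_VALUE marks "no warning emitted yet".
    private final AtomicLong lastWarnNanos = new AtomicLong(Long.MIN_VALUE);

    AckRangeLimitWarner(long minInterval, TimeUnit unit, LongSupplier nanoClock) {
        this.minIntervalNanos = unit.toNanos(minInterval);
        this.nanoClock = nanoClock;
    }

    /** Returns the warning text if one should be logged now, otherwise null. */
    String check(String cursor, String ledger, int actualRanges, int limit) {
        if (actualRanges <= limit) {
            return null; // all ack ranges still fit in the persisted cursor state
        }
        long now = nanoClock.getAsLong();
        long last = lastWarnNanos.get();
        boolean due = last == Long.MIN_VALUE || now - last >= minIntervalNanos;
        // compareAndSet lets only one of several concurrent callers claim the slot.
        if (!due || !lastWarnNanos.compareAndSet(last, now)) {
            return null;
        }
        return String.format(
                "Cursor %s on ledger %s: individually deleted message ranges (%d) exceed "
                        + "managedLedgerMaxUnackedRangesToPersist (%d). Ack state beyond this limit "
                        + "is not persisted and will be lost on broker restart.",
                cursor, ledger, actualRanges, limit);
    }
}
```

In the broker, the caller would presumably pass `System::nanoTime` as the clock and hand any non-null result to `log.warn`. Where exactly the check is invoked is the open question raised above.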
Happy to submit a PR if a maintainer agrees this is worth adding.
## Related
- PIP-299: `dispatcherPauseOnAckStatePersistentEnabled` — pauses dispatching
at the limit, but also operates silently (no log at the transition)