ng-galien opened a new pull request, #25548:
URL: https://github.com/apache/pulsar/pull/25548
Fixes #25540
### Motivation
When the number of individually deleted message ranges in a cursor exceeds
`managedLedgerMaxUnackedRangesToPersist` (default 10 000), or the number of
batch deleted indexes exceeds `managedLedgerMaxBatchDeletedIndexToPersist`
(default 10 000), the excess ack state is silently truncated during cursor
persistence. On broker restart, those acks are lost and already-acknowledged
messages are redelivered.
Today there is no signal when this crossing happens. The only way to detect
it is to actively monitor `totalNonContiguousDeletedMessagesRange` via the
Admin API or Prometheus and compare it manually to the configured limit. Issue
#25540 and the maintainer's follow-up comments describe the concrete impact
(consumer saturation after restart, compounding redeliveries) and ask for both
a log and a broker-level metric.
### Modifications
**Feature (commit 1) — warn and emit metric on overflow:**
- `ManagedCursorImpl.buildIndividualDeletedMessageRanges()` and
`buildBatchEntryDeletionIndexInfoList()` emit a `WARN` log exactly once per
crossing (edge-detected via an `AtomicBoolean`), including tuning advice:
`managedLedgerMaxUnackedRangesToPersist`,
`managedLedgerMaxBatchDeletedIndexToPersist`,
`managedLedgerPersistIndividualAckAsLongArray`,
`managedCursorInfoCompressionType`.
- Two broker-level OpenTelemetry counters are incremented on every truncated
persist (no topic-level attributes, so no cardinality growth):
- `pulsar.broker.managed_ledger.cursor.persist.overflow.range.count`
- `pulsar.broker.managed_ledger.cursor.persist.overflow.batch.index.count`
- `conf/broker.conf` documents both signals next to the corresponding
settings, so operators discover them while tuning the limits.
- The overflow signal is derived from iteration state already computed,
avoiding an `O(N)` `size()` call on the hot path. The exact range count is only
read on the first crossing (guarded by the edge detector), so precise log
context costs nothing on subsequent persists.
**Incidental fix (commit 2) — off-by-one in the ranges cap:**
While writing a reload-based regression test for the new warning,
`buildIndividualDeletedMessageRanges` turned out to persist `maxRanges + 1`
entries rather than `maxRanges`. The `forEach` callback used to append each
range and then return `rangeList.size() <= maxRanges`, keeping iteration alive
until one extra entry had been added. The regression dates back to #3819
(2019), when the original `stream().limit(N)` was translated to the `forEach`
callback without preserving the strict cap; #18357 later migrated to
`forEachRawRange` and carried the bug forward. The sibling
`buildBatchEntryDeletionIndexInfoList` has always used the correct
check-before-add pattern.
Beyond the obvious "config says 10 000, runtime persists 10 001" contract
violation, the off-by-one would cause the new WARN log and counter to fire
spuriously when `totalRanges == maxRanges + 1`: the list actually contained
every range (no truncation), but the old `rangeList.size() > maxRanges` check
reported "overflow". Shipping the new signal without this fix would undermine
the operator trust we are trying to build.
The fix restructures the callback into a check-before-add shape symmetric
with `buildBatchEntryDeletionIndexInfoList`, using a `MutableBoolean` as the
overflow flag. `MutableBoolean` is preferred over `AtomicBoolean` because the
method runs under the cursor write lock — no concurrent readers — and the
codebase already uses `MutableBoolean` in this file (e.g. `readEntries` around
line 2219) for the same purpose. The change preserves the
zero-`Range<T>`-allocation benefit introduced in #18357.
### Verifying this change
This change added tests and can be verified as follows:
- `ManagedCursorTest.testPersistOverflowRangesCounter` (parameterized over
overflow / no-overflow / exact-limit cases): asserts that the OTel counter
direction matches the scenario, and in the overflow case reopens the ledger
after close and verifies exactly `maxRanges` entries are recovered — this
second assertion is the regression guard for the off-by-one fix. It fails on
the unpatched cursor (`expected 5 but found 6`) and passes with the fix.
- `ManagedCursorTest.testPersistOverflowBatchIndexesCounter` (parameterized
similarly): asserts the batch-index counter direction under overflow,
no-overflow and exact-limit scenarios. No cap assertion needed here because the
batch variant always used the correct pattern.
- Counter assertions are bounded to a boolean direction (not exact counts)
because the number of persist cycles triggered during `ledger.close()` is an
internal cadence that we should not bind the test to.
### Does this pull request potentially affect one of the following parts:
- [x] The metrics — adds two broker-level OTel counters (documented in
`broker.conf` and described above).
Other items: not affected. No API/schema/protocol changes, no default config
changes, no threading model change (write-lock semantics preserved).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]