tarmacmonsterg commented on issue #25097:
URL: https://github.com/apache/pulsar/issues/25097#issuecomment-4295884328
Hello @lhotari,
I have some news about the problem.
We have already upgraded to 4.0.9, and in some scenarios we still hit this problem.
The main thing I’ve noticed is that the issue starts immediately after the error:
`Too many requests to the same Bookie while reading Lxxxxxxx Exxxxxxx from bookie: xxxxxxx`
After that, replication for the topic stops.
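To line up the moment replication stops with the ledger and entry that were being read, the broker log can be scanned for that error. This is only a minimal sketch: the regular expression is derived from the error text quoted above, the function name `find_throttled_reads` is hypothetical, and the sample line uses made-up values.

```python
import re

# Pattern derived from the quoted error text only; the surrounding log
# layout (timestamp, level, logger name) is deliberately not assumed.
TOO_MANY_REQUESTS = re.compile(
    r"Too many requests to the same Bookie while reading "
    r"L(?P<ledger>\d+) E(?P<entry>\d+) from bookie: (?P<bookie>\S+)"
)


def find_throttled_reads(lines):
    """Return (ledger_id, entry_id, bookie) for every matching log line."""
    hits = []
    for line in lines:
        match = TOO_MANY_REQUESTS.search(line)
        if match:
            hits.append(
                (int(match["ledger"]), int(match["entry"]), match["bookie"])
            )
    return hits


# Hypothetical log line, for illustration only.
sample_log = [
    "some unrelated line",
    "Too many requests to the same Bookie while reading L123 E45 "
    "from bookie: bookie-0:3181",
]
print(find_throttled_reads(sample_log))
```

The extracted ledger and entry IDs can then be compared with the replicator cursor's `readPosition` in stats-internal.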
We observe this issue on both partitioned and non-partitioned topics.
I enabled debug logging, but there’s nothing particularly unusual.
The only relevant message I found is:
```
DEBUG org.apache.pulsar.broker.service.AbstractReplicator - [persistent://taxistartup/driver_statistics/global | pulsar-yyy-->pulsar-xxx] Replicator was already running. state: Started
```
In stats:
```
"replication" : {
"pulsar-hel" : {
"msgRateIn" : 0.0,
"msgInCount" : 0,
"msgThroughputIn" : 0.0,
"bytesInCount" : 0,
"msgRateOut" : 0.0,
"msgOutCount" : 0,
"msgThroughputOut" : 0.0,
"bytesOutCount" : 0,
"msgRateExpired" : 0.0,
"replicationBacklog" : 10709149,
"connected" : true,
"replicationDelayInSeconds" : 0,
"inboundConnection" : "/1.1.1.1:47522",
"inboundConnectedSince" : "2026-04-22T11:15:56.371025022Z",
"outboundConnection" : "[id: 0x53a11734, L:/1.1.1.2:38490 - R:host.com/1.1.1.3:11111]",
"outboundConnectedSince" : "2026-04-22T11:15:56.593910517Z",
"msgExpiredCount" : 0
}
```
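The pattern in these stats is a connected replicator with a non-zero `replicationBacklog` and a zero `msgRateOut`. A small script can flag that combination across topics. This is a rough sketch, assuming the JSON is the output of `pulsar-admin topics stats`; the function name `stalled_replicators` is hypothetical, and the sample holds only the three fields it checks.

```python
import json


def stalled_replicators(topic_stats: dict) -> list:
    """Return remote clusters that are connected and have backlog but send nothing."""
    stalled = []
    for cluster, repl in topic_stats.get("replication", {}).items():
        has_backlog = repl.get("replicationBacklog", 0) > 0
        no_outbound = repl.get("msgRateOut", 0.0) == 0.0
        if repl.get("connected", False) and has_backlog and no_outbound:
            stalled.append(cluster)
    return stalled


# Values copied from the stats above, reduced to the fields being checked.
sample_stats = json.loads("""
{
  "replication": {
    "pulsar-hel": {
      "msgRateOut": 0.0,
      "replicationBacklog": 10709149,
      "connected": true
    }
  }
}
""")
print(stalled_replicators(sample_stats))
```

A single sample with `msgRateOut` at zero can also mean a briefly idle topic, so the check is more meaningful when repeated over a few minutes.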
Stats-internal:
```
"pulsar.repl.pulsar-xxx" : {
"markDeletePosition" : "5007805:205242",
"readPosition" : "5007805:205243",
"waitingReadOp" : false,
"pendingReadOps" : 0,
"messagesConsumedCounter" : -10672664,
"cursorLedger" : -1,
"cursorLedgerLastEntry" : -1,
"individuallyDeletedMessages" : "[]",
"lastLedgerSwitchTimestamp" : "2026-04-22T11:15:56.351Z",
"state" : "NoLedger",
"active" : true,
"numberOfEntriesSinceFirstNotAckedMessage" : 1,
"totalNonContiguousDeletedMessagesRange" : 0,
"subscriptionHavePendingRead" : false,
"subscriptionHavePendingReplayRead" : false,
"properties" : { }
},
```
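What stands out in this cursor is that nothing seems to be in flight: `readPosition` is exactly one entry past `markDeletePosition`, `pendingReadOps` is 0 and `waitingReadOp` is false, despite the backlog. The same check can be written down as a sketch. The helper names `parse_position` and `cursor_is_idle` are hypothetical, and the field names are taken from the stats-internal output above.

```python
def parse_position(position: str) -> tuple:
    """Split a 'ledgerId:entryId' string into a pair of ints."""
    ledger_id, entry_id = position.split(":")
    return int(ledger_id), int(entry_id)


def cursor_is_idle(cursor: dict) -> bool:
    """True if the cursor has read nothing ahead and has no read scheduled."""
    md_ledger, md_entry = parse_position(cursor["markDeletePosition"])
    read_pos = parse_position(cursor["readPosition"])
    # The read position sits directly after mark-delete: no read-ahead.
    no_read_ahead = read_pos == (md_ledger, md_entry + 1)
    no_reads_scheduled = (
        not cursor["waitingReadOp"]
        and cursor["pendingReadOps"] == 0
        and not cursor["subscriptionHavePendingRead"]
    )
    return no_read_ahead and no_reads_scheduled


# Values copied from the stats-internal output above.
sample_cursor = {
    "markDeletePosition": "5007805:205242",
    "readPosition": "5007805:205243",
    "waitingReadOp": False,
    "pendingReadOps": 0,
    "subscriptionHavePendingRead": False,
}
print(cursor_is_idle(sample_cursor))
```

A cursor that is idle in this sense while `replicationBacklog` is non-zero matches the stuck state described here. A healthy cursor that has caught up would normally show `waitingReadOp` as true instead.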
When I unload the topic, the `cursorLedger` and `cursorLedgerLastEntry` fields
in stats-internal are reset, but replication is still stuck.
I also tried advancing the cursor by skipping messages in the subscription. The
cursor appears, but replication still does not start.
There are no errors either.
A restart does not help. Only disabling and re-enabling replication resolves
the issue.
The issue reproduces consistently every few days, depending on the load.
Where should I look to gather more information to troubleshoot this problem?
Thanks!