tarmacmonsterg commented on issue #25097:
URL: https://github.com/apache/pulsar/issues/25097#issuecomment-4295884328

   Hello @lhotari.
   I have some news about the problem.
   We have already upgraded to 4.0.9, and in some scenarios we still hit this problem.
   
   The main thing I’ve noticed is that the issue starts immediately after the error:
   
   `Too many requests to the same Bookie while reading Lxxxxxxx Exxxxxxx from bookie: xxxxxxx`
   
   After that, replication for the topic stops.
   We observe this issue on both partitioned and non-partitioned topics.
   
   I enabled debug logging, but there’s nothing particularly unusual.
   The only relevant message I found is:
   ```
   DEBUG org.apache.pulsar.broker.service.AbstractReplicator - [persistent://taxistartup/driver_statistics/global | pulsar-yyy-->pulsar-xxx] Replicator was already running. state: Started
   ```
   In stats:
   ```
   "replication" : {
       "pulsar-hel" : {
         "msgRateIn" : 0.0,
         "msgInCount" : 0,
         "msgThroughputIn" : 0.0,
         "bytesInCount" : 0,
         "msgRateOut" : 0.0,
         "msgOutCount" : 0,
         "msgThroughputOut" : 0.0,
         "bytesOutCount" : 0,
         "msgRateExpired" : 0.0,
         "replicationBacklog" : 10709149,
         "connected" : true,
         "replicationDelayInSeconds" : 0,
         "inboundConnection" : "/1.1.1.1:47522",
         "inboundConnectedSince" : "2026-04-22T11:15:56.371025022Z",
         "outboundConnection" : "[id: 0x53a11734, L:/1.1.1.2:38490 - R:host.com/1.1.1.3:11111]",
         "outboundConnectedSince" : "2026-04-22T11:15:56.593910517Z",
         "msgExpiredCount" : 0
       }
   ```
   Stats-internal:
   ```
       "pulsar.repl.pulsar-xxx" : {
         "markDeletePosition" : "5007805:205242",
         "readPosition" : "5007805:205243",
         "waitingReadOp" : false,
         "pendingReadOps" : 0,
         "messagesConsumedCounter" : -10672664,
         "cursorLedger" : -1,
         "cursorLedgerLastEntry" : -1,
         "individuallyDeletedMessages" : "[]",
         "lastLedgerSwitchTimestamp" : "2026-04-22T11:15:56.351Z",
         "state" : "NoLedger",
         "active" : true,
         "numberOfEntriesSinceFirstNotAckedMessage" : 1,
         "totalNonContiguousDeletedMessagesRange" : 0,
         "subscriptionHavePendingRead" : false,
         "subscriptionHavePendingReplayRead" : false,
         "properties" : { }
       },
   ```
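
To make the symptom in the two dumps above easier to spot automatically, a check along these lines might help. It is only a rough sketch: the function name `replicator_looks_stalled` and the rule it applies are assumptions for illustration, not part of Pulsar, and the field names are simply the ones visible in the stats and stats-internal output above.

```python
import json


def replicator_looks_stalled(stats: dict, internal_cursor: dict) -> bool:
    """Heuristic (assumed, not a Pulsar API): a backlog exists and the
    replicator is connected, yet nothing is flowing out and the cursor
    has no read in flight."""
    has_backlog = stats.get("replicationBacklog", 0) > 0
    connected = stats.get("connected", False)
    no_outflow = stats.get("msgRateOut", 0.0) == 0.0
    no_pending_read = (
        not internal_cursor.get("waitingReadOp", False)
        and internal_cursor.get("pendingReadOps", 0) == 0
    )
    return has_backlog and connected and no_outflow and no_pending_read


# Values copied from the stats and stats-internal excerpts above.
replication_stats = json.loads(
    '{"msgRateOut": 0.0, "replicationBacklog": 10709149, "connected": true}'
)
cursor_stats = json.loads('{"waitingReadOp": false, "pendingReadOps": 0}')

print(replicator_looks_stalled(replication_stats, cursor_stats))
```

Run periodically against the topic stats, a check like this could help capture the exact moment replication stops, so that it can be correlated with the `Too many requests to the same Bookie` error in the broker logs.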
   
   When I unload the topic, the cursorLedger and cursorLedgerLastEntry fields in stats-internal are reset, but replication is still stuck afterwards.
   
   I tried advancing the cursor by skipping messages in the subscription. The cursor appears, but replication still does not start.
   
   On the other hand, there are no errors either.
   
   A restart does not help. Only disabling and re-enabling replication resolves the issue.
   
   The issue reproduces consistently every few days, depending on the load.
   
   Where should I look to gather more information to troubleshoot this problem?
   
   Thanks!

