amjadali-klarity opened a new issue, #25436: URL: https://github.com/apache/pulsar/issues/25436
### Search before reporting - [x] I searched in the [issues](https://github.com/apache/pulsar/issues) and found nothing similar. ### Read release policy - [x] I understand that [unsupported versions](https://pulsar.apache.org/contribute/release-policy/#supported-versions) don't get bug fixes. I will attempt to reproduce the issue on a supported version of Pulsar client and Pulsar broker. ### User environment Pulsar Version: apachepulsar/pulsar-all:4.1.2 (Docker) Client Version: Pulsar-CPP-v3.5.1 Deployment: Kubernetes (provide cloud provider if applicable, e.g., EKS/GKE) Subscription Type: Shared ### Issue Description A worker service stopped receiving messages from a Shared subscription despite having available permits (999) and a non-zero backlog. The worker remains connected and responds to PING/PONG heartbeats, but the broker stops dispatching messages. The lastConsumedTime and lastAckedTime in topic stats remain frozen at the time of the incident. The issue is only resolved by restarting the Pulsar broker. ### Error messages ```text { "@timestamp": "2026-03-12 02:29:13.496", "@message": { "time": "2026-03-12T02:29:13.496153032Z", "stream": "stdout", "_p": "F", "log": "\tat java.base/java.util.concurrent.CompletableFuture.complete(Unknown Source) [?:?]", "kubernetes": { "pod_name": "lication.var.log.containers.apachepulsar-broker-0", "namespace_name": "pulsar", "container_name": "apachepulsar-broker", "docker_id": "aa59e0205fd6c4da6f789ae3e2f6bbed277afd108a7d21d68e1241e096928fee" } }, "log": "\tat java.base/java.util.concurrent.CompletableFuture.complete(Unknown Source) [?:?]" }, { "@timestamp": "2026-03-12 02:29:13.496", "@message": { "time": "2026-03-12T02:29:13.496161617Z", "stream": "stdout", "_p": "F", "log": "\tat org.apache.bookkeeper.meta.AbstractZkLedgerManager$4.processResult(AbstractZkLedgerManager.java:500) [org.apache.bookkeeper-bookkeeper-server-4.17.2.jar:4.17.2]", "kubernetes": { "pod_name": "lication.var.log.containers.apachepulsar-broker-0", "namespace_name": "pulsar", "container_name": "apachepulsar-broker", "docker_id": "aa59e0205fd6c4da6f789ae3e2f6bbed277afd108a7d21d68e1241e096928fee" } }, "log": "\tat org.apache.bookkeeper.meta.AbstractZkLedgerManager$4.processResult(AbstractZkLedgerManager.java:500) [org.apache.bookkeeper-bookkeeper-server-4.17.2.jar:4.17.2]" }, { "@timestamp": "2026-03-12 02:29:13.496", "@message": { "time": "2026-03-12T02:29:13.496169578Z", "stream": "stdout", "_p": "F", "log": "\tat org.apache.bookkeeper.zookeeper.ZooKeeperClient$19$1.processResult(ZooKeeperClient.java:997) [org.apache.bookkeeper-bookkeeper-server-4.17.2.jar:4.17.2]", "kubernetes": { "pod_name": "lication.var.log.containers.apachepulsar-broker-0", "namespace_name": "pulsar", "container_name": "apachepulsar-broker", "docker_id": "aa59e0205fd6c4da6f789ae3e2f6bbed277afd108a7d21d68e1241e096928fee" } }, "log": "\tat org.apache.bookkeeper.zookeeper.ZooKeeperClient$19$1.processResult(ZooKeeperClient.java:997) [org.apache.bookkeeper-bookkeeper-server-4.17.2.jar:4.17.2]" }, { "@timestamp": "2026-03-12 02:29:13.496", "@message": { "time": "2026-03-12T02:29:13.496177616Z", "stream": "stdout", "_p": "F", "log": "\tat org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:622) [org.apache.zookeeper-zookeeper-3.9.4.jar:3.9.4]", "kubernetes": { "pod_name": "lication.var.log.containers.apachepulsar-broker-0", "namespace_name": "pulsar", "container_name": "apachepulsar-broker", "docker_id": "aa59e0205fd6c4da6f789ae3e2f6bbed277afd108a7d21d68e1241e096928fee" } }, "log": "\tat org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:622) [org.apache.zookeeper-zookeeper-3.9.4.jar:3.9.4]" }, { "@timestamp": "2026-03-12 02:29:13.496", "@message": { "time": "2026-03-12T02:29:13.496196153Z", "stream": "stdout", "_p": "F", "log": "\tat org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:541) [org.apache.zookeeper-zookeeper-3.9.4.jar:3.9.4]", "kubernetes": { "pod_name": "lication.var.log.containers.apachepulsar-broker-0", "namespace_name": "pulsar", "container_name": "apachepulsar-broker", "docker_id": "aa59e0205fd6c4da6f789ae3e2f6bbed277afd108a7d21d68e1241e096928fee" } }, "log": "\tat org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:541) [org.apache.zookeeper-zookeeper-3.9.4.jar:3.9.4]" }, { "@timestamp": "2026-03-12 02:29:13.496", "@message": { "time": "2026-03-12T02:29:13.496204238Z", "stream": "stdout", "_p": "F", "log": "2026-03-12T02:29:13,495+0000 [ForkJoinPool.commonPool-worker-1-EventThread] WARN org.apache.bookkeeper.mledger.impl.OpAddEntry - Error when closing ledger 5905. Status=Bad ledger metadata version", "kubernetes": { "pod_name": "lication.var.log.containers.apachepulsar-broker-0", "namespace_name": "pulsar", "container_name": "apachepulsar-broker", "docker_id": "aa59e0205fd6c4da6f789ae3e2f6bbed277afd108a7d21d68e1241e096928fee" } }, "log": "2026-03-12T02:29:13,495+0000 [ForkJoinPool.commonPool-worker-1-EventThread] WARN org.apache.bookkeeper.mledger.impl.OpAddEntry - Error when closing ledger 5905. Status=Bad ledger metadata version" }, { "@timestamp": "2026-03-12 02:29:13.496", "@message": { "time": "2026-03-12T02:29:13.496212491Z", "stream": "stdout", "_p": "F", "log": "2026-03-12T02:29:13,495+0000 [ForkJoinPool.commonPool-worker-1-EventThread] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [rgain/ice-ai/persistent/complete] Creating a new ledger after closed 5905", "kubernetes": { "pod_name": "lication.var.log.containers.apachepulsar-broker-0", "namespace_name": "pulsar", "container_name": "apachepulsar-broker", "docker_id": "aa59e0205fd6c4da6f789ae3e2f6bbed277afd108a7d21d68e1241e096928fee" } }, "log": "2026-03-12T02:29:13,495+0000 [ForkJoinPool.commonPool-worker-1-EventThread] INFO org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [rgain/ice-ai/persistent/complete] Creating a new ledger after closed 5905" }, { "@timestamp": "2026-03-12 02:29:13.496", "@message": { "time": "2026-03-12T02:29:13.496245347Z", "stream": "stdout", "_p": "F", "log": "2026-03-12T02:29:13,495+0000 [BookKeeperClientWorker-OrderedExecutor-1-0] INFO org.apache.bookkeeper.client.BookieWatcherImpl - New ensemble: [apachepulsar-bookie-1.apachepulsar-bookie.pulsar.svc.cluster.local:3181, apachepulsar-bookie-0.apachepulsar-bookie.pulsar.svc.cluster.local:3181] is not adhering to Placement Policy. quarantinedBookies: []", "kubernetes": { "pod_name": "lication.var.log.containers.apachepulsar-broker-0", "namespace_name": "pulsar", "container_name": "apachepulsar-broker", "docker_id": "aa59e0205fd6c4da6f789ae3e2f6bbed277afd108a7d21d68e1241e096928fee" } }, "log": "2026-03-12T02:29:13,495+0000 [BookKeeperClientWorker-OrderedExecutor-1-0] INFO org.apache.bookkeeper.client.BookieWatcherImpl - New ensemble: [apachepulsar-bookie-1.apachepulsar-bookie.pulsar.svc.cluster.local:3181, apachepulsar-bookie-0.apachepulsar-bookie.pulsar.svc.cluster.local:3181] is not adhering to Placement Policy. quarantinedBookies: []" }, { "@timestamp": "2026-03-12 02:29:13.495", "@message": { "time": "2026-03-12T02:29:13.4959005Z", "stream": "stdout", "_p": "F", "log": "2026-03-12T02:29:13,494+0000 [ForkJoinPool.commonPool-worker-1-EventThread] ERROR org.apache.bookkeeper.client.LedgerHandle - Metadata conflict when closing ledger 5905. Another client may have recovered the ledger while there were writes outstanding. (local lastEntry:0 length:6083) (metadata lastEntry:-1 length:0)", "kubernetes": { "pod_name": "lication.var.log.containers.apachepulsar-broker-0", "namespace_name": "pulsar", "container_name": "apachepulsar-broker", "docker_id": "aa59e0205fd6c4da6f789ae3e2f6bbed277afd108a7d21d68e1241e096928fee" } }, "log": "2026-03-12T02:29:13,494+0000 [ForkJoinPool.commonPool-worker-1-EventThread] ERROR org.apache.bookkeeper.client.LedgerHandle - Metadata conflict when closing ledger 5905. Another client may have recovered the ledger while there were writes outstanding. (local lastEntry:0 length:6083) (metadata lastEntry:-1 length:0)" }, { "@timestamp": "2026-03-12 02:29:13.495", "@message": { "time": "2026-03-12T02:29:13.495973346Z", "stream": "stdout", "_p": "F", "log": "2026-03-12T02:29:13,494+0000 [ForkJoinPool.commonPool-worker-1-EventThread] ERROR org.apache.bookkeeper.client.MetadataUpdateLoop - UpdateLoop(ledgerId=5905,loopId=143e2cbf) Exception updating", "kubernetes": { "pod_name": "lication.var.log.containers.apachepulsar-broker-0", "namespace_name": "pulsar", "container_name": "apachepulsar-broker", "docker_id": "aa59e0205fd6c4da6f789ae3e2f6bbed277afd108a7d21d68e1241e096928fee" } }, "log": "2026-03-12T02:29:13,494+0000 [ForkJoinPool.commonPool-worker-1-EventThread] ERROR org.apache.bookkeeper.client.MetadataUpdateLoop - UpdateLoop(ledgerId=5905,loopId=143e2cbf) Exception updating" }, { "@timestamp": "2026-03-12 02:29:13.495", "@message": { "time": "2026-03-12T02:29:13.49598647Z", "stream": "stdout", "_p": "F", "log": "org.apache.bookkeeper.client.BKException$BKMetadataVersionException: Bad ledger metadata version", "kubernetes": { "pod_name": "lication.var.log.containers.apachepulsar-broker-0", "namespace_name": "pulsar", "container_name": "apachepulsar-broker", "docker_id": "aa59e0205fd6c4da6f789ae3e2f6bbed277afd108a7d21d68e1241e096928fee" } }, "log": "org.apache.bookkeeper.client.BKException$BKMetadataVersionException: Bad ledger metadata version" }, { "@timestamp": "2026-03-12 02:29:13.495", "@message": { "time": "2026-03-12T02:29:13.495996519Z", "stream": "stdout", "_p": "F", "log": "\tat org.apache.bookkeeper.client.LedgerHandle.lambda$null$2(LedgerHandle.java:602) ~[org.apache.bookkeeper-bookkeeper-server-4.17.2.jar:4.17.2]", "kubernetes": { "pod_name": "lication.var.log.containers.apachepulsar-broker-0", "namespace_name": "pulsar", "container_name": "apachepulsar-broker", "docker_id": "aa59e0205fd6c4da6f789ae3e2f6bbed277afd108a7d21d68e1241e096928fee" } }, "log": "\tat org.apache.bookkeeper.client.LedgerHandle.lambda$null$2(LedgerHandle.java:602) ~[org.apache.bookkeeper-bookkeeper-server-4.17.2.jar:4.17.2]" }, { "@timestamp": "2026-03-12 02:29:13.493", "@message": { "time": "2026-03-12T02:29:13.493502053Z", "stream": "stdout", "_p": "F", "log": "2026-03-12T02:29:13,493+0000 [ForkJoinPool.commonPool-worker-1-EventThread] INFO org.apache.bookkeeper.client.MetadataUpdateLoop - UpdateLoop(ledgerId=5905,loopId=143e2cbf) conflict writing metadata to store, update local value and try again", "kubernetes": { "pod_name": "lication.var.log.containers.apachepulsar-broker-0", "namespace_name": "pulsar", "container_name": "apachepulsar-broker", "docker_id": "aa59e0205fd6c4da6f789ae3e2f6bbed277afd108a7d21d68e1241e096928fee" } }, "log": "2026-03-12T02:29:13,493+0000 [ForkJoinPool.commonPool-worker-1-EventThread] INFO org.apache.bookkeeper.client.MetadataUpdateLoop - UpdateLoop(ledgerId=5905,loopId=143e2cbf) conflict writing metadata to store, update local value and try again" }, { "@timestamp": "2026-03-12 02:29:13.489", "@message": { "time": "2026-03-12T02:29:13.489093416Z", "stream": "stdout", "_p": "F", "log": "2026-03-12T02:29:13,488+0000 [BookKeeperClientWorker-OrderedExecutor-1-0] INFO org.apache.bookkeeper.mledger.impl.OpAddEntry - [rgain/ice-ai/persistent/complete] Closing ledger 5905 for being full", "kubernetes": { "pod_name": "lication.var.log.containers.apachepulsar-broker-0", "namespace_name": "pulsar", "container_name": "apachepulsar-broker", "docker_id": "aa59e0205fd6c4da6f789ae3e2f6bbed277afd108a7d21d68e1241e096928fee" } }, "log": "2026-03-12T02:29:13,488+0000 [BookKeeperClientWorker-OrderedExecutor-1-0] INFO org.apache.bookkeeper.mledger.impl.OpAddEntry - [rgain/ice-ai/persistent/complete] Closing ledger 5905 for being full" }, { "@timestamp": "2026-03-12 02:29:13.479", "@message": { "time": "2026-03-12T02:29:13.479073308Z", "stream": "stdout", "_p": "F", "log": "2026-03-12T02:29:13,476+0000 [broker-topic-workers-OrderedExecutor-2-0] INFO org.apache.pulsar.broker.service.ServerCnx - [/100.72.3.187:40938] Created new producer: Producer{topic=PersistentTopic{topic=persistent://rgain/ice-ai/complete}, client=[id: 0x8be7b38b, L:/100.72.151.77:6650 - R:/100.72.3.187:40938] [SR:100.72.3.187, state:Connected], producerName=apachepulsar-8-1, producerId=0}, role: null", "kubernetes": { "pod_name": "lication.var.log.containers.apachepulsar-broker-0", "namespace_name": "pulsar", "container_name": "apachepulsar-broker", "docker_id": "aa59e0205fd6c4da6f789ae3e2f6bbed277afd108a7d21d68e1241e096928fee" } }, "log": "2026-03-12T02:29:13,476+0000 [broker-topic-workers-OrderedExecutor-2-0] INFO org.apache.pulsar.broker.service.ServerCnx - [/100.72.3.187:40938] Created new producer: Producer{topic=PersistentTopic{topic=persistent://rgain/ice-ai/complete}, client=[id: 0x8be7b38b, L:/100.72.151.77:6650 - R:/100.72.3.187:40938] [SR:100.72.3.187, state:Connected], producerName=apachepulsar-8-1, producerId=0}, role: null" }, ``` ### Reproducing the issue 1. Produce messages to a persistent topic. 2. Consume using a worker on a Shared subscription. 3. Observe that after a certain period/event, the msgBacklog stays at 1 (or more), but msgRateOut drops to 0.0. 4. Verify via logs that the client is still sending/receiving heartbeats. ### Additional information "msgBacklog" : 1, "lastAckedTime" : "2026-03-12T02:29:13.734Z", "lastConsumedTime" : "2026-03-12T02:29:13.497Z" ### Are you willing to submit a PR? - [ ] I'm willing to submit a PR! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
