blinkeye commented on PR #1898:
URL: https://github.com/apache/zookeeper/pull/1898#issuecomment-2226826413

   Thank you @luke-sterkowicz for the effort and the proposal. I've been 
observing the same issue.
   
   There's another reason to introduce this enhancement: it makes ZK easier to 
operate. Currently, if you have a cluster with quorum and remove one instance 
(while still maintaining quorum), you get a WARN along with a 
`java.lang.InterruptedException: null` exception on each remaining instance, as 
well as an actual ERROR, which looks concerning.
   
   Here's an example of a 4-instance cluster with quorum and a `kill instance-3` 
(which is what [bin/zkServer.sh stop](https://github.com/apache/zookeeper/blob/master/bin/zkServer.sh#L216-L227) 
does), along with the corresponding logs.
   
   The ERROR is an `Unexpected exception`:
   
   ```bash
   zookeeper-2  | 2024-07-13 08:21:58,033 [myid:] - ERROR 
[LearnerHandler-/192.168.64.4:47434:o.a.z.s.q.LearnerHandler@720] - Unexpected 
exception in LearnerHandler: 
   zookeeper-2  | java.io.EOFException: null
   ``` 
   
   This looks concerning, and I would start an RCA. Even the WARNs carry a 
`null` exception, which seems to point to an actual issue. As it turns out, 
this happens every time an instance is removed.
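
   For context on where `java.io.EOFException: null` comes from: 
`DataInputStream.readInt` throws an `EOFException` without a message when the 
stream ends before four bytes are available, which is what a reader sees once 
the peer's socket is closed. The sketch below only illustrates that JDK 
behavior. The class `EofDemo` and the method `readOneInt` are hypothetical names 
for this example, not ZooKeeper code, and the in-memory stream stands in for a 
socket.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

// Hypothetical demo class; it does not exist in ZooKeeper.
class EofDemo {

    // Tries to read one int and reports what happened as a string.
    static String readOneInt(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return "read " + in.readInt();
        } catch (EOFException e) {
            // End of stream before 4 bytes arrived. The exception carries no
            // message, so getMessage() returns null.
            return "peer closed (message=" + e.getMessage() + ")";
        } catch (IOException e) {
            return "io error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Four bytes available: the int is read normally.
        System.out.println(readOneInt(new byte[] {0, 0, 0, 42}));
        // Empty stream, as after the other side has gone away.
        System.out.println(readOneInt(new byte[0]));
    }
}
```

   An empty or truncated stream takes the `EOFException` branch, so a reader 
could in principle tell an ordinary disconnect apart from other I/O failures 
and log it less alarmingly. Whether that is the right approach for ZooKeeper is 
up to this PR.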
   
   Here are all the logs from when `zookeeper-3` is removed. This is 
reproducible on 
[v3.9.2](https://github.com/apache/zookeeper/releases/tag/release-3.9.2).
   
   ```bash
   zookeeper-1  | 2024-07-13 08:21:58,033 [myid:] - WARN  
[RecvWorker:3:o.a.z.s.q.QuorumCnxManager$RecvWorker@1402] - Connection broken 
for id 3, my id = 1
   zookeeper-2  | 2024-07-13 08:21:58,033 [myid:] - WARN  
[RecvWorker:3:o.a.z.s.q.QuorumCnxManager$RecvWorker@1402] - Connection broken 
for id 3, my id = 2
   zookeeper-1  | java.io.EOFException: null
   zookeeper-4  | 2024-07-13 08:21:58,033 [myid:] - WARN  
[RecvWorker:3:o.a.z.s.q.QuorumCnxManager$RecvWorker@1402] - Connection broken 
for id 3, my id = 4
   zookeeper-2  | java.io.EOFException: null
   zookeeper-1  |  at java.base/java.io.DataInputStream.readInt(Unknown Source)
   zookeeper-4  | java.io.EOFException: null
   zookeeper-2  |  at java.base/java.io.DataInputStream.readInt(Unknown Source)
   zookeeper-1  |  at 
org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:1390)
   zookeeper-1  | 2024-07-13 08:21:58,033 [myid:] - WARN  
[RecvWorker:3:o.a.z.s.q.QuorumCnxManager$RecvWorker@1408] - Interrupting 
SendWorker thread from RecvWorker. sid: 3. myId: 1
   zookeeper-4  |  at java.base/java.io.DataInputStream.readInt(Unknown Source)
   zookeeper-4  |  at 
org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:1390)
   zookeeper-4  | 2024-07-13 08:21:58,033 [myid:] - WARN  
[RecvWorker:3:o.a.z.s.q.QuorumCnxManager$RecvWorker@1408] - Interrupting 
SendWorker thread from RecvWorker. sid: 3. myId: 4
   zookeeper-2  |  at 
org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:1390)
   zookeeper-2  | 2024-07-13 08:21:58,033 [myid:] - ERROR 
[LearnerHandler-/192.168.64.4:47434:o.a.z.s.q.LearnerHandler@720] - Unexpected 
exception in LearnerHandler: 
   zookeeper-2  | java.io.EOFException: null
   zookeeper-2  |  at java.base/java.io.DataInputStream.readInt(Unknown Source)
   zookeeper-2  |  at 
org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:96)
   zookeeper-2  |  at 
org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:86)
   zookeeper-2  |  at 
org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:134)
   zookeeper-2  |  at 
org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:657)
   zookeeper-2  | 2024-07-13 08:21:58,033 [myid:] - WARN  
[RecvWorker:3:o.a.z.s.q.QuorumCnxManager$RecvWorker@1408] - Interrupting 
SendWorker thread from RecvWorker. sid: 3. myId: 2
   zookeeper-1  | 2024-07-13 08:21:58,033 [myid:] - WARN  
[SendWorker:3:o.a.z.s.q.QuorumCnxManager$SendWorker@1288] - Interrupted while 
waiting for message on queue
   zookeeper-2  | 2024-07-13 08:21:58,033 [myid:] - INFO  
[LearnerHandler-/192.168.64.4:47434:o.a.z.s.q.LearnerHandler@1160] - 
Synchronously closing socket to learner 3.
   zookeeper-2  | 2024-07-13 08:21:58,033 [myid:] - WARN  
[LearnerHandler-/192.168.64.4:47434:o.a.z.s.q.LearnerHandler@736] - ******* 
GOODBYE /192.168.64.4:47434 ********
   zookeeper-1  | java.lang.InterruptedException: null
   zookeeper-1  |  at 
java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(Unknown
 Source)
   zookeeper-1  |  at 
org.apache.zookeeper.util.CircularBlockingQueue.poll(CircularBlockingQueue.java:105)
   zookeeper-1  |  at 
org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1453)
   zookeeper-2  | 2024-07-13 08:21:58,033 [myid:] - WARN  
[SendWorker:3:o.a.z.s.q.QuorumCnxManager$SendWorker@1288] - Interrupted while 
waiting for message on queue
   zookeeper-2  | java.lang.InterruptedException: null
   zookeeper-2  |  at 
java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(Unknown
 Source)
   zookeeper-2  |  at 
org.apache.zookeeper.util.CircularBlockingQueue.poll(CircularBlockingQueue.java:105)
   zookeeper-2  |  at 
org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1453)
   zookeeper-2  |  at 
org.apache.zookeeper.server.quorum.QuorumCnxManager.access$900(QuorumCnxManager.java:99)
   zookeeper-2  |  at 
org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:1277)
   zookeeper-2  | 2024-07-13 08:21:58,033 [myid:] - WARN  
[SendWorker:3:o.a.z.s.q.QuorumCnxManager$SendWorker@1300] - Send worker leaving 
thread id 3 my id = 2
   zookeeper-1  |  at 
org.apache.zookeeper.server.quorum.QuorumCnxManager.access$900(QuorumCnxManager.java:99)
   zookeeper-1  |  at 
org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:1277)
   zookeeper-1  | 2024-07-13 08:21:58,033 [myid:] - WARN  
[SendWorker:3:o.a.z.s.q.QuorumCnxManager$SendWorker@1300] - Send worker leaving 
thread id 3 my id = 1
   zookeeper-4  | 2024-07-13 08:21:58,033 [myid:] - WARN  
[SendWorker:3:o.a.z.s.q.QuorumCnxManager$SendWorker@1288] - Interrupted while 
waiting for message on queue
   zookeeper-4  | java.lang.InterruptedException: null
   zookeeper-4  |  at 
java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(Unknown
 Source)
   zookeeper-4  |  at 
org.apache.zookeeper.util.CircularBlockingQueue.poll(CircularBlockingQueue.java:105)
   zookeeper-4  |  at 
org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1453)
   zookeeper-4  |  at 
org.apache.zookeeper.server.quorum.QuorumCnxManager.access$900(QuorumCnxManager.java:99)
   zookeeper-4  |  at 
org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:1277)
   zookeeper-4  | 2024-07-13 08:21:58,033 [myid:] - WARN  
[SendWorker:3:o.a.z.s.q.QuorumCnxManager$SendWorker@1300] - Send worker leaving 
thread id 3 my id = 4
   zookeeper-3 exited with code 143
   ```
   
   

