[
https://issues.apache.org/jira/browse/KAFKA-7697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16808327#comment-16808327
]
little brother ma commented on KAFKA-7697:
------------------------------------------
We also hit the same issue on 2.1.1. The metric
"kafka.network:type=RequestChannel,name=RequestQueueSize" reports a constant
value of 1000, which matches our configured queued.max.requests=1000, so the
request queue is permanently full. The relevant threads from our dump:
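For anyone reproducing this check: the gauge is exposed over the broker's JMX port, with the reading under the JMX attribute Value (the usual convention for Kafka's Yammer gauges). Below is a minimal, self-contained sketch of reading it; since no broker is running in this snippet, it registers a stand-in MBean under the same ObjectName, and the hard-coded 1000 is purely illustrative:

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class RequestQueueSizeProbe {
    // Stand-in management interface/gauge; a real broker registers this itself.
    public interface ValueMBean { int getValue(); }
    public static class Value implements ValueMBean {
        public int getValue() { return 1000; } // illustrative: queue at capacity
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName(
                "kafka.network:type=RequestChannel,name=RequestQueueSize");
        server.registerMBean(new Value(), name); // only needed for this demo

        // This is the read a monitoring agent would issue against the broker:
        int size = (int) server.getAttribute(name, "Value");
        System.out.println("RequestQueueSize = " + size);
    }
}
```

Against a live broker one would connect a JMXConnector to the broker's JMX port instead of the platform MBean server, but the attribute read is the same.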
"kafka-network-thread-5-ListenerName(PLAINTEXT)-PLAINTEXT-4" #97 prio=5 os_prio=0 tid=0x00007fb7ce0ba800 nid=0x2d5 waiting on condition [0x00007fad6e5f8000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00000004530783a0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
    at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:353)
    at kafka.network.RequestChannel.sendRequest(RequestChannel.scala:310)
    at kafka.network.Processor.$anonfun$processCompletedReceives$1(SocketServer.scala:709)
    at kafka.network.Processor.$anonfun$processCompletedReceives$1$adapted(SocketServer.scala:699)
    at kafka.network.Processor$$Lambda$877/855310793.apply(Unknown Source)
    at scala.collection.Iterator.foreach(Iterator.scala:937)
    at scala.collection.Iterator.foreach$(Iterator.scala:937)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1425)
    at scala.collection.IterableLike.foreach(IterableLike.scala:70)
    at scala.collection.IterableLike.foreach$(IterableLike.scala:69)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
    at kafka.network.Processor.processCompletedReceives(SocketServer.scala:699)
    at kafka.network.Processor.run(SocketServer.scala:595)
    at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
    - None
"kafka-request-handler-15" #87 daemon prio=5 os_prio=0 tid=0x00007fb7ceee6800 nid=0x2cb waiting on condition [0x00007fad71af4000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00000004540423f0> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
    at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
    at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:249)
    at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:257)
    at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:729)
    at kafka.server.ReplicaManager.$anonfun$appendToLocalLog$2(ReplicaManager.scala:735)
    at kafka.server.ReplicaManager$$Lambda$1567/915411568.apply(Unknown Source)
    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:233)
    at scala.collection.TraversableLike$$Lambda$12/811760110.apply(Unknown Source)
    at scala.collection.immutable.Map$Map1.foreach(Map.scala:125)
    at scala.collection.TraversableLike.map(TraversableLike.scala:233)
    at scala.collection.TraversableLike.map$(TraversableLike.scala:226)
    at scala.collection.AbstractTraversable.map(Traversable.scala:104)
    at kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:723)
    at kafka.server.ReplicaManager.appendRecords(ReplicaManager.scala:470)
    at kafka.coordinator.group.GroupMetadataManager.appendForGroup(GroupMetadataManager.scala:280)
    at kafka.coordinator.group.GroupMetadataManager.storeOffsets(GroupMetadataManager.scala:423)
    at kafka.coordinator.group.GroupCoordinator.$anonfun$doCommitOffsets$1(GroupCoordinator.scala:518)
    at kafka.coordinator.group.GroupCoordinator$$Lambda$1816/513285617.apply$mcV$sp(Unknown Source)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
    at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251)
    at kafka.coordinator.group.GroupMetadata.inLock(GroupMetadata.scala:197)
    at kafka.coordinator.group.GroupCoordinator.doCommitOffsets(GroupCoordinator.scala:503)
    at kafka.coordinator.group.GroupCoordinator.handleCommitOffsets(GroupCoordinator.scala:482)
    at kafka.server.KafkaApis.handleOffsetCommitRequest(KafkaApis.scala:365)
    at kafka.server.KafkaApis.handle(KafkaApis.scala:114)
    at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:69)
    at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
    - <0x0000000794ea4248> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
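The handler above is parked on a ReentrantReadWriteLock$NonfairSync read lock, which matches the writer-preference behavior described in the original report: once a writer is queued, new readers calling lock() wait behind it even though only readers hold the lock. That behavior can be shown with plain java.util.concurrent locks; WriterPriorityDemo below is a hypothetical stand-alone class, not Kafka code, and uses a timed tryLock (which honors the queue, unlike the barging untimed tryLock) so the demo terminates instead of deadlocking:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class WriterPriorityDemo {

    // Returns whether a queue-honoring read acquisition succeeds while a
    // writer is already parked in the lock's wait queue.
    static boolean readWhileWriterQueued() throws InterruptedException {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock(); // nonfair, like leaderIsrUpdateLock
        CountDownLatch readHeld = new CountDownLatch(1);
        CountDownLatch release  = new CountDownLatch(1);

        Thread reader = new Thread(() -> {
            lock.readLock().lock();           // first reader holds the lock
            readHeld.countDown();
            try { release.await(); } catch (InterruptedException ignored) { }
            lock.readLock().unlock();
        });
        reader.start();
        readHeld.await();

        Thread writer = new Thread(() -> {
            lock.writeLock().lock();          // blocks behind the reader, parks in the queue
            lock.writeLock().unlock();
        });
        writer.start();
        while (!lock.hasQueuedThread(writer)) Thread.sleep(10); // wait until the writer is queued

        // A second reader now waits behind the queued writer, even though
        // nothing holds the write lock; with lock() it would park forever.
        boolean acquired = lock.readLock().tryLock(200, TimeUnit.MILLISECONDS);
        if (acquired) lock.readLock().unlock();

        release.countDown();                  // unblock everyone so the demo exits
        reader.join();
        writer.join();
        return acquired;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("read acquired while writer queued: " + readWhileWriterQueued());
    }
}
```

Two partitions' locks in this state, with each of two handler threads holding a read lock on one and waiting for a read lock on the other, gives exactly the cycle hypothesized in the issue description below.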
> Possible deadlock in kafka.cluster.Partition
> --------------------------------------------
>
> Key: KAFKA-7697
> URL: https://issues.apache.org/jira/browse/KAFKA-7697
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 2.1.0
> Reporter: Gian Merlino
> Assignee: Rajini Sivaram
> Priority: Blocker
> Fix For: 2.2.0, 2.1.1
>
> Attachments: threaddump.txt
>
>
> After upgrading a fairly busy broker from 0.10.2.0 to 2.1.0, it locked up
> within a few minutes (by "locked up" I mean that all request handler threads
> were busy, and other brokers reported that they couldn't communicate with
> it). I restarted it a few times and it did the same thing each time. After
> downgrading to 0.10.2.0, the broker was stable. I attached a thread dump from
> the last attempt on 2.1.0 that shows lots of kafka-request-handler- threads
> trying to acquire the leaderIsrUpdateLock lock in kafka.cluster.Partition.
> It jumps out that there are two threads that already have some read lock
> (can't tell which one) and are trying to acquire a second one (on two
> different read locks: 0x0000000708184b88 and 0x000000070821f188):
> kafka-request-handler-1 and kafka-request-handler-4. Both are handling a
> produce request, and in the process of doing so, are calling
> Partition.fetchOffsetSnapshot while trying to complete a DelayedFetch. At the
> same time, both of those locks have writers from other threads waiting on
> them (kafka-request-handler-2 and kafka-scheduler-6). Neither of those locks
> appears to have a writer that holds it (if only because no thread in the dump
> is deep enough in inWriteLock to indicate that).
> ReentrantReadWriteLock in nonfair mode prioritizes waiting writers over
> readers. Is it possible that kafka-request-handler-1 and
> kafka-request-handler-4 are each trying to read-lock the partition that is
> currently locked by the other one, and they're both parked waiting for
> kafka-request-handler-2 and kafka-scheduler-6 to get write locks, which they
> never will, because the former two threads own read locks and aren't giving
> them up?
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)