[
https://issues.apache.org/jira/browse/KAFKA-7697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16808327#comment-16808327
]
little brother ma commented on KAFKA-7697:
------------------------------------------
We also hit the same issue on 2.1.1. The metric
"kafka.network:type=RequestChannel,name=RequestQueueSize" reports a constant
value of 1000, which matches our configured queued.max.requests=1000, so the
request queue is permanently full. The relevant threads from our dump:
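For anyone reproducing this check: the gauge is exposed over the broker's JMX port, with the reading under the JMX attribute Value (the usual convention for Kafka's Yammer gauges). Below is a minimal, self-contained sketch of reading it; since no broker is running in this snippet, it registers a stand-in MBean under the same ObjectName, and the hard-coded 1000 is purely illustrative:

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class RequestQueueSizeProbe {
    // Stand-in management interface/gauge; a real broker registers this itself.
    public interface ValueMBean { int getValue(); }
    public static class Value implements ValueMBean {
        public int getValue() { return 1000; } // illustrative: queue at capacity
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName(
                "kafka.network:type=RequestChannel,name=RequestQueueSize");
        server.registerMBean(new Value(), name); // only needed for this demo

        // This is the read a monitoring agent would issue against the broker:
        int size = (int) server.getAttribute(name, "Value");
        System.out.println("RequestQueueSize = " + size);
    }
}
```

Against a live broker one would connect a JMXConnector to the broker's JMX port instead of the platform MBean server, but the attribute read is the same.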
"kafka-network-thread-5-ListenerName(PLAINTEXT)-PLAINTEXT-4" #97 prio=5 os_prio=0 tid=0x00007fb7ce0ba800 nid=0x2d5 waiting on condition [0x00007fad6e5f8000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00000004530783a0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
    at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:353)
    at kafka.network.RequestChannel.sendRequest(RequestChannel.scala:310)
    at kafka.network.Processor.$anonfun$processCompletedReceives$1(SocketServer.scala:709)
    at kafka.network.Processor.$anonfun$processCompletedReceives$1$adapted(SocketServer.scala:699)
    at kafka.network.Processor$$Lambda$877/855310793.apply(Unknown Source)
    at scala.collection.Iterator.foreach(Iterator.scala:937)
    at scala.collection.Iterator.foreach$(Iterator.scala:937)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1425)
    at scala.collection.IterableLike.foreach(IterableLike.scala:70)
    at scala.collection.IterableLike.foreach$(IterableLike.scala:69)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
    at kafka.network.Processor.processCompletedReceives(SocketServer.scala:699)
    at kafka.network.Processor.run(SocketServer.scala:595)
    at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
    - None
"kafka-request-handler-15" #87 daemon prio=5 os_prio=0 tid=0x00007fb7ceee6800 nid=0x2cb waiting on condition [0x00007fad71af4000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00000004540423f0> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
    at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
    at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:249)
    at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:257)
    at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:729)
    at kafka.server.ReplicaManager.$anonfun$appendToLocalLog$2(ReplicaManager.scala:735)
    at kafka.server.ReplicaManager$$Lambda$1567/915411568.apply(Unknown Source)
    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:233)
    at scala.collection.TraversableLike$$Lambda$12/811760110.apply(Unknown Source)
    at scala.collection.immutable.Map$Map1.foreach(Map.scala:125)
    at scala.collection.TraversableLike.map(TraversableLike.scala:233)
    at scala.collection.TraversableLike.map$(TraversableLike.scala:226)
    at scala.collection.AbstractTraversable.map(Traversable.scala:104)
    at kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:723)
    at kafka.server.ReplicaManager.appendRecords(ReplicaManager.scala:470)
    at kafka.coordinator.group.GroupMetadataManager.appendForGroup(GroupMetadataManager.scala:280)
    at kafka.coordinator.group.GroupMetadataManager.storeOffsets(GroupMetadataManager.scala:423)
    at kafka.coordinator.group.GroupCoordinator.$anonfun$doCommitOffsets$1(GroupCoordinator.scala:518)
    at kafka.coordinator.group.GroupCoordinator$$Lambda$1816/513285617.apply$mcV$sp(Unknown Source)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
    at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251)
    at kafka.coordinator.group.GroupMetadata.inLock(GroupMetadata.scala:197)
    at kafka.coordinator.group.GroupCoordinator.doCommitOffsets(GroupCoordinator.scala:503)
    at kafka.coordinator.group.GroupCoordinator.handleCommitOffsets(GroupCoordinator.scala:482)
    at kafka.server.KafkaApis.handleOffsetCommitRequest(KafkaApis.scala:365)
    at kafka.server.KafkaApis.handle(KafkaApis.scala:114)
    at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:69)
    at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
    - <0x0000000794ea4248> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
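The handler above is parked on a ReentrantReadWriteLock$NonfairSync read lock, which matches the writer-preference behavior described in the original report: once a writer is queued, new readers calling lock() wait behind it even though only readers hold the lock. That behavior can be shown with plain java.util.concurrent locks; WriterPriorityDemo below is a hypothetical stand-alone class, not Kafka code, and uses a timed tryLock (which honors the queue, unlike the barging untimed tryLock) so the demo terminates instead of deadlocking:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class WriterPriorityDemo {

    // Returns whether a queue-honoring read acquisition succeeds while a
    // writer is already parked in the lock's wait queue.
    static boolean readWhileWriterQueued() throws InterruptedException {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock(); // nonfair, like leaderIsrUpdateLock
        CountDownLatch readHeld = new CountDownLatch(1);
        CountDownLatch release  = new CountDownLatch(1);

        Thread reader = new Thread(() -> {
            lock.readLock().lock();           // first reader holds the lock
            readHeld.countDown();
            try { release.await(); } catch (InterruptedException ignored) { }
            lock.readLock().unlock();
        });
        reader.start();
        readHeld.await();

        Thread writer = new Thread(() -> {
            lock.writeLock().lock();          // blocks behind the reader, parks in the queue
            lock.writeLock().unlock();
        });
        writer.start();
        while (!lock.hasQueuedThread(writer)) Thread.sleep(10); // wait until the writer is queued

        // A second reader now waits behind the queued writer, even though
        // nothing holds the write lock; with lock() it would park forever.
        boolean acquired = lock.readLock().tryLock(200, TimeUnit.MILLISECONDS);
        if (acquired) lock.readLock().unlock();

        release.countDown();                  // unblock everyone so the demo exits
        reader.join();
        writer.join();
        return acquired;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("read acquired while writer queued: " + readWhileWriterQueued());
    }
}
```

Two partitions' locks in this state, with each of two handler threads holding a read lock on one and waiting for a read lock on the other, gives exactly the cycle hypothesized in the issue description below.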
> Possible deadlock in kafka.cluster.Partition
> --------------------------------------------
>
> Key: KAFKA-7697
> URL: https://issues.apache.org/jira/browse/KAFKA-7697
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 2.1.0
> Reporter: Gian Merlino
> Assignee: Rajini Sivaram
> Priority: Blocker
> Fix For: 2.2.0, 2.1.1
>
> Attachments: threaddump.txt
>
>
> After upgrading a fairly busy broker from 0.10.2.0 to 2.1.0, it locked up
> within a few minutes (by "locked up" I mean that all request handler threads
> were busy, and other brokers reported that they couldn't communicate with
> it). I restarted it a few times and it did the same thing each time. After
> downgrading to 0.10.2.0, the broker was stable. I attached a thread dump from
> the last attempt on 2.1.0 that shows lots of kafka-request-handler- threads
> trying to acquire the leaderIsrUpdateLock lock in kafka.cluster.Partition.
> It jumps out that there are two threads that already have some read lock
> (can't tell which one) and are trying to acquire a second one (on two
> different read locks: 0x0000000708184b88 and 0x000000070821f188):
> kafka-request-handler-1 and kafka-request-handler-4. Both are handling a
> produce request, and in the process of doing so, are calling
> Partition.fetchOffsetSnapshot while trying to complete a DelayedFetch. At the
> same time, both of those locks have writers from other threads waiting on
> them (kafka-request-handler-2 and kafka-scheduler-6). Neither of those locks
> appears to have a writer that holds it (if only because no thread in the dump
> is deep enough in inWriteLock to indicate that).
> ReentrantReadWriteLock in nonfair mode prioritizes waiting writers over
> readers. Is it possible that kafka-request-handler-1 and
> kafka-request-handler-4 are each trying to read-lock the partition that is
> currently locked by the other one, and they're both parked waiting for
> kafka-request-handler-2 and kafka-scheduler-6 to get write locks, which they
> never will, because the former two threads own read locks and aren't giving
> them up?
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)