[
https://issues.apache.org/jira/browse/KAFKA-9153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tristan updated KAFKA-9153:
---------------------------
Description:
We upgraded the Kafka brokers on many of our clusters to 2.1.1 a few months ago and did
not encounter any issues until a few weeks ago.
Now the Kafka service stops with errors from time to time, and we could not
establish a correlation with any particular events, messages, or cluster
operations.
Here are the errors we encountered:
Kafka logs:
{code:bash}
[2019-11-06 11:19:36,177] ERROR [KafkaApi-5] Error while responding to offset request (kafka.server.KafkaApis)
scala.MatchError: null
        at kafka.cluster.Partition.$anonfun$fetchOffsetForTimestamp$1(Partition.scala:813)
        at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251)
        at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:257)
        at kafka.cluster.Partition.fetchOffsetForTimestamp(Partition.scala:809)
        at kafka.server.ReplicaManager.fetchOffsetForTimestamp(ReplicaManager.scala:784)
        at kafka.server.KafkaApis.$anonfun$handleListOffsetRequestV1AndAbove$3(KafkaApis.scala:833)
        at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:233)
        at scala.collection.Iterator.foreach(Iterator.scala:937)
        at scala.collection.Iterator.foreach$(Iterator.scala:937)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1425)
        at scala.collection.IterableLike.foreach(IterableLike.scala:70)
        at scala.collection.IterableLike.foreach$(IterableLike.scala:69)
        at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
        at scala.collection.TraversableLike.map(TraversableLike.scala:233)
        at scala.collection.TraversableLike.map$(TraversableLike.scala:226)
        at scala.collection.AbstractTraversable.map(Traversable.scala:104)
        at kafka.server.KafkaApis.handleListOffsetRequestV1AndAbove(KafkaApis.scala:813)
        at kafka.server.KafkaApis.handleListOffsetRequest(KafkaApis.scala:753)
        at kafka.server.KafkaApis.handle(KafkaApis.scala:108)
        at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:69)
        at java.lang.Thread.run(Thread.java:745)
[2019-11-06 11:19:36,178] ERROR [KafkaApi-5] Error while responding to offset request (kafka.server.KafkaApis)
scala.MatchError: null
        at kafka.cluster.Partition.$anonfun$fetchOffsetForTimestamp$1(Partition.scala:813)
        at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251)
        at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:257)
        at kafka.cluster.Partition.fetchOffsetForTimestamp(Partition.scala:809)
        at kafka.server.ReplicaManager.fetchOffsetForTimestamp(ReplicaManager.scala:784)
        at kafka.server.KafkaApis.$anonfun$handleListOffsetRequestV1AndAbove$3(KafkaApis.scala:833)
        at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:233)
        at scala.collection.Iterator.foreach(Iterator.scala:937)
        at scala.collection.Iterator.foreach$(Iterator.scala:937)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1425)
        at scala.collection.IterableLike.foreach(IterableLike.scala:70)
        at scala.collection.IterableLike.foreach$(IterableLike.scala:69)
        at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
        at scala.collection.TraversableLike.map(TraversableLike.scala:233)
        at scala.collection.TraversableLike.map$(TraversableLike.scala:226)
        at scala.collection.AbstractTraversable.map(Traversable.scala:104)
        at kafka.server.KafkaApis.handleListOffsetRequestV1AndAbove(KafkaApis.scala:813)
        at kafka.server.KafkaApis.handleListOffsetRequest(KafkaApis.scala:753)
        at kafka.server.KafkaApis.handle(KafkaApis.scala:108)
        at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:69)
        at java.lang.Thread.run(Thread.java:745)
[2019-11-06 11:19:36,302] ERROR [KafkaApi-5] Error while responding to offset request (kafka.server.KafkaApis)
scala.MatchError: null
{code}
with 106 consecutive occurrences of this stack trace,
and this error shown by journalctl -u kafka, just after the latest stack trace
in kafka.log:
{code:bash}
Oct 15 14:34:32 kafka-5 systemd[1]: Started kafka daemon.
Nov 06 11:19:50 kafka-5 bash[15874]: #
Nov 06 11:19:50 kafka-5 bash[15874]: # A fatal error has been detected by the Java Runtime Environment:
Nov 06 11:19:50 kafka-5 bash[15874]: #
Nov 06 11:19:50 kafka-5 bash[15874]: # SIGSEGV (0xb) at pc=0x00007f88f31b8df0, pid=15874, tid=140216143816448
Nov 06 11:19:50 kafka-5 bash[15874]: #
Nov 06 11:19:50 kafka-5 bash[15874]: # JRE version: Java(TM) SE Runtime Environment (8.0_91-b14) (build 1.8.0_91-b14)
Nov 06 11:19:50 kafka-5 bash[15874]: # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.91-b14 mixed mode linux-amd64 compressed oops)
Nov 06 11:19:50 kafka-5 bash[15874]: # Problematic frame:
Nov 06 11:19:50 kafka-5 bash[15874]: # J 14984 C2 kafka.cluster.Partition.$anonfun$fetchOffsetForTimestamp$1(Lkafka/cluster/Partition;JLscala/Option;Ljava/util/Optional;Z)Lkafka/log/TimestampOffset; (240 bytes)
Nov 06 11:19:50 kafka-5 bash[15874]: #
Nov 06 11:19:50 kafka-5 bash[15874]: # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
Nov 06 11:19:50 kafka-5 bash[15874]: #
Nov 06 11:19:50 kafka-5 bash[15874]: # An error report file with more information is saved as:
Nov 06 11:19:50 kafka-5 bash[15874]: # /tmp/hs_err_pid15874.log
Nov 06 11:19:50 kafka-5 bash[15874]: #
{code}
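Since the crash report says core dumps were disabled, a core file would help the next time this happens. A minimal sketch for enabling them, assuming a broker launched from a shell or managed by a systemd unit named "kafka" (the unit name is inferred from journalctl -u kafka above; paths and names may differ on your hosts):

```shell
# In the shell that launches the broker (only affects processes started
# from this shell), lift the core-size soft limit:
ulimit -c unlimited
ulimit -c   # reads the current soft limit back; "unlimited" once lifted

# For a systemd-managed broker, the shell limit is not inherited; raise it
# in a unit override instead (unit name "kafka" is an assumption):
#   sudo systemctl edit kafka
# then add:
#   [Service]
#   LimitCORE=infinity
# and apply with: sudo systemctl daemon-reload && sudo systemctl restart kafka
```

With the limit lifted, the next SIGSEGV should leave a core file alongside the hs_err report, which makes the crash debuggable with gdb or jstack-style tooling.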
hs_err_pid15874.log header:
{code:bash}
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007f88f31b8df0, pid=15874, tid=140216143816448
#
# JRE version: Java(TM) SE Runtime Environment (8.0_91-b14) (build 1.8.0_91-b14)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.91-b14 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# J 14984 C2 kafka.cluster.Partition.$anonfun$fetchOffsetForTimestamp$1(Lkafka/cluster/Partition;JLscala/Option;Ljava/util/Optional;Z)Lkafka/log/TimestampOffset; (240 bytes) @ 0x00007f88f31b8df0 [0x00007f88f31b8d80+0x70]
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#
{code}
Apart from these errors, which require a broker restart when they occur, our
clusters are operating normally.
I came across old issues that appear to have been resolved since version 0.10.1.1
(https://issues.apache.org/jira/browse/KAFKA-4205), but did not find useful leads
there.
I should probably mention that we use Kafka Cruise Control on our clusters to
manage partition rebalancing, but as noted above, this issue does not
necessarily occur while a rebalancing operation is ongoing. It can happen at
any time.
Thanks
> Kafka brokers randomly crash (SIGSEGV due to kafka errors)
> ----------------------------------------------------------
>
> Key: KAFKA-9153
> URL: https://issues.apache.org/jira/browse/KAFKA-9153
> Project: Kafka
> Issue Type: Bug
> Reporter: Tristan
> Priority: Major
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)