Gao Fei created KAFKA-16430:
-------------------------------
Summary: The group-metadata-manager thread is always in a loading
state and occupies one CPU, unable to end.
Key: KAFKA-16430
URL: https://issues.apache.org/jira/browse/KAFKA-16430
Project: Kafka
Issue Type: Bug
Components: group-coordinator
Affects Versions: 2.4.0
Reporter: Gao Fei
I deployed three broker instances and suddenly found that a client was unable
to consume data from certain topic partitions. I logged in to the broker that
coordinates the affected group and ran the following command to describe the
consumer group:
{code:java}
./bin/kafka-consumer-groups.sh --bootstrap-server localhost:9093 --describe \
  --group mygroup
{code}
and found the following error:
{code:java}
Error: Executing consumer group command failed due to
org.apache.kafka.common.errors.CoordinatorLoadInProgressException: The
coordinator is loading and hence can't process requests.{code}
This suggested that the group coordinator on that broker was stuck in a loop
and never left the loading state. Using the top command, I also saw that the
"group-metadata-manager-0" thread was constantly consuming 100% of one CPU
core. The loop never ended, so topic partition data could not be consumed
through that node. At this point, I suspected that the issue was related to
the __consumer_offsets partition data files loaded by this thread.
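For anyone trying to confirm which code path the busy thread is spinning in,
the thread ID shown by top -H can be matched against a thread dump. This is
only a minimal sketch, assuming a JDK 8 or 11 style dump in which jstack prints
the native thread ID as hex (nid=0x...); the helper names tid_to_nid and
stack_for_tid are my own and not part of any Kafka or JDK tooling.

```shell
# Convert a decimal thread ID (as shown by `top -H`) to the hex form that
# JDK 8/11 thread dumps print as nid=0x... (assumption: hex nid format).
tid_to_nid() {
  printf '0x%x\n' "$1"
}

# Print the stack of one thread from a saved thread dump.
#   $1 = dump file, e.g. produced with: jstack <broker-pid> > dump.txt
#   $2 = decimal thread ID taken from `top -H -p <broker-pid>`
stack_for_tid() (
  nid=$(tid_to_nid "$2")
  # Start printing at the header line containing "nid=<hex> " and stop at
  # the first blank line, which ends that thread's entry in the dump.
  awk -v pat="nid=$nid " '
    index($0, pat) { show = 1 }
    show && NF == 0 { exit }
    show { print }
  ' "$1"
)
```

Calling stack_for_tid dump.txt 12345 (with the real file and thread ID
substituted) should print only the frames of the hot thread, which makes it
easier to compare several dumps taken a few seconds apart.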
Finally, after restarting the broker instance, everything was back to normal.
This is very strange: if there were a problem with the __consumer_offsets
partition data file, the broker should have failed to start. Why was it able to
recover automatically after a restart? And why did this endless loop of loading
the __consumer_offsets partition data occur in the first place?
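To narrow down which __consumer_offsets partition the coordinator was loading
for the group, the mapping can be computed offline. The following is a minimal
sketch, assuming the coordinator picks the partition as the absolute value of
the group ID's Java String.hashCode() modulo offsets.topic.num.partitions
(default 50), and that the group ID contains only ASCII characters. The
function names java_string_hash and offsets_partition_for are my own.

```shell
# Java's String.hashCode() for an ASCII string, printed as a signed 32-bit
# integer. Assumption: ASCII only, so each character is one UTF-16 code unit.
java_string_hash() (
  s=$1
  h=0
  while [ -n "$s" ]; do
    c=${s%"${s#?}"}                      # first character of s
    s=${s#?}                             # remainder of s
    code=$(printf '%d' "'$c")            # numeric value of the character
    h=$(( (h * 31 + code) & 0xFFFFFFFF ))  # keep the low 32 bits, like a Java int
  done
  # Reinterpret the unsigned 32-bit value as a signed Java int.
  if [ "$h" -ge 2147483648 ]; then h=$(( h - 4294967296 )); fi
  echo "$h"
)

# Partition of __consumer_offsets that holds a group's metadata.
#   $1 = group ID, $2 = offsets.topic.num.partitions (assumed default: 50)
offsets_partition_for() (
  h=$(java_string_hash "$1")
  n=${2:-50}
  # Absolute value, with Integer.MIN_VALUE mapped to 0 (assumed to mirror
  # the behaviour of Kafka's own abs helper).
  if [ "$h" -eq -2147483648 ]; then
    h=0
  elif [ "$h" -lt 0 ]; then
    h=$(( 0 - h ))
  fi
  echo $(( h % n ))
)
```

Running offsets_partition_for mygroup prints the partition number to look for
in the broker logs and in the __consumer_offsets log directory on the affected
broker. If the cluster overrides offsets.topic.num.partitions, pass that value
as the second argument.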
We encountered this issue in our production environment using Kafka versions
2.2.1 and 2.4.0, and I believe it may also affect other versions.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)