[jira] [Created] (KAFKA-16430) The group-metadata-manager thread is always in a loading state and occupies one CPU, unable to end.

Gao Fei (Jira) Wed, 27 Mar 2024 00:01:16 -0700

Gao Fei created KAFKA-16430:
-------------------------------

             Summary: The group-metadata-manager thread is always in a loading state and occupies one CPU, unable to end.
                 Key: KAFKA-16430
                 URL: https://issues.apache.org/jira/browse/KAFKA-16430
             Project: Kafka
          Issue Type: Bug
          Components: group-coordinator
    Affects Versions: 2.4.0
            Reporter: Gao Fei



I deployed a cluster of three brokers and suddenly found that clients were unable 
to consume data from certain topic partitions. I first logged in to the broker 
acting as the coordinator for the group and ran the following command to 
describe the consumer group:
{code:bash}
./bin/kafka-consumer-groups.sh --bootstrap-server localhost:9093 --describe --group mygroup
{code}
and found the following error:
{code:java}
Error: Executing consumer group command failed due to org.apache.kafka.common.errors.CoordinatorLoadInProgressException: The coordinator is loading and hence can't process requests.
{code}
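
The broker "corresponding to the group" is its group coordinator: the leader of the __consumer_offsets partition that the group id hashes to, which is also the partition whose data the group-metadata-manager thread has to load before the group can be served. The following is a minimal sketch of that mapping, assuming the formula used by GroupMetadataManager.partitionFor in 2.x brokers (abs(groupId.hashCode()) % offsets.topic.num.partitions) and the default of 50 partitions; the class name OffsetsPartitionLocator is hypothetical and not part of Kafka.

{code:java}
// Hypothetical helper (not part of Kafka): finds which __consumer_offsets
// partition stores a consumer group's metadata, assuming the 2.x formula
// abs(groupId.hashCode()) % offsets.topic.num.partitions.
public final class OffsetsPartitionLocator {

    // Same semantics as Kafka's Utils.abs: Math.abs(Integer.MIN_VALUE) is still
    // negative, so that single case is mapped to 0 instead.
    static int abs(int n) {
        return (n == Integer.MIN_VALUE) ? 0 : Math.abs(n);
    }

    // Index of the __consumer_offsets partition that holds the group's state.
    static int partitionFor(String groupId, int offsetsTopicPartitions) {
        return abs(groupId.hashCode()) % offsetsTopicPartitions;
    }

    public static void main(String[] args) {
        String groupId = args.length > 0 ? args[0] : "mygroup";
        // 50 is the default value of offsets.topic.num.partitions; pass the
        // cluster's actual value as the second argument if it differs.
        int partitions = args.length > 1 ? Integer.parseInt(args[1]) : 50;
        System.out.println(groupId + " -> __consumer_offsets-" + partitionFor(groupId, partitions));
    }
}
{code}

For example, on Java 11 or later `java OffsetsPartitionLocator.java mygroup 50` prints the partition to look at; the broker listed as leader for that partition by `kafka-topics.sh --describe --topic __consumer_offsets` is the coordinator that should be loading it.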

I then found that the broker appeared to be stuck in a loop, permanently in the 
loading state. Using the top command, I also saw that the 
"group-metadata-manager-0" thread was continuously consuming 100% of one CPU. 
The loop never ended, so topic partition data on that node could not be 
consumed. At this point, I suspected that the problem was related to the 
__consumer_offsets partition data files being loaded by this thread.
Finally, after I restarted the broker, everything returned to normal. This is 
very strange: if the __consumer_offsets partition data files had a problem, the 
broker should have failed to start. Why did it recover automatically after a 
restart? And what caused the __consumer_offsets partition data to be reloaded 
in an endless loop?

We encountered this issue in our production environment on Kafka 2.2.1 and 
2.4.0, and I believe other versions may be affected as well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)