[jira] [Commented] (KAFKA-17445) Kafka streams keeps rebalancing with the following reasons

Matthias J. Sax (Jira) Thu, 26 Sep 2024 17:38:28 -0700


    [ 
https://issues.apache.org/jira/browse/KAFKA-17445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17885185#comment-17885185
 ]


Matthias J. Sax commented on KAFKA-17445:
-----------------------------------------

Thanks for the feedback. Sounds like you are making progress. Maybe using 
standby tasks can help mitigate the issue with spot instances?
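
For reference, standby tasks are enabled through the `num.standby.replicas` 
Streams config (default 0). The following is only a minimal sketch: the 
application id, bootstrap address, and the class and method names are 
hypothetical, and plain string keys are used so it compiles without the Kafka 
jars (in a real app you would likely use 
StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG instead).

```java
import java.util.Properties;

public class StandbyConfigSketch {

    // Builds Kafka Streams properties that request standby replicas.
    // All argument values are placeholders supplied by the caller.
    static Properties buildProps(String applicationId,
                                 String bootstrapServers,
                                 int standbyReplicas) {
        Properties props = new Properties();
        props.put("application.id", applicationId);
        props.put("bootstrap.servers", bootstrapServers);
        // Same key as StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG; default is 0.
        props.put("num.standby.replicas", Integer.toString(standbyReplicas));
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical values, for illustration only.
        Properties props = buildProps("my-streams-app", "localhost:9092", 1);
        System.out.println(props.getProperty("num.standby.replicas"));
    }
}
```

These properties would then be passed to the KafkaStreams constructor. With 
one standby replica, another instance keeps a warm copy of each state store, 
so when a spot instance is reclaimed the task should be able to move without 
a full restore from the changelog topic. Whether that is enough to stop the 
rebalancing described here is not confirmed.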

> Kafka streams keeps rebalancing with the following reasons
> ----------------------------------------------------------
>
>                 Key: KAFKA-17445
>                 URL: https://issues.apache.org/jira/browse/KAFKA-17445
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 3.8.0
>            Reporter: Rohit Bobade
>            Priority: Major
>
> We recently upgraded Kafka streams version to 3.8.0 and are seeing that the 
> streams app keeps rebalancing and does not process any events
> We have explicitly set the config 
> GROUP_INSTANCE_ID_CONFIG
> This is what we see on the broker logs:
> [GroupCoordinator 2]: Preparing to rebalance group {consumer-group-name} in 
> state PreparingRebalance with old generation 24781 (__consumer_offsets-29) 
> (reason: Updating metadata for static member {} with instance id {}; client 
> reason: rebalance failed due to UnjoinedGroupException)
> We also tried removing GROUP_INSTANCE_ID_CONFIG, but then we see these logs, 
> and the app still keeps rebalancing without processing any events:
> sessionTimeoutMs=45000, rebalanceTimeoutMs=1800000, 
> supportedProtocols=List(stream)) has left group {groupId} through explicit 
> `LeaveGroup`; client reason: the consumer unsubscribed from all topics 
> (kafka.coordinator.group.GroupCoordinator)
> other logs show:
> during Stable; client reason: need to revoke partitions and re-join)
> client reason: triggered followup rebalance scheduled for 0
> On the application logs we see:
> 1. state being restored from changelog topic
> 2. INFO org.apache.kafka.streams.processor.internals.StreamThread - 
> stream-thread  at state RUNNING: partitions  lost due to missed rebalance.
> Detected that the thread is being fenced. This implies that this thread 
> missed a rebalance and dropped out of the consumer group. Will close out all 
> assigned tasks and rejoin the consumer group.
>  
> 3. Task Migrated exceptions
> org.apache.kafka.streams.errors.TaskMigratedException: Error encountered 
> sending record to topic
> org.apache.kafka.common.errors.InvalidProducerEpochException: Producer with 
> transactionalId
> attempted to produce with an old epoch
> Written offsets would not be recorded and no more records would be sent since 
> the producer is fenced, indicating the task may be migrated out; it means all 
> tasks belonging to this thread should be migrated.
> at 
> org.apache.kafka.streams.processor.internals.RecordCollectorImpl.recordSendError(RecordCollectorImpl.java:306)
>  ~[kafka-streams-3.8.0.jar:?]
> at 
> org.apache.kafka.streams.processor.internals.RecordCollectorImpl.lambda$send$1(RecordCollectorImpl.java:286)
>  ~[kafka-streams-3.8.0.jar:?]
> at 
> datadog.trace.instrumentation.kafka_clients.KafkaProducerCallback.onCompletion(KafkaProducerCallback.java:44)
>  ~[?:?]
> at 
> org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:1106)
>  ~[kafka-clients-3.8.0.jar:?]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
