[
https://issues.apache.org/jira/browse/KAFKA-17445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17885185#comment-17885185
]
Matthias J. Sax commented on KAFKA-17445:
-----------------------------------------
Thanks for the feedback. Sounds like you are making progress. Maybe using
standby tasks can help to fight the issue with spot instances?
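
For illustration, a minimal sketch of what enabling standby tasks could look like. Standby tasks keep a warm copy of each state store on another instance, so when a spot instance is reclaimed the task can fail over without a full restore from the changelog topic. The class name, method name, application id, and bootstrap address below are hypothetical placeholders. The config keys are written as literal strings so the snippet compiles without the Kafka jars; with kafka-streams on the classpath one would normally use the StreamsConfig constants (for example StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG).

```java
import java.util.Properties;

// Hypothetical helper, not part of Kafka: builds a Properties object
// for a Kafka Streams app with standby replicas enabled.
final class StandbyConfigSketch {

    static Properties buildProps(String applicationId,
                                 String bootstrapServers,
                                 int standbyReplicas) {
        if (standbyReplicas < 0) {
            throw new IllegalArgumentException("standbyReplicas must be >= 0");
        }
        Properties props = new Properties();
        props.setProperty("application.id", applicationId);
        props.setProperty("bootstrap.servers", bootstrapServers);
        // Number of standby copies kept per stateful task (the default is 0).
        props.setProperty("num.standby.replicas",
                          Integer.toString(standbyReplicas));
        return props;
    }

    public static void main(String[] args) {
        // Placeholder values; replace with the real application id and brokers.
        Properties props = buildProps("my-streams-app", "localhost:9092", 1);
        System.out.println("num.standby.replicas = "
                + props.getProperty("num.standby.replicas"));
    }
}
```

The resulting Properties would be passed to the KafkaStreams constructor together with the topology. Standbys only apply to stateful tasks, and a standby can only be placed if there are more running instances than the number of replicas requested.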
> Kafka streams keeps rebalancing with the following reasons
> ----------------------------------------------------------
>
> Key: KAFKA-17445
> URL: https://issues.apache.org/jira/browse/KAFKA-17445
> Project: Kafka
> Issue Type: Bug
> Components: streams
> Affects Versions: 3.8.0
> Reporter: Rohit Bobade
> Priority: Major
>
> We recently upgraded Kafka Streams to version 3.8.0 and are seeing that the
> streams app keeps rebalancing and does not process any events.
> We have explicitly set the config
> GROUP_INSTANCE_ID_CONFIG.
> This is what we see in the broker logs:
> [GroupCoordinator 2]: Preparing to rebalance group {consumer-group-name} in
> state PreparingRebalance with old generation 24781 (__consumer_offsets-29)
> (reason: Updating metadata for static member {} with instance id {}; client
> reason: rebalance failed due to UnjoinedGroupException)
> We also tried removing GROUP_INSTANCE_ID_CONFIG, but we then see the logs
> below, and the app still keeps rebalancing without processing anything:
> sessionTimeoutMs=45000, rebalanceTimeoutMs=1800000,
> supportedProtocols=List(stream)) has left group {groupId} through explicit
> `LeaveGroup`; client reason: the consumer unsubscribed from all topics
> (kafka.coordinator.group.GroupCoordinator)
> Other logs show:
> during Stable; client reason: need to revoke partitions and re-join)
> client reason: triggered followup rebalance scheduled for 0
> In the application logs we see:
> 1. state being restored from changelog topic
> 2. INFO org.apache.kafka.streams.processor.internals.StreamThread -
> stream-thread at state RUNNING: partitions lost due to missed rebalance.
> Detected that the thread is being fenced. This implies that this thread
> missed a rebalance and dropped out of the consumer group. Will close out all
> assigned tasks and rejoin the consumer group.
>
> 3. Task Migrated exceptions
> org.apache.kafka.streams.errors.TaskMigratedException: Error encountered
> sending record to topic
> org.apache.kafka.common.errors.InvalidProducerEpochException: Producer with
> transactionalId
> attempted to produce with an old epoch
> Written offsets would not be recorded and no more records would be sent since
> the producer is fenced, indicating the task may be migrated out; it means all
> tasks belonging to this thread should be migrated.
> at
> org.apache.kafka.streams.processor.internals.RecordCollectorImpl.recordSendError(RecordCollectorImpl.java:306)
> ~[kafka-streams-3.8.0.jar:?]
> at
> org.apache.kafka.streams.processor.internals.RecordCollectorImpl.lambda$send$1(RecordCollectorImpl.java:286)
> ~[kafka-streams-3.8.0.jar:?]
> at
> datadog.trace.instrumentation.kafka_clients.KafkaProducerCallback.onCompletion(KafkaProducerCallback.java:44)
> ~[?:?]
> at
> org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:1106)
> ~[kafka-clients-3.8.0.jar:?]