[
https://issues.apache.org/jira/browse/KAFKA-10007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17117178#comment-17117178
]
Ismael Juma commented on KAFKA-10007:
-------------------------------------
Thanks for the report. Could it beĀ KAFKA-9543?
> Kafka consumer offset reset despite recent group activity
> ---------------------------------------------------------
>
> Key: KAFKA-10007
> URL: https://issues.apache.org/jira/browse/KAFKA-10007
> Project: Kafka
> Issue Type: Bug
> Reporter: Raman Gupta
> Priority: Major
>
> I was running a Kafka 2.3.0 broker with the default values for
> `offset.retention.minutes` (which should be 7 days as of 2.0.0). I deployed a
> 2.4.1 broker, along with a change in setting `offset.retention.minutes` to 14
> days, as I have several low-traffic topics in which exactly-once processing
> is desired.
> As I understand it, with https://issues.apache.org/jira/browse/KAFKA-4682 and
> KIP-211, offsets should no longer be expired based on the last commit
> timestamp, but instead on the last time the group transitioned into an Empty
> state.
> However, the behavior I saw from Kafka upon broker shutdown was that the
> offsets were expired for a group when as far as I can tell, they should not
> have been. See these logs from during the cluster recycle -- during this time
> the consumer, configured with the static group membership protocol, is always
> running:
> {code}
> <<Running Kafka 2.3.0, 4 brokers, all on 2.3, protocol version 2.3,
> offsets.retention.minutes using default value>>
> [2020-05-10 05:37:01,070] <<Shutting down kafka-0>>
> << Starting broker-0 on 2.4.1 with protocol version 2.3,
> offsets.retention.minutes = 10080 >>
> kafka-0 [2020-05-10 05:37:39,682] INFO starting
> (kafka.server.KafkaServer)
> kafka-0 [2020-05-10 05:39:42,680] INFO [GroupCoordinator 0]: Loading
> group metadata for produs-cis-CisFileEventConsumer with generation 27
> (kafka.coordinator.group.GroupCoordinator)
> << Recycling broker-1 on 2.4.1, protocol version 2.3,
> offsets.retention.minutes = 10080, looks like the consumer fails because of
> the broker going down, and kafka-0 reports: >>
> kafka-0 [2020-05-10 05:45:14,121] INFO [GroupCoordinator 0]: Member
> cis-9c5d994c5-7hpqt-efced5ca-0b81-4720-992d-bdd8612519b3 in group
> produs-cis-CisFileEventConsumer has failed, removing it from the group
> (kafka.coordinator.group.GroupCoordinator)
> kafka-0 [2020-05-10 05:45:14,124] INFO [GroupCoordinator 0]: Preparing
> to rebalance group produs-cis-CisFileEventConsumer in state
> PreparingRebalance with old generation 27 (__consumer_offsets-17) (reason:
> removing member cis-9c5d994c5-7hpqt-efced5ca-0b81-4720-992d-bdd8612519b3 on
> heartbeat expiration) (kafka.coordinator.group.GroupCoordinator)
> kafka-0 [2020-05-10 05:45:19,479] INFO [GroupCoordinator 0]: Member
> cis-9c5d994c5-sknlk-2b9ed8bf-348c-4a10-97d3-5f2caccce7df in group
> produs-cis-CisFileEventConsumer has failed, removing it from the group
> (kafka.coordinator.group.GroupCoordinator)
> kafka-0 [2020-05-10 05:45:19,482] INFO [GroupCoordinator 0]: Group
> produs-cis-CisFileEventConsumer with generation 28 is now empty
> (__consumer_offsets-17) (kafka.coordinator.group.GroupCoordinator)
> << and now kafka-1 starts up again, the offsets are expired >>
> kafka-1 [2020-05-10 05:46:11,229] INFO starting
> (kafka.server.KafkaServer)
> ...
> kafka-0 [2020-05-10 05:47:42,303] INFO [GroupCoordinator 0]: Preparing
> to rebalance group produs-cis-CisFileEventConsumer in state
> PreparingRebalance with old generation 28 (__consumer_offsets-17) (reason:
> Adding new member cis-9c5d994c5-sknlk-1194b4b6-81ae-4a78-89a7-c610cf8c65be
> with group instanceid Some(cis-9c5d994c5-sknlk))
> (kafka.coordinator.group.GroupCoordinator)
> kafka-0 [2020-05-10 05:47:47,611] INFO [GroupMetadataManager
> brokerId=0] Removed 43 expired offsets in 13 milliseconds.
> (kafka.coordinator.group.GroupMetadataManager)
> kafka-0 [2020-05-10 05:48:12,308] INFO [GroupCoordinator 0]: Stabilized
> group produs-cis-CisFileEventConsumer generation 29 (__consumer_offsets-17)
> (kafka.coordinator.group.GroupCoordinator)
> kafka-0 [2020-05-10 05:48:12,311] INFO [GroupCoordinator 0]: Assignment
> received from leader for group produs-cis-CisFileEventConsumer for generation
> 29 (kafka.coordinator.group.GroupCoordinator)
> {code}
> The group becomes empty at 2020-05-10 05:45:19,482, and then the offsets are
> expired about two minutes later at 05:47:47,611. I can't see any reason based
> on my understanding of how things work for this to have happened, other than
> it being a bug of some type?
--
This message was sent by Atlassian Jira
(v8.3.4#803005)