[
https://issues.apache.org/jira/browse/KAFKA-9801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ismael Juma updated KAFKA-9801:
-------------------------------
Priority: Critical (was: Major)
> Static member could get empty assignment unexpectedly
> -----------------------------------------------------
>
> Key: KAFKA-9801
> URL: https://issues.apache.org/jira/browse/KAFKA-9801
> Project: Kafka
> Issue Type: Bug
> Components: consumer, streams
> Affects Versions: 2.4.0
> Reporter: Guozhang Wang
> Assignee: Guozhang Wang
> Priority: Critical
> Fix For: 2.5.0
>
>
> Take the following example trace where static members are joining the group:
> 1. A static member with instance A joins the group with an empty member.id. The
> coordinator generates member.id 1 for A and adds it to the group. The group
> state is PreparingRebalance.
> 2. The group is formed and moves on to CompletingRebalance.
> 3. Another member joins the group, causing it to transition back to
> PreparingRebalance, which could send a REBALANCE_IN_PROGRESS error to
> member A as well.
> 4. Member A gets the REBALANCE_IN_PROGRESS error and tries to re-join (again
> with an empty member.id).
> 5. The group advances to CompletingRebalance again.
> 6. The group gets the second join-group request from the known instance A with an
> empty member.id, generates a new member.id 2, and replaces member.id 1 with it.
> 7. The group gets the assignment from leader which only includes member.id 1
> and not member.id 2.
> 8. The assignment for member.id 1 is dropped on the broker side while the
> assignment for member.id 2 is set to an empty byte array.
> 9. The empty byte array is sent back to instance A, causing the
> following error:
> {code}
> [2020-03-27T21:13:01-05:00]
> (streams-soak-2-5_soak_i-054b83e98b7ed6285_streamslog)
> org.apache.kafka.common.protocol.types.SchemaException: Error reading field
> 'version': java.nio.BufferUnderflowException
> at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:110)
> {code}
> This error is triggered only when several conditions line up, which is why
> it has not been seen very often.
> Personally, I think this error is also correlated with the behavior observed in
> https://issues.apache.org/jira/browse/KAFKA-9659, which I will keep
> investigating (and will update this ticket).
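
The trace in steps 1-9 can be replayed with a small toy model. This is only a sketch, not the broker's actual code: `StaticMemberTrace`, `ToyCoordinator`, and all method names are hypothetical, the rebalance states are left out, and the two-byte leader payload is a placeholder. It also assumes that the leading 'version' field of the assignment is a 16-bit integer, so reading it from an empty buffer underflows. In the real client that exception surfaces wrapped in the SchemaException shown above.

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

public class StaticMemberTrace {

    /** Toy stand-in for the group coordinator (hypothetical, not Kafka's implementation). */
    static class ToyCoordinator {
        private final Map<String, String> memberIdByInstance = new HashMap<>();
        private final Map<String, byte[]> assignmentByMemberId = new HashMap<>();
        private int nextId = 1;

        /** A join with an empty member.id always mints a new id and replaces the old one. */
        String joinWithEmptyMemberId(String instanceId) {
            String newId = String.valueOf(nextId++);
            String oldId = memberIdByInstance.put(instanceId, newId);
            if (oldId != null) {
                assignmentByMemberId.remove(oldId);
            }
            return newId;
        }

        /**
         * Applies the leader's assignment. Entries for unknown member ids are dropped;
         * current members the leader did not mention get an empty byte array.
         */
        void syncFromLeader(Map<String, byte[]> leaderAssignment) {
            assignmentByMemberId.clear();
            for (String memberId : memberIdByInstance.values()) {
                assignmentByMemberId.put(
                        memberId, leaderAssignment.getOrDefault(memberId, new byte[0]));
            }
        }

        byte[] assignmentFor(String memberId) {
            return assignmentByMemberId.get(memberId);
        }
    }

    /** Replays the trace and returns the assignment that instance A ends up with. */
    static byte[] replay() {
        ToyCoordinator coordinator = new ToyCoordinator();
        String first = coordinator.joinWithEmptyMemberId("A");   // steps 1-2
        String second = coordinator.joinWithEmptyMemberId("A");  // steps 3-6
        Map<String, byte[]> leaderAssignment = new HashMap<>();
        leaderAssignment.put(first, new byte[] {0, 1});          // step 7: leader only knows the first id
        coordinator.syncFromLeader(leaderAssignment);            // step 8
        return coordinator.assignmentFor(second);                // step 9
    }

    /** Returns true if a 16-bit 'version' cannot be read from the assignment bytes. */
    static boolean failsToReadVersion(byte[] assignment) {
        try {
            ByteBuffer.wrap(assignment).getShort();
            return false;
        } catch (BufferUnderflowException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        byte[] assignment = replay();
        System.out.println("assignment length = " + assignment.length);
        System.out.println("version read fails = " + failsToReadVersion(assignment));
    }
}
```

The point of the sketch is that the leader computed its assignment against the old member.id, so the replacement id has nothing to look up and falls through to the empty default.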
--
This message was sent by Atlassian Jira
(v8.3.4#803005)