[
https://issues.apache.org/jira/browse/KAFKA-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17869723#comment-17869723
]
Dongnuo Lyu edited comment on KAFKA-17219 at 7/30/24 7:53 PM:
--------------------------------------------------------------
??it's really odd to me that we're still seeing these??
-Yeah at least in `consumer_test` the fix is missing. We can add them back when
AK is unblocked.-
Oh, we do have the {{wait_until}} fixes; they're just missing in
{{test_consumer_bounce}} and {{test_broker_rolling_bounce}}. Adding them there
should also fix the {{partition_owner}} assertion.
was (Author: JIRAUSER302289):
> it's really odd to me that we're still seeing these
Yeah at least in `consumer_test` the fix is missing. We can add them back when
AK is unblocked.
> Adjust system test framework for new protocol consumer
> ------------------------------------------------------
>
> Key: KAFKA-17219
> URL: https://issues.apache.org/jira/browse/KAFKA-17219
> Project: Kafka
> Issue Type: Task
> Components: clients, consumer, system tests
> Reporter: Dongnuo Lyu
> Priority: Major
> Labels: kip-848-client-support
>
> The current test framework doesn't work well with the existing tests using
> the new consumer protocol. There are two main issues I've seen.
>
> First, we sometimes assume there is no rebalance triggered, for instance in
> {{consumer_test.py::test_consumer_failure}}
> {code:java}
> # verify that there were no rebalances on failover
> assert num_rebalances == consumer.num_rebalances(), "Broker failure should not cause a rebalance"
> {code}
> The current framework computes {{num_rebalances}} by incrementing it by one
> every time a new assignment is received, so if a reconciliation happens
> during the failover, {{num_rebalances}} is incremented as well. For the new
> protocol we need a different way to update {{num_rebalances}}.
>
> Second, for the new protocol, we need a way to make sure all members have
> joined *and stabilized*. Currently we only make sure all members have
> joined (the event handlers are all in Joined state), at which point some
> partitions may not have been assigned yet and more time is needed for
> reconciliation. This can cause failures such as timeouts while waiting for
> consumption and failed assertions like
> {code:java}
> partition_owner = consumer.owner(partition)
> assert partition_owner is not None {code}
>
> As a short-term solution, we can make the tests pass by adding
> {{time.sleep}} or by skipping the {{num_rebalances}} check. To truly fix
> them, we should adjust
> {{tools/src/main/java/org/apache/kafka/tools/VerifiableConsumer.java}} to
> work well with the new protocol.
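
The stabilization problem described above could be approximated in a test by polling until every partition reports an owner, instead of asserting once right after the members join. The following is a minimal, self-contained sketch and not the actual fix: `wait_until` here is a local stand-in written for this example (the real tests use the system test framework's own helper, whose signature may differ), and `FakeConsumer`, `all_partitions_owned`, `wait_for_all_owned`, and the partition names are hypothetical. Only `consumer.owner(partition)` is taken from the snippet in the description.

```python
import time


def wait_until(condition, timeout_sec, backoff_sec=0.1, err_msg=""):
    """Local stand-in: poll condition() until it is truthy or the timeout expires."""
    deadline = time.monotonic() + timeout_sec
    while True:
        if condition():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(err_msg)
        time.sleep(backoff_sec)


class FakeConsumer:
    """Hypothetical consumer: owner() returns None for the first few calls,
    imitating partitions that are still waiting for reconciliation."""

    def __init__(self, unassigned_calls):
        self._unassigned_calls = unassigned_calls
        self._calls = 0

    def owner(self, partition):
        self._calls += 1
        if self._calls > self._unassigned_calls:
            return "member-0"
        return None


def all_partitions_owned(consumer, partitions):
    """True once every partition has a non-None owner."""
    return all(consumer.owner(p) is not None for p in partitions)


def wait_for_all_owned(consumer, partitions, timeout_sec=30, backoff_sec=0.5):
    """Block until the group looks stable, then assert ownership per partition."""
    wait_until(
        lambda: all_partitions_owned(consumer, partitions),
        timeout_sec=timeout_sec,
        backoff_sec=backoff_sec,
        err_msg="Timed out waiting for all partitions to be assigned",
    )
    for partition in partitions:
        partition_owner = consumer.owner(partition)
        assert partition_owner is not None


if __name__ == "__main__":
    consumer = FakeConsumer(unassigned_calls=3)
    wait_for_all_owned(consumer, ["topic-0", "topic-1"], timeout_sec=5, backoff_sec=0.01)
```

Compared with a fixed `time.sleep`, polling returns as soon as the assignment settles and fails with an explicit message when it never does.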