[
https://issues.apache.org/jira/browse/KAFKA-16474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17837342#comment-17837342
]
Philip Nee commented on KAFKA-16474:
------------------------------------
The log was attached:
See line 7481 and 7492 in :
{code:java}
AssignmentValidationTest/test_valid_assignment/metadata_quorum=ISOLATED_KRAFT.use_new_coordinator=True.group_protocol=consumer.group_remote_assignor=uniform/2/VerifiableConsumer-0-281473320420544/ducker11/verifiable_consumer.log
{code}
{code:java}
7480 [2024-04-15 16:03:35,964] TRACE [Consumer
clientId=consumer-test_group_id-1, groupId=test_group_id] Found least loaded
node ducker10:9092 (id: 2 rack: null) connected with no in-flight requests
(org.apache.kafka.clients.NetworkClient)
7481 [2024-04-15 16:03:35,964] DEBUG [Consumer
clientId=consumer-test_group_id-1, groupId=test_group_id] Sending
CONSUMER_GROUP_HEARTBEAT request with header
RequestHeader(apiKey=CONSUMER_GROUP_HEARTBEAT, apiVersion=0,
clientId=consumer-test_gro up_id-1, correlationId=108, headerVersion=2) and
timeout 30000 to node 2147483646:
ConsumerGroupHeartbeatRequestData(groupId='test_group_id', memberId='',
memberEpoch=0, instanceId=null, rackId=null, rebalanceTimeoutMs=300000,
subscribedTop icNames=[test_topic], serverAssignor='uniform',
topicPartitions=[]) (org.apache.kafka.clients.NetworkClient)
7482 [2024-04-15 16:03:35,964] TRACE For telemetry state SUBSCRIPTION_NEEDED,
returning the value 299843 ms;
(org.apache.kafka.common.telemetry.internals.ClientTelemetryReporter)
7483 [2024-04-15 16:03:35,964] TRACE [Consumer
clientId=consumer-test_group_id-1, groupId=test_group_id] Starting processing
of 1 event (org.apache.kafka.clients.consumer.internals.events.EventProcessor)
7484 [2024-04-15 16:03:35,964] TRACE [Consumer
clientId=consumer-test_group_id-1, groupId=test_group_id] Processing event:
ValidatePositionsEvent{type=VALIDATE_POSITIONS, id=nMQf3YgtRU6vaJu-oVpiQg,
future=java.util.concurrent.CompletableFuture@5 7bc27f5[Not completed, 1
dependents], deadlineMs=9223372036854775807}
(org.apache.kafka.clients.consumer.internals.events.EventProcessor)
7485 [2024-04-15 16:03:35,964] TRACE [Consumer
clientId=consumer-test_group_id-1, groupId=test_group_id] Completed processing
(org.apache.kafka.clients.consumer.internals.events.EventProcessor)
7486 [2024-04-15 16:03:35,964] TRACE [Consumer
clientId=consumer-test_group_id-1, groupId=test_group_id] 999 ms remain before
another request should be sent for
RequestState{owner='org.apache.kafka.clients.consumer.internals.HeartbeatRequestMana
ger$HeartbeatRequestState',
exponentialBackoff=ExponentialBackoff{multiplier=2, expMax=3.3219280948873626,
initialInterval=100, jitter=300000.0}, lastSentMs=1713197015963,
lastReceivedMs=1713197015963, numAttempts=1, backoffMs=1000} (org.ap
ache.kafka.clients.consumer.internals.RequestState)
7487 [2024-04-15 16:03:35,964] TRACE [Consumer
clientId=consumer-test_group_id-1, groupId=test_group_id] Polling for fetches
with timeout 0 (org.apache.kafka.clients.consumer.internals.AsyncKafkaConsumer)
7488 [2024-04-15 16:03:35,964] TRACE [Consumer
clientId=consumer-test_group_id-1, groupId=test_group_id] Found least loaded
node ducker10:9092 (id: 2 rack: null) connected with no in-flight requests
(org.apache.kafka.clients.NetworkClient)
7489 [2024-04-15 16:03:35,964] TRACE [Consumer
clientId=consumer-test_group_id-1, groupId=test_group_id] Enqueued event:
PollEvent{type=POLL, id=T9IwUDeHQoGrnqJJIyzb8Q, pollTimeMs=1713197015964}
(org.apache.kafka.clients.consumer.internals.event
s.ApplicationEventHandler)
7490 [2024-04-15 16:03:35,964] TRACE [Consumer
clientId=consumer-test_group_id-1, groupId=test_group_id] No events to process
(org.apache.kafka.clients.consumer.internals.events.EventProcessor)
7491 [2024-04-15 16:03:35,964] TRACE [Consumer
clientId=consumer-test_group_id-1, groupId=test_group_id] Enqueued event:
ValidatePositionsEvent{type=VALIDATE_POSITIONS, id=otKgqakkS9COd01SkJ3btQ,
future=java.util.concurrent.CompletableFuture@5fb 759d6[Not completed],
deadlineMs=9223372036854775807}
(org.apache.kafka.clients.consumer.internals.events.ApplicationEventHandler)
7492 [2024-04-15 16:03:35,964] DEBUG [Consumer
clientId=consumer-test_group_id-1, groupId=test_group_id] Sending
CONSUMER_GROUP_HEARTBEAT request with header
RequestHeader(apiKey=CONSUMER_GROUP_HEARTBEAT, apiVersion=0,
clientId=consumer-test_gro up_id-1, correlationId=109, headerVersion=2) and
timeout 30000 to node 2147483646:
ConsumerGroupHeartbeatRequestData(groupId='test_group_id', memberId='',
memberEpoch=0, instanceId=null, rackId=null, rebalanceTimeoutMs=300000,
subscribedTop icNames=[test_topic], serverAssignor='uniform',
topicPartitions=[]) (org.apache.kafka.clients.NetworkClient) {code}
> AsyncKafkaConsumer might send out heartbeat request without waiting for its
> response
> ------------------------------------------------------------------------------------
>
> Key: KAFKA-16474
> URL: https://issues.apache.org/jira/browse/KAFKA-16474
> Project: Kafka
> Issue Type: Bug
> Components: clients, consumer
> Reporter: Philip Nee
> Assignee: Lianet Magrans
> Priority: Critical
> Labels: kip-848-client-support
> Fix For: 3.8.0
>
> Attachments: failing_results.zip
>
>
> KAFKA-16389
> We've discovered that in some uncommon cases, the consumer could send out
> successive heartbeats without waiting for the response to come back. this
> might result in causing the consumer to revoke its just assigned assignments
> in some cases. For example:
>
> The consumer first sends out a heartbeat with epoch=0 and memberId=''
> The consumer then rapidly sends out another heartbeat with epoch=0 and
> memberId='' because it has not gotten any response and thus not updating its
> local state
>
> The consumer receives assignments from the first heartbeat and reconciles its
> assignment.
>
> Since the second heartbeat has epoch=0 and memberId='', the server will think
> this is a new member joining and therefore send out an empty assignment.
>
> The consumer receives the response from the second heartbeat. Revoke all of
> its partitions.
>
> There are 2 issues associate with this bug:
> # inflight logic
> # rapid poll: In the KAFKA-16389 we've observe consumer polling interval to
> be a few ms.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)