[ https://issues.apache.org/jira/browse/KAFKA-10228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kirk True updated KAFKA-10228:
------------------------------
Component/s: producer
> producer: NETWORK_EXCEPTION is thrown instead of a request timeout
> ------------------------------------------------------------------
>
> Key: KAFKA-10228
> URL: https://issues.apache.org/jira/browse/KAFKA-10228
> Project: Kafka
> Issue Type: Improvement
> Components: clients, producer
> Affects Versions: 2.3.1
> Reporter: Christian Becker
> Assignee: Kirk True
> Priority: Major
>
> We're currently seeing an issue with the Java client (producer) when message
> production runs into a timeout: a NETWORK_EXCEPTION is thrown instead of a
> timeout exception.
> *Situation and relevant code:*
> Config
> {code:java}
> request.timeout.ms: 200
> retries: 3
> acks: all{code}
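> {code:java}
> // One CompletableFuture per send, collected so we can wait for all of them
> List<CompletableFuture<SendResult<String, String>>> futures = new ArrayList<>();
> for (UnpublishedEvent event : unpublishedEvents) {
>     ListenableFuture<SendResult<String, String>> future = kafkaTemplate.send(
>             new ProducerRecord<>(event.getTopic(), event.getKafkaKey(), event.getPayload()));
>     futures.add(future.completable());
> }
> // Blocks until every send has completed; join() throws if any send failed
> CompletableFuture.allOf(futures.stream().toArray(CompletableFuture[]::new)).join();{code}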
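When `CompletableFuture.allOf(...).join()` fails, it throws a `CompletionException`, and the producer's actual error is usually nested one or more causes deep (Spring's `KafkaTemplate` may add its own wrapper as well). The following is a minimal, stdlib-only sketch of walking the cause chain so that the innermost exception (for example a `NetworkException` versus a `TimeoutException`) can be logged. The class and method names are hypothetical, and the failing future is simulated rather than produced by Kafka.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

public class RootCauseDemo {

    // Hypothetical helper: follow getCause() down to the innermost throwable.
    // Assumes the cause chain is acyclic.
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    // Waits for all futures; returns the innermost failure, or null if all succeeded.
    static Throwable joinAndFindRootCause(List<CompletableFuture<String>> futures) {
        try {
            CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
            return null;
        } catch (CompletionException e) {
            return rootCause(e);
        }
    }

    public static void main(String[] args) {
        List<CompletableFuture<String>> futures = new ArrayList<>();
        futures.add(CompletableFuture.completedFuture("ok"));

        // Simulated send failure: a wrapper exception around the "real" error,
        // standing in for what a producer or KafkaTemplate might report.
        CompletableFuture<String> failed = new CompletableFuture<>();
        failed.completeExceptionally(
                new RuntimeException("send failed", new IllegalStateException("server disconnected")));
        futures.add(failed);

        Throwable root = joinAndFindRootCause(futures);
        // prints "IllegalStateException: server disconnected"
        System.out.println(root.getClass().getSimpleName() + ": " + root.getMessage());
    }
}
{code}

Logging the root cause's class, rather than only the outer exception, makes it easier to tell a disconnect from a timeout when reading the logs.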
> We're using the KafkaTemplate from Spring Boot here, but that shouldn't
> matter, as it's merely a wrapper; we use it to send batches of messages.
> About 200 ms later, we see the following in the logs (the order may be off:
> both entries arrived in the same millisecond, so our logging system might
> not display them in the right order):
> {code:java}
> [Producer clientId=producer-1] Received invalid metadata error in produce
> request on partition events-6 due to
> org.apache.kafka.common.errors.NetworkException: The server disconnected
> before a response was received.. Going to request metadata update now
> [Producer clientId=producer-1] Got error produce response with correlation id
> 3094 on topic-partition events-6, retrying (2 attempts left). Error:
> NETWORK_EXCEPTION {code}
> There is also a corresponding error on the broker (within a few ms):
> {code:java}
> Attempting to send response via channel for which there is no open
> connection, connection id XXX (kafka.network.Processor) {code}
> This was somewhat unexpected and sent us hunting across the infrastructure
> for possible connection issues, but we found none.
> Side note: In some cases the retries worked and the messages were
> successfully produced.
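A rough estimate of the retry budget implied by the reported config may help explain why a retry error, rather than a delivery timeout, surfaced. Assuming every attempt runs into the full `request.timeout.ms` and the client waits `retry.backoff.ms` between attempts (100 ms is assumed here as the default), `retries: 3` allows four attempts, so the send gives up after roughly 4 × 200 ms + 3 × 100 ms = 1100 ms, far below an assumed default `delivery.timeout.ms` of 120000 ms. Under that model the retries run out first, and the last attempt's error (here NETWORK_EXCEPTION) is what the caller sees. The sketch below only does this arithmetic; it ignores metadata refreshes, batching and queueing, and is not how the producer itself tracks time.

{code:java}
public class RetryBudget {

    // Rough upper estimate of the time spent on one record when every attempt
    // waits the full request timeout: (retries + 1) attempts separated by
    // `retries` backoff pauses. Ignores metadata refreshes, batching and queueing.
    static long worstCaseRetryTimeMs(int retries, long requestTimeoutMs, long retryBackoffMs) {
        long attempts = retries + 1L;
        return attempts * requestTimeoutMs + retries * retryBackoffMs;
    }

    // Under this simplified model: true if retries run out before the delivery
    // timeout, so the last attempt's error is reported instead of a timeout.
    static boolean retriesExhaustedFirst(int retries, long requestTimeoutMs,
                                         long retryBackoffMs, long deliveryTimeoutMs) {
        return worstCaseRetryTimeMs(retries, requestTimeoutMs, retryBackoffMs) < deliveryTimeoutMs;
    }

    public static void main(String[] args) {
        // retries and request.timeout.ms from the report; retry.backoff.ms (100)
        // and delivery.timeout.ms (120000) are assumed defaults.
        long estimate = worstCaseRetryTimeMs(3, 200, 100);
        System.out.println("worst-case retry time: " + estimate + " ms"); // 1100 ms
        System.out.println("retries exhausted first: "
                + retriesExhaustedFirst(3, 200, 100, 120_000)); // true
    }
}
{code}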
> Only after many hours of heavy debugging did we notice that the error might
> be related to the low timeout setting. We have since removed that setting, as
> it was a remnant from the past and no longer valid for our use case. However,
> to spare other people this issue and to simplify future debugging, some form
> of timeout exception should be thrown.
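For comparison, the reported producer settings can be written as plain `java.util.Properties`, the type `KafkaProducer` accepts. This stdlib-only sketch uses the raw config keys instead of the `ProducerConfig` constants and leaves `request.timeout.ms` unset, so the client default applies (assumed here to be 30000 ms). The `validateTimeouts` helper is hypothetical: it only mirrors the documented rule that `delivery.timeout.ms` should be at least `linger.ms` + `request.timeout.ms`, assuming defaults of 120000 ms and 0 ms for the two settings that are not set explicitly.

{code:java}
import java.util.Properties;

public class ProducerTimeoutConfig {

    // Assumed producer defaults for keys the report does not set.
    static final long DEFAULT_REQUEST_TIMEOUT_MS = 30_000;
    static final long DEFAULT_DELIVERY_TIMEOUT_MS = 120_000;
    static final long DEFAULT_LINGER_MS = 0;

    // The reported settings, minus the 200 ms request.timeout.ms override.
    static Properties producerProps() {
        Properties props = new Properties();
        props.put("acks", "all");
        props.put("retries", "3");
        // request.timeout.ms intentionally not set: the client default applies.
        return props;
    }

    // Reads a millisecond setting, falling back to the given default.
    static long millis(Properties props, String key, long defaultValue) {
        String value = props.getProperty(key);
        return value == null ? defaultValue : Long.parseLong(value);
    }

    // Hypothetical check mirroring the documented constraint
    // delivery.timeout.ms >= linger.ms + request.timeout.ms.
    static void validateTimeouts(Properties props) {
        long request = millis(props, "request.timeout.ms", DEFAULT_REQUEST_TIMEOUT_MS);
        long delivery = millis(props, "delivery.timeout.ms", DEFAULT_DELIVERY_TIMEOUT_MS);
        long linger = millis(props, "linger.ms", DEFAULT_LINGER_MS);
        if (delivery < linger + request) {
            throw new IllegalArgumentException("delivery.timeout.ms (" + delivery
                    + ") should be >= linger.ms + request.timeout.ms (" + (linger + request) + ")");
        }
    }

    public static void main(String[] args) {
        Properties props = producerProps();
        validateTimeouts(props); // passes: 120000 >= 0 + 30000
        System.out.println("effective request.timeout.ms = "
                + millis(props, "request.timeout.ms", DEFAULT_REQUEST_TIMEOUT_MS));
    }
}
{code}

Note that the original 200 ms value also passes this check; the constraint only guards the relationship between the two timeouts, not whether a given request timeout is realistic for the broker's latency.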
--
This message was sent by Atlassian Jira
(v8.20.10#820010)