[
https://issues.apache.org/jira/browse/KAFKA-9199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17951044#comment-17951044
]
José Armando García Sancio commented on KAFKA-9199:
---------------------------------------------------
When fixing this issue, please revert this commit on trunk:
[13fa453|https://github.com/apache/kafka/commit/13fa4537f53f2524ccf1fd7e79d4d4184e093cc1]
> Improve handling of out of sequence errors lower than last acked sequence
> -------------------------------------------------------------------------
>
> Key: KAFKA-9199
> URL: https://issues.apache.org/jira/browse/KAFKA-9199
> Project: Kafka
> Issue Type: Bug
> Components: clients, producer
> Reporter: Jason Gustafson
> Priority: Major
>
> The broker attempts to cache the state of the last 5 batches per producer in
> order to enable duplicate detection. This caching is not guaranteed across
> restarts: only the state of the last batch is written to the snapshot file.
> In some cases this can result in a sequence of events such as the following
> (a rough sketch of the caching behavior appears after the list):
> # Send sequence=n
> # Sequence=n successfully written, but response is not received
> # Leader changes after broker restart
> # Send sequence=n+1
> # Receive successful response for n+1
> # Sequence=n times out and is retried, resulting in an out-of-order sequence error
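> To make the caching behavior above concrete, here is a rough sketch of a
> bounded dedup window whose snapshot keeps only the newest entry. The class
> and method names are hypothetical, not the actual ProducerStateManager API:
> {code:java}
> import java.util.ArrayDeque;
> import java.util.Deque;
>
> // Hypothetical model of per-producer batch state on the broker.
> class ProducerSequenceState {
>     private static final int MAX_CACHED_BATCHES = 5; // dedup window while running
>
>     // Each entry is {firstSeq, lastSeq} for one appended batch, oldest first.
>     private final Deque<int[]> cachedBatches = new ArrayDeque<>();
>
>     void append(int firstSeq, int lastSeq) {
>         if (cachedBatches.size() == MAX_CACHED_BATCHES)
>             cachedBatches.removeFirst(); // older batches fall out of the window
>         cachedBatches.addLast(new int[]{firstSeq, lastSeq});
>     }
>
>     // Only the newest batch is written to the snapshot file, so after a
>     // restart the dedup window shrinks from 5 batches to 1: a retry of an
>     // older, already-acked batch can no longer be recognized as a duplicate.
>     ProducerSequenceState restoreFromSnapshot() {
>         ProducerSequenceState restored = new ProducerSequenceState();
>         int[] newest = cachedBatches.peekLast();
>         if (newest != null)
>             restored.append(newest[0], newest[1]);
>         return restored;
>     }
> }
> {code}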
> There are a couple of problems here. First, it would probably be better for
> the broker to return DUPLICATE_SEQUENCE_NUMBER when it receives a sequence
> number lower than the lowest cached batch sequence. Second, the producer
> currently handles this situation by retrying until the delivery timeout
> expires. Instead, it should fail the batch immediately.
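> As a sketch of the first proposed change: the error names mirror Kafka's
> Errors enum, but this classifier and its inputs are illustrative only, not
> the broker's actual validation path.
> {code:java}
> import java.util.Deque;
>
> // Hypothetical classifier showing the proposed broker-side behavior.
> class SequenceClassifier {
>     enum SequenceError { NONE, DUPLICATE_SEQUENCE_NUMBER, OUT_OF_ORDER_SEQUENCE_NUMBER }
>
>     // cachedBatches holds {firstSeq, lastSeq} pairs for recent batches, oldest first.
>     SequenceError classify(int incomingSeq, int lastAppendedSeq, Deque<int[]> cachedBatches) {
>         for (int[] batch : cachedBatches) {
>             if (incomingSeq >= batch[0] && incomingSeq <= batch[1])
>                 return SequenceError.DUPLICATE_SEQUENCE_NUMBER; // replay of a cached batch
>         }
>         if (!cachedBatches.isEmpty() && incomingSeq < cachedBatches.peekFirst()[0])
>             // Proposed: a sequence below the entire cached window can only be a stale
>             // retry of an already-acked batch, so answer DUPLICATE_SEQUENCE_NUMBER
>             // rather than OUT_OF_ORDER_SEQUENCE_NUMBER.
>             return SequenceError.DUPLICATE_SEQUENCE_NUMBER;
>         if (incomingSeq != lastAppendedSeq + 1)
>             return SequenceError.OUT_OF_ORDER_SEQUENCE_NUMBER;
>         return SequenceError.NONE;
>     }
> }
> {code}
> On the producer side, the complementary change would be to fail a batch
> immediately (completing its future exceptionally) when it hits an
> out-of-sequence error below the last acked sequence, rather than retrying
> until delivery.timeout.ms expires.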
> This issue surfaced in the reassignment system test: the producer was stuck
> retrying the duplicate batch until it ultimately gave up, which caused the
> test to fail.
>