[
https://issues.apache.org/jira/browse/KAFKA-17229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17875704#comment-17875704
]
Matthias J. Sax commented on KAFKA-17229:
-----------------------------------------
Great find. There is currently no WIP for the new processing thread, so we
don't have a timeline atm. Thus, it might make sense to fix it on the old code
path (hoping that the fix won't be too complex...).
I am not super familiar (well, actually not at all) with the new processing
thread code, so I don't know whether it would fix this issue. Maybe [~cadonna]
or [~lucasbru] could shed some light (for my own education).
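For context, the timing in the reported scenario works out as sketched below. This is a minimal plain-Java model of the arithmetic only, not Kafka code; the partition count, punctuator runtime, punctuation interval, and transaction timeout are taken from the report, everything else is illustrative.

```java
// Timing model of the reported scenario (not actual Kafka Streams code):
// 10 punctuators (one per partition) each sleep 5s, the transaction
// timeout is 10s, and punctuation fires every 30s of wall-clock time.
public class PunctuatorTimingSketch {
    public static void main(String[] args) {
        int partitions = 10;
        long punctuatorRuntimeMs = 5_000L;
        long transactionTimeoutMs = 10_000L;
        long punctuationIntervalMs = 30_000L;

        // One full round of punctuators takes 10 * 5s = 50s.
        long roundMs = partitions * punctuatorRuntimeMs;

        // A round takes longer than the punctuation interval, so by the
        // time one round ends the next is already due, and the thread
        // never reaches the commit in between.
        boolean nextRoundImmediatelyDue = roundMs > punctuationIntervalMs;

        // The transaction opened after the record is read stays open for
        // the whole second round before the commit is attempted.
        long transactionOpenMs = roundMs;

        System.out.println("round length ms: " + roundMs);
        System.out.println("next round already due: " + nextRoundImmediatelyDue);
        System.out.println("transaction exceeds timeout: "
                + (transactionOpenMs > transactionTimeoutMs));
    }
}
```

With these numbers the transaction is open for roughly 50 seconds against a 10-second timeout, which matches the ProducerFencedException in the report; with a single partition the round is only 5 seconds and the problem disappears.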
> Multiple punctuators that together exceed the transaction timeout cause
> ProducerFencedException
> -----------------------------------------------------------------------------------------------
>
> Key: KAFKA-17229
> URL: https://issues.apache.org/jira/browse/KAFKA-17229
> Project: Kafka
> Issue Type: Bug
> Components: streams
> Affects Versions: 3.8.0
> Reporter: Trevan Richins
> Priority: Major
> Attachments: always-forward-failure.log, topic-input-failure.log
>
>
> If a single StreamThread has multiple punctuator tasks and their total
> runtime exceeds the transaction timeout setting, ProducerFencedExceptions
> will occur.
> For example, in my test case, I have an input topic with 10 partitions, a
> processor with a punctuator that just sleeps for 5 seconds (the transaction
> timeout is 10s, so each punctuator individually finishes within the
> timeout), and an output topic. The punctuators run every 30 seconds
> (wall-clock time). Once the app is running and is inside one of the
> punctuators, I put one record into the input topic. The punctuators all
> finish and the record is seen and read, but it isn't committed because the
> punctuators run again (since it has been 30s since they last started).
> After the punctuators finish this second round, the thread tries to commit
> the transaction it started 50 seconds earlier, which triggers the
> ProducerFencedException.
> Another test case, with the same scenario, is having the punctuators forward
> something. This also causes a ProducerFencedException, because the first
> punctuator starts a transaction but the transaction isn't committed until
> all of the punctuators are done, which is long after the transaction
> timeout.
> The issue doesn't exist with only one partition, since the single
> punctuator finishes within the transaction timeout. It occurs only when
> there are multiple punctuators whose combined runtime exceeds the
> transaction timeout.
> It feels like what is needed is for Kafka to check after each punctuator
> whether there is data that needs to be committed and, if so, commit it then.
>
> I've attached a log of the first test case, called
> "topic-input-failure.log". It starts after the punctuators run the first
> time. It shows the record being received and the transaction starting. The
> punctuators then run again, each sleeping for 5 seconds. Once they are
> done, a ProducerFencedException is triggered.
> I've attached a log for the second test case, called
> "always-forward-failure.log". It starts when the punctuators run the first
> time. It shows the punctuators forwarding a record and sleeping for 5
> seconds each. In this case, only 5 punctuators run as a group. An
> InvalidProducerEpochException occurs after the 5th punctuator finishes.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)