Trevan Richins created KAFKA-17229:
--------------------------------------
Summary: Multiple punctuators that together exceed the transaction
timeout cause ProducerFencedException
Key: KAFKA-17229
URL: https://issues.apache.org/jira/browse/KAFKA-17229
Project: Kafka
Issue Type: Bug
Components: streams
Affects Versions: 3.8.0
Reporter: Trevan Richins
Attachments: always-forward-failure.log, topic-input-failure.log
If a single StreamThread has multiple punctuators tasks and the sum total of
them exceeds the transaction timeout setting, ProducerFencedExceptions will
occur.
For example, in my test case, I have a input topic with 10 partitions, a
processor with a punctuator that just sleeps for 5 seconds (the transaction
timeout is 10s so it finishes within the timeout), and an output topic. The
punctuators run every 30 seconds (wall clock). Once the app is running and is
inside one of the punctuators, I put one record in the input topic. The
punctuators will all finish and the record will be seen and read but it won't
commit because the punctuators run again (since it has been 30s since they last
started). After the punctuators finish this second time, it will try to commit
the transaction that it started 50 seconds ago and will trigger the
ProducerFencedException.
Another test case, with the same scenario, is having the punctuators forward
something. This also causes a ProducerFencedException because the first
punctuator starts a transaction but it doesn't commit the transaction till all
of the punctuators are done and that is long after the transaction timeout.
The issue doesn't exist if there is only one partition as the single punctuator
will finish within the transaction timeout. It is only whene there are
multiple punctuators that exceed the transaction timeout in total.
It feels like what is needed is for kafka to check after each punctuator if
there is data that needs to be committed. If there is, it commits then.
I've attached a log of the first test case. It is called
"topic-input-failure.log". It starts after the punctuators run the first time.
It shows the record being received and the transaction starting. Then it runs
the punctuators again and they each sleep for 5 seconds. Once they are done,
it triggers a ProducerFencedException.
I've attached a log for the second test case. It is called
"always-forward-failure.log". It starts when the punctuators run the first
time. It shows the punctuators forwarding a record and sleeping for 5 seconds.
In this case, only 5 punctuators run as a group. An
InvalidProducerEpochException occurs after the 5th punctuator finishes.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)