[ https://issues.apache.org/jira/browse/KAFKA-17380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rohit Bobade updated KAFKA-17380:
---------------------------------
Description:
Using Kafka Streams 2.6.2 and running stateful aggregations with exactly-once
semantics.
The processing logic is:
consume input records -> intermediate aggregate and buffer data in a state store
backed by a changelog topic -> punctuate every 15 seconds, flush the state store,
and send aggregated records downstream -> final aggregate operation and send to
the output topic
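
The buffer-then-flush step described above can be modelled in plain Java. This is only a sketch of the pattern, not the reporter's code and not the Kafka Streams API: the class name `BufferedAggregator`, the sum-per-key aggregation, and the in-memory map standing in for the changelog-backed state store are all assumptions made for illustration. Only the 15-second interval comes from the report.

```java
import java.util.HashMap;
import java.util.Map;

/** Plain-Java model of the buffer-then-flush pattern; it does not use Kafka Streams. */
final class BufferedAggregator {
    // Flush interval taken from the report: punctuate every 15 seconds.
    static final long FLUSH_INTERVAL_MS = 15_000L;

    // Stand-in for the changelog-backed state store (assumption: one running sum per key).
    private final Map<String, Long> buffer = new HashMap<>();
    private long lastFlushMs;

    BufferedAggregator(long startMs) {
        this.lastFlushMs = startMs;
    }

    /** Intermediate aggregation: fold one input record into the buffered value for its key. */
    void process(String key, long value) {
        buffer.merge(key, value, Long::sum);
    }

    /**
     * Punctuation: if the interval has elapsed, return a copy of all buffered
     * aggregates and clear the buffer; otherwise return an empty map.
     */
    Map<String, Long> punctuate(long nowMs) {
        if (nowMs - lastFlushMs < FLUSH_INTERVAL_MS) {
            return new HashMap<>();
        }
        Map<String, Long> flushed = new HashMap<>(buffer);
        buffer.clear();
        lastFlushMs = nowMs;
        return flushed;
    }

    public static void main(String[] args) {
        BufferedAggregator stage = new BufferedAggregator(0L);
        Map<String, Long> finalTotals = new HashMap<>();

        stage.process("a", 1L);
        stage.process("a", 2L);
        stage.process("b", 5L);

        // At t = 15 s the buffered sums are flushed and folded into the final aggregate.
        stage.punctuate(15_000L).forEach((k, v) -> finalTotals.merge(k, v, Long::sum));
        System.out.println(finalTotals);
    }
}
```

In the real application the buffer is a state store, so after a rebalance its contents have to be restored from the changelog topic before processing resumes, which is the phase in which the problem was observed.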
Since we use spot instances, one of the pods was restarted, a rebalance was
triggered, and state was being restored from the changelog topic.
We noticed ProducerFencedException errors:
{quote}org.apache.kafka.common.errors.ProducerFencedException: Producer
attempted an operation with an old epoch. Either there is a newer producer with
the same transactionalId, or the producer's transaction has been expired by the
broker.
{quote}
After this, a few partitions were stuck and no records were processed until we
restarted the application.
We had configured:
transaction.timeout.ms to 30 seconds
session.timeout.ms to 30 seconds
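
For reference, the two settings above might be expressed as Kafka Streams properties roughly as follows. This is a hedged sketch: the class name `EosConfigSketch` is hypothetical, and the use of the `producer.` and `consumer.` prefixes is an assumption about how the values were supplied (the report does not say). Required settings such as `application.id` and `bootstrap.servers` are deliberately left out.

```java
import java.util.Properties;

/** Sketch of the reported settings as a Properties object; not the reporter's actual config. */
final class EosConfigSketch {
    static Properties streamsProperties() {
        Properties props = new Properties();
        // Exactly-once processing, as described in the report.
        props.put("processing.guarantee", "exactly_once");
        // Producer-level setting: 30 seconds, expressed in milliseconds.
        // Assumption: passed to the embedded producer via the "producer." prefix.
        props.put("producer.transaction.timeout.ms", "30000");
        // Consumer-level setting: 30 seconds, expressed in milliseconds.
        // Assumption: passed to the embedded consumer via the "consumer." prefix.
        props.put("consumer.session.timeout.ms", "30000");
        return props;
    }

    public static void main(String[] args) {
        streamsProperties().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Per the error text quoted above, one of the two possible causes is that the producer's transaction was expired by the broker, so the transaction timeout is a relevant value to compare against how long restoration takes after a rebalance.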
Could you please advise if there is a known fix for this edge case?
was:
Using Kafka Streams 2.6.2 and running stateful aggregations with exactly-once
semantics.
The processing logic is:
consume input records -> intermediate aggregate and buffer data in a state store
backed by a changelog topic -> punctuate every 15 seconds, flush the state store,
and send aggregated records downstream -> final aggregate operation and send to
the output topic
Since we use spot instances, one of the pods was restarted and a rebalance was
triggered.
We noticed ProducerFencedException errors:
{quote}org.apache.kafka.common.errors.ProducerFencedException: Producer
attempted an operation with an old epoch. Either there is a newer producer with
the same transactionalId, or the producer's transaction has been expired by the
broker.
{quote}
After this, a few partitions were stuck and no records were processed until we
restarted the application.
We had configured:
transaction.timeout.ms to 30 seconds
session.timeout.ms to 30 seconds
Could you please advise if there is a known fix for this edge case?
> Kafka Streams few partition stuck in processing - fixed after restart
> ---------------------------------------------------------------------
>
> Key: KAFKA-17380
> URL: https://issues.apache.org/jira/browse/KAFKA-17380
> Project: Kafka
> Issue Type: Bug
> Components: streams
> Affects Versions: 2.6.2
> Reporter: Rohit Bobade
> Priority: Major