[https://issues.apache.org/jira/browse/KAFKA-17380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17877195#comment-17877195]

Matthias J. Sax commented on KAFKA-17380:
-----------------------------------------

Thanks for filing a ticket [~rohitbobade]. From your description, it's very
hard to tell whether you hit a bug or not. That said, 2.6 is quite an old
version, so I would recommend upgrading. A lot of improvements and bug fixes
(including for EOS) have gone into Kafka Streams in the meantime, and 2.6.2 is
no longer receiving any bug fixes.

> Kafka Streams few partitions stuck in processing - fixed after restart
> ----------------------------------------------------------------------
>
>                 Key: KAFKA-17380
>                 URL: https://issues.apache.org/jira/browse/KAFKA-17380
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 2.6.2
>            Reporter: Rohit Bobade
>            Priority: Major
>
> We are using Kafka Streams 2.6.2 and running stateful aggregations with
> exactly-once semantics (EOS).
> The processing logic is:
> consume input records -> intermediate aggregation, buffering data in a state
> store backed by a changelog topic -> punctuate every 15 seconds: flush the
> state store and send the aggregated records downstream -> final aggregation
> -> send to the output topic
> Since we use spot instances, one of the pods was restarted, which triggered a
> rebalance, and state was restored from the changelog topic.
> We then noticed ProducerFencedException errors:
> {quote}org.apache.kafka.common.errors.ProducerFencedException: Producer
> attempted an operation with an old epoch. Either there is a newer producer
> with the same transactionalId, or the producer's transaction has been expired
> by the broker.
> {quote}
> After this, a few partitions were stuck and no records were processed until
> we restarted the application.
> We had configured:
> transaction.timeout.ms to 30 seconds
> session.timeout.ms to 30 seconds
>
> Could you please advise whether there is any known fix for this edge case?
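
For readers trying to reproduce the setup, the settings described in the report are normally supplied to Kafka Streams as Properties. The sketch below is a minimal illustration, assuming the two timeouts were scoped to the internal clients with the `producer.` and `consumer.` prefixes. The class name `EosConfigSketch`, the `application.id` value and the `bootstrap.servers` value are hypothetical placeholders, not taken from the report.

```java
import java.util.Properties;

public class EosConfigSketch {

    // Hypothetical helper: builds the Streams settings described in the report.
    public static Properties build() {
        Properties props = new Properties();
        // Placeholders only; the report gives neither the application id nor the brokers.
        props.put("application.id", "aggregation-app");
        props.put("bootstrap.servers", "localhost:9092");
        // Exactly-once processing; "exactly_once" is an accepted value in 2.6.
        props.put("processing.guarantee", "exactly_once");
        // The "producer." prefix scopes this setting to the Streams-internal producers.
        props.put("producer.transaction.timeout.ms", "30000");
        // The "consumer." prefix scopes this setting to the Streams-internal consumers.
        props.put("consumer.session.timeout.ms", "30000");
        return props;
    }

    public static void main(String[] args) {
        // Print every key/value pair (iteration order is not guaranteed).
        build().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Unprefixed keys are also forwarded to the matching internal clients; the prefixes only make the target client explicit.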



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
