[ https://issues.apache.org/jira/browse/KAFKA-9600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jason Gustafson updated KAFKA-9600:
-----------------------------------
Description:
The EndTxn path in TransactionCoordinator is shared between direct calls to
EndTxn from the client and internal transaction abort logic. To support the
latter, the code is written to allow an epoch bump. However, if the client
bumps the epoch unexpectedly (e.g. due to a buggy implementation), then the
internal invariants are violated, which results in a hanging transaction.
Specifically, the transaction is left in a pending state because the epoch
following the append to the log does not match what we expect.
To fix this, we should ensure that an EndTxn from the client checks for strict
epoch equality.
was: The EndTxn path in TransactionCoordinator is shared between direct calls
to EndTxn from the client and internal transaction abort logic. To support the
latter, the code is written to allow an epoch bump. However, if the client
bumps the epoch unexpectedly (e.g. due to a buggy implementation), then we can
be left with a hanging transaction. To fix this, we should ensure that an
EndTxn from the client checks for strict epoch equality.
> EndTxn handler should check strict epoch equality
> -------------------------------------------------
>
> Key: KAFKA-9600
> URL: https://issues.apache.org/jira/browse/KAFKA-9600
> Project: Kafka
> Issue Type: Bug
> Reporter: Jason Gustafson
> Priority: Major
>
> The EndTxn path in TransactionCoordinator is shared between direct calls to
> EndTxn from the client and internal transaction abort logic. To support the
> latter, the code is written to allow an epoch bump. However, if the client
> bumps the epoch unexpectedly (e.g. due to a buggy implementation), then the
> internal invariants are violated, which results in a hanging transaction.
> Specifically, the transaction is left in a pending state because the epoch
> following the append to the log does not match what we expect.
> To fix this, we should ensure that an EndTxn from the client checks for
> strict epoch equality.
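
The proposed check can be illustrated with a minimal sketch. This is not the
actual TransactionCoordinator code: the class EndTxnEpochCheck, the Result
enum, the validate method, and the isFromClient flag are hypothetical names
chosen for illustration, and the rule used for the internal abort path (accept
any epoch at or above the current one) is an assumption about what "allow an
epoch bump" means.

```java
// Hypothetical sketch; names and the internal-path rule are assumptions,
// not Kafka's actual implementation.
final class EndTxnEpochCheck {

    enum Result { OK, EPOCH_MISMATCH }

    /**
     * Validates the producer epoch carried by an EndTxn request.
     *
     * @param requestEpoch epoch sent with the EndTxn request
     * @param currentEpoch epoch currently recorded for the transaction
     * @param isFromClient true for a direct client call, false for the
     *                     coordinator's internal abort logic
     */
    static Result validate(short requestEpoch, short currentEpoch, boolean isFromClient) {
        if (isFromClient) {
            // Client path: strict equality, so an unexpected bump is rejected
            // before anything is appended to the log.
            return requestEpoch == currentEpoch ? Result.OK : Result.EPOCH_MISMATCH;
        }
        // Internal abort path: a bumped epoch is tolerated (assumed rule).
        return requestEpoch >= currentEpoch ? Result.OK : Result.EPOCH_MISMATCH;
    }

    public static void main(String[] args) {
        // A client that bumps the epoch on its own is rejected.
        System.out.println(validate((short) 6, (short) 5, true));
        // The same bump coming from the internal abort logic is accepted.
        System.out.println(validate((short) 6, (short) 5, false));
    }
}
```

Separating the two callers with an explicit flag keeps the shared EndTxn path
intact while preventing a client request from taking the branch intended only
for internal aborts.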