[
https://issues.apache.org/jira/browse/KAFKA-19629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matthias J. Sax updated KAFKA-19629:
------------------------------------
Affects Version/s: 3.8.0
(was: 3.8.1)
(was: 3.9.1)
> Deadlock in Kafka Streams when processing Interactive Queries and state store
> updates concurrently
> --------------------------------------------------------------------------------------------------
>
> Key: KAFKA-19629
> URL: https://issues.apache.org/jira/browse/KAFKA-19629
> Project: Kafka
> Issue Type: Bug
> Components: streams
> Affects Versions: 3.8.0
> Environment: Kafka Streams, kotlin, linux, docker. JDK 21
> Reporter: Evgheni Popusoi
> Priority: Major
> Attachments: thread-dump-1.txt, thread-dump-2.txt
>
>
> We are using a Kafka Streams topology that continuously writes large volumes
> of data into a RocksDB state store with stable throughput. In parallel,
> another thread executes Interactive Query (IQ) requests against the same
> local state store.
> When the number of IQ requests in the queue grows (≈50+), the application
> enters a *deadlock state*.
> *Investigation:*
> Using a thread dump, we discovered a lock inversion between RocksDB
> operations:
> * {{RocksDBStore.put}}
> ** blocked on {{org.apache.kafka.streams.query.Position@4ba00b6c}}
> ** holding {{org.apache.kafka.streams.state.internals.RocksDBStore@414cff0e}}
> * {{RocksDBStore.range}}
> ** blocked on
> {{org.apache.kafka.streams.state.internals.RocksDBStore@414cff0e}}
> ** holding {{org.apache.kafka.streams.query.Position@4ba00b6c}}
> This indicates that *{{put}} and {{range}} acquire the same two locks but in
> opposite order*, which leads to deadlock under concurrent load.
> *Expected Behavior:*
> The Kafka Streams API should guarantee deadlock-free operation. Store writes
> ({{put}}) and IQ reads ({{range}}) should not block each other in a
> way that leads to lock inversion.
> *Steps to Reproduce:*
> # Create a Kafka Streams topology with a RocksDB state store receiving
> continuous writes.
> # In a parallel thread, issue a high number of Interactive Query {{range}}
> requests (≈50+ queued).
> # Observe that the system eventually enters deadlock.
> *Impact:*
> * Application stops processing data.
> * Interactive Queries fail indefinitely.
> * Requires manual restart to recover.
> *Notes:*
> * Appears to be a lock ordering bug in {{RocksDBStore}}.
> * We expected the Streams API to coordinate thread safety and prevent such
> deadlocks.
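
The lock inversion described under Investigation can be illustrated with a minimal Java sketch. It does not use Kafka Streams at all: `storeLock` and `positionLock` are hypothetical stand-ins for the `RocksDBStore` and `Position` monitors seen in the thread dumps, and timed `tryLock` calls are used instead of `synchronized` (an assumption made only so the demo terminates rather than hanging). A latch forces both threads to hold their first lock before either asks for the second, which is the interleaving the report describes.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

class LockOrderSketch {
    // Hypothetical stand-ins for the two monitors named in the thread dumps.
    private final ReentrantLock storeLock = new ReentrantLock();
    private final ReentrantLock positionLock = new ReentrantLock();

    // Acquire `first`, then try `second` for up to 200 ms.
    // Returns false if the second lock could not be obtained in time.
    private static boolean lockBoth(ReentrantLock first, ReentrantLock second,
                                    CountDownLatch barrier) {
        first.lock();
        try {
            if (barrier != null) {
                // Wait until the other thread also holds its first lock.
                barrier.countDown();
                barrier.await();
            }
            if (!second.tryLock(200, TimeUnit.MILLISECONDS)) {
                return false; // with plain monitors this thread would block forever
            }
            second.unlock();
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            first.unlock();
        }
    }

    // Runs one "writer" and one "reader"; returns how many finished their operation.
    int completedOperations(boolean inverted) {
        CountDownLatch barrier = inverted ? new CountDownLatch(2) : null;
        AtomicInteger completed = new AtomicInteger();
        // Writer: store lock first, then position lock (like put in the report).
        Thread writer = new Thread(() -> {
            if (lockBoth(storeLock, positionLock, barrier)) completed.incrementAndGet();
        });
        // Reader: opposite order when inverted (like range in the report).
        Thread reader = new Thread(() -> {
            boolean ok = inverted
                    ? lockBoth(positionLock, storeLock, barrier)
                    : lockBoth(storeLock, positionLock, barrier);
            if (ok) completed.incrementAndGet();
        });
        writer.start();
        reader.start();
        try {
            writer.join();
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return completed.get();
    }

    public static void main(String[] args) {
        LockOrderSketch sketch = new LockOrderSketch();
        System.out.println("consistent order, completed: " + sketch.completedOperations(false));
        System.out.println("inverted order, completed:   " + sketch.completedOperations(true));
    }
}
```

With a consistent acquisition order both operations complete. With the inverted order at least one of them times out, which is where untimed monitors would deadlock. This only demonstrates the general pattern and is not a statement about how the issue should be fixed in Kafka Streams.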
--
This message was sent by Atlassian Jira
(v8.20.10#820010)