[
https://issues.apache.org/jira/browse/KAFKA-19529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lucas Brutschy updated KAFKA-19529:
-----------------------------------
Fix Version/s: 3.9.2
4.0.1
4.1.0
> State updater does not always seem to remove all metrics
> --------------------------------------------------------
>
> Key: KAFKA-19529
> URL: https://issues.apache.org/jira/browse/KAFKA-19529
> Project: Kafka
> Issue Type: Bug
> Components: streams
> Affects Versions: 3.9.0
> Reporter: Lucas Brutschy
> Assignee: Lucas Brutschy
> Priority: Critical
> Fix For: 3.9.2, 4.0.1, 4.1.0
>
>
> In long-running tests, streams sometimes fails with this exception:
> {code:java}
> /mnt/run/streams/logs/streams.log.2025-06-27-02:59.gz:[2025-06-27
> 02:06:22,384] ERROR [i-0285945bbb1aa2330-StreamThread-3] stream-client
> [i-0285945bbb1aa2330] Encountered the following exception during processing
> and sent shutdown request for the entire application.
> (org.apache.kafka.streams.KafkaStreams)
> /mnt/run/streams/logs/streams.log.2025-06-27-02:59.gz-java.lang.IllegalArgumentException:
> A metric named 'MetricName [name=restore-records-rate,
> group=stream-state-updater-metrics, description=The average per-second number
> of records restored, tags={thread-id=i-0285945bbb1aa2330-StateUpdater-3}]'
> already exists, can't register another one.
> /mnt/run/streams/logs/streams.log.2025-06-27-02:59.gz- at
> org.apache.kafka.common.metrics.Sensor.add(Sensor.java:344)
> /mnt/run/streams/logs/streams.log.2025-06-27-02:59.gz- at
> org.apache.kafka.common.metrics.Sensor.add(Sensor.java:317)
> /mnt/run/streams/logs/streams.log.2025-06-27-02:59.gz- at
> org.apache.kafka.streams.processor.internals.DefaultStateUpdater$StateUpdaterMetrics.<init>(DefaultStateUpdater.java:1121)
> /mnt/run/streams/logs/streams.log.2025-06-27-02:59.gz- at
> org.apache.kafka.streams.processor.internals.DefaultStateUpdater$StateUpdaterThread.<init>(DefaultStateUpdater.java:96)
> /mnt/run/streams/logs/streams.log.2025-06-27-02:59.gz- at
> org.apache.kafka.streams.processor.internals.DefaultStateUpdater.start(DefaultStateUpdater.java:794)
> /mnt/run/streams/logs/streams.log.2025-06-27-02:59.gz- at
> org.apache.kafka.streams.processor.internals.TaskManager.init(TaskManager.java:154)
> /mnt/run/streams/logs/streams.log.2025-06-27-02:59.gz- at
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:885)
> {code}
>
> * This seems to happen, when we restart a stream thread, e.g. the stream
> thread is removed at some point and then restarted at a later point.
> * We see log lines before / after the code that removes metrics from a state
> updater thread
> * Every time, it is a different state updater metric that seems to crash, so
> it seems to be some kind of race condition
> * I found a recent change in state updater initialization:
> [https://github.com/apache/kafka/pull/19889] . I have digged into it, and
> also asked Matthias/Bruno to dig into it, but we couldn’t find a reason why
> this could cause any problem.
> * What Matthias found out is that the problem seems to be on the
> removal-side. I.e., the metric indeed doesn't seem to be removed correctly
> when stopping the state updater.
> We have reverted PR #19889 in 4.1 and this seems to have resolved the problem
> there. The problem still persists on trunk. We should either revert #19889 on
> trunk as well, or fix the PR.
>
> cc [~cadonna] [~bbejeck]
--
This message was sent by Atlassian Jira
(v8.20.10#820010)