[
https://issues.apache.org/jira/browse/KAFKA-19341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17954530#comment-17954530
]
Patrik Kleindl commented on KAFKA-19341:
----------------------------------------
Update: This issue was fixed by a minor PR in trunk
(https://github.com/apache/kafka/pull/18674) and the fix is part of 4.0.0.
I added a unit test locally to verify this and noticed that the current tests
don't cover the values used in the code and would fail with them:
{code:java}
@Test
public void testRecordLimitWithLongerHighestTrackableValue() {
    long highestTrackableValue = Duration.ofMinutes(1).toMillis();
    HdrHistogram hdrHistogram = new HdrHistogram(10L, highestTrackableValue, 3);
    hdrHistogram.record(highestTrackableValue + 1);
    assertEquals(highestTrackableValue,
        hdrHistogram.max(System.currentTimeMillis()));
}{code}
This fails until numberOfSignificantValueDigits (last parameter) is increased
from 3 to 5.
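The jump from 3 to 5 digits is consistent with how org.HdrHistogram groups
values into equivalent ranges. The sketch below is plain Java and does not call
the library. It assumes a lowest discernible value of 1 and that max() reports
the upper end of the range containing the largest recorded value; whether the
Kafka wrapper really does this is an assumption. The class and method names are
made up for illustration.
{code:java}
public class EquivalentRangeSketch {

    // Sub-bucket count: the smallest power of two >= 2 * 10^digits.
    static long subBucketCount(int significantDigits) {
        long needed = 2;
        for (int i = 0; i < significantDigits; i++) {
            needed *= 10;
        }
        long count = 1;
        while (count < needed) {
            count <<= 1;
        }
        return count;
    }

    // Width of the range of values that cannot be told apart from `value`.
    static long rangeWidth(long value, int significantDigits) {
        long count = subBucketCount(significantDigits);
        long width = 1;
        while (value >= count * width) {
            width <<= 1;
        }
        return width;
    }

    // Largest value that falls into the same range as `value`.
    static long highestEquivalentValue(long value, int significantDigits) {
        long width = rangeWidth(value, significantDigits);
        return (value / width) * width + width - 1;
    }

    public static void main(String[] args) {
        long oneMinuteMs = 60_000L; // same value as Duration.ofMinutes(1).toMillis()
        for (int digits = 3; digits <= 5; digits++) {
            System.out.println(digits + " digits: width="
                + rangeWidth(oneMinuteMs, digits)
                + ", highest equivalent="
                + highestEquivalentValue(oneMinuteMs, digits));
        }
    }
}
{code}
Under these assumptions 60000 is only reported exactly with 5 digits. With 3
digits it comes back as 60031 and with 4 digits as 60001, which would explain
why the assertEquals above fails for anything below 5.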
The related code where this is set up is:
{code:java}
public static KafkaMetricHistogram newLatencyHistogram(
    Function<String, MetricName> metricNameFactory
) {
    return new KafkaMetricHistogram(
        metricNameFactory,
        MAX_LATENCY_MS,
        NUM_SIG_FIGS);
}{code}
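The values of MAX_LATENCY_MS and NUM_SIG_FIGS are not shown here, but the
numbers in the reported exception (index 16734, array length 7168) can be
reproduced with plain arithmetic if one assumes 3 significant digits, a lowest
discernible value of 1 and a highest trackable value of 60000 ms. The sketch
below follows the bucket layout used by org.HdrHistogram as I understand it and
does not call the library; all names in it are hypothetical.
{code:java}
public class CountsIndexSketch {

    // Assumed: 3 significant digits -> smallest power of two >= 2 * 10^3.
    static final int SUB_BUCKET_COUNT = 2048;
    static final int SUB_BUCKET_HALF_COUNT = SUB_BUCKET_COUNT / 2;

    // Number of buckets needed to cover highestTrackableValue.
    static int bucketCount(long highestTrackableValue) {
        long smallestUntrackable = SUB_BUCKET_COUNT;
        int buckets = 1;
        while (smallestUntrackable <= highestTrackableValue) {
            smallestUntrackable <<= 1;
            buckets++;
        }
        return buckets;
    }

    // Length of the counts array for that many buckets.
    static int countsArrayLength(long highestTrackableValue) {
        return (bucketCount(highestTrackableValue) + 1) * SUB_BUCKET_HALF_COUNT;
    }

    // Slot in the counts array that a value maps to (no bounds check).
    static int countsIndex(long value) {
        int bucketIndex = 0;
        long subBucketIndex = value;
        while (subBucketIndex >= SUB_BUCKET_COUNT) {
            subBucketIndex >>= 1;
            bucketIndex++;
        }
        return (bucketIndex + 1) * SUB_BUCKET_HALF_COUNT
            + (int) (subBucketIndex - SUB_BUCKET_HALF_COUNT);
    }

    public static void main(String[] args) {
        long assumedMaxLatencyMs = 60_000L; // assumption, not taken from the Kafka source
        long observed = 45_050_145L;        // value from the reported exception

        System.out.println("length = " + countsArrayLength(assumedMaxLatencyMs));
        System.out.println("index  = " + countsIndex(observed));
        // One possible guard: clamp to the highest trackable value before recording.
        System.out.println("clamped index = "
            + countsIndex(Math.min(observed, assumedMaxLatencyMs)));
    }
}
{code}
Under these assumptions the recorded value of 45050145 ms (roughly 12.5 hours)
maps to slot 16734 of a 7168-slot array, matching the stack trace, while a
value clamped to 60000 maps to slot 6995 and stays in range. Clamping is shown
only as one possible guard and is not taken from the linked PR.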
[~jeffkbkim] Linking you here as you did the implementation and the fix for the
exception.
> Execution of HighWatermarkUpdate failed
> ---------------------------------------
>
> Key: KAFKA-19341
> URL: https://issues.apache.org/jira/browse/KAFKA-19341
> Project: Kafka
> Issue Type: Bug
> Components: group-coordinator
> Affects Versions: 4.0.0
> Reporter: Patrik Kleindl
> Priority: Major
>
> We got the following Exception multiple times in our logs when a client
> showed problems with the group coordinator:
> {code:java}
> [ERROR] 2025-05-27 02:18:51,623 [group-coordinator-event-processor-0] org.apache.kafka.coordinator.group.runtime.CoordinatorRuntime complete - [GroupCoordinator id=2] Execution of HighWatermarkUpdate failed due to value 45050145 outside of histogram covered range. Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 16734 out of bounds for length 7168.
> java.lang.ArrayIndexOutOfBoundsException: value 45050145 outside of histogram covered range. Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 16734 out of bounds for length 7168
>     at org.HdrHistogram.AbstractHistogram.handleRecordException(AbstractHistogram.java:571)
>     at org.HdrHistogram.AbstractHistogram.recordSingleValue(AbstractHistogram.java:563)
>     at org.HdrHistogram.AbstractHistogram.recordValue(AbstractHistogram.java:467)
>     at org.HdrHistogram.Recorder.recordValue(Recorder.java:136)
>     at org.apache.kafka.coordinator.group.metrics.HdrHistogram.record(HdrHistogram.java:98)
>     at org.apache.kafka.coordinator.group.metrics.KafkaMetricHistogram.record(KafkaMetricHistogram.java:128)
>     at org.apache.kafka.common.metrics.Sensor.recordInternal(Sensor.java:237)
>     at org.apache.kafka.common.metrics.Sensor.record(Sensor.java:198)
>     at org.apache.kafka.coordinator.group.metrics.GroupCoordinatorRuntimeMetrics.recordEventPurgatoryTime(GroupCoordinatorRuntimeMetrics.java:301)
>     at org.apache.kafka.coordinator.group.runtime.CoordinatorRuntime$CoordinatorWriteEvent.complete(CoordinatorRuntime.java:1362)
>     at org.apache.kafka.deferred.DeferredEventQueue.completeUpTo(DeferredEventQueue.java:63)
>     at org.apache.kafka.coordinator.group.runtime.CoordinatorRuntime$HighWatermarkListener.lambda$onHighWatermarkUpdated$0(CoordinatorRuntime.java:1802)
>     at org.apache.kafka.coordinator.group.runtime.CoordinatorRuntime$CoordinatorInternalEvent.run(CoordinatorRuntime.java:1723)
>     at org.apache.kafka.coordinator.group.runtime.MultiThreadedEventProcessor$EventProcessorThread.handleEvents(MultiThreadedEventProcessor.java:148)
>     at org.apache.kafka.coordinator.group.runtime.MultiThreadedEventProcessor$EventProcessorThread.run(MultiThreadedEventProcessor.java:180){code}
> We are running Confluent Platform 7.9, which should be based on Apache Kafka
> 3.9, but this exception should only be present in Kafka 4.0, from
> https://issues.apache.org/jira/browse/KAFKA-16379
> I will create a ticket with Confluent, but as this code is part of Apache
> Kafka itself, it could affect others too.
> If I understand the exception correctly, the HighWatermarkUpdate operation
> itself was successful and the problem is caused by writing the metrics.
> After a restart of the cluster and the client the problem was resolved, but
> it had not shown up immediately after the last update or other changes.