Francois Visconte created KAFKA-17678:
-----------------------------------------
Summary: Problematic new HWM increment behaviour introduced by
KIP-207 and KIP-966
Key: KAFKA-17678
URL: https://issues.apache.org/jira/browse/KAFKA-17678
Project: Kafka
Issue Type: Bug
Components: replication
Reporter: Francois Visconte
We identified a bug/new behaviour that can leave consumers lagging for a
long time and ListOffsets requests failing during that time frame.
While the ListOffsets request failures are expected behaviour introduced by
KIP-207, the real problem is the inability to increment the
high watermark and the resulting consumer lag.
Here is the situation:
* We have a topic with min.insync.replicas=2 (a topic-creation sketch follows this list)
* We have a partition on brokers 16, 17 and 18
* The leader for this partition is broker 17
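For context, here is a minimal sketch of creating a topic with this configuration via the Java Admin client; the topic name, partition count and bootstrap address are placeholders, not the real production values:
{code:java}
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

public class CreateTopicSketch {
    public static void main(String[] args) throws Exception {
        try (Admin admin = Admin.create(Map.of(
                AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"))) {
            // Replication factor 3 (e.g. brokers 16, 17, 18) with min.insync.replicas=2,
            // so acks=all writes need at least 2 in-sync replicas to succeed.
            NewTopic topic = new NewTopic("topic", Optional.of(100), Optional.of((short) 3))
                    .configs(Map.of(TopicConfig.MIN_IN_SYNC_REPLICAS_CONFIG, "2"));
            admin.createTopics(Set.of(topic)).all().get();
        }
    }
}
{code}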
# Broker 18 failed. The partition has 2 ISRs
# Broker 16 failed. The partition has 1 ISR (17)
# Broker 17 has a LEO higher than the HWM:
{noformat}
[Broker id=17] Leader topic-86 with topic id Some(yFhPOnPsRDiYHgfF2bR2aQ) starts at leader epoch 7 from offset 3067193660 with partition epoch 11, high watermark 3067191497, ISR [10017], adding replicas [] and removing replicas [] (under-min-isr). Previous leader Some(10017) and previous leader epoch was 6.
{noformat}
At this point producers cannot produce to the topic-86 partition because there
is only one ISR, which is expected behaviour.
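What producers see in this state, as a hedged sketch (placeholder topic, partition and bootstrap address; acks=all is the setting that makes min.insync.replicas apply):
{code:java}
import java.util.Map;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ProduceUnderMinIsr {
    public static void main(String[] args) throws InterruptedException {
        Map<String, Object> props = Map.of(
                ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092",
                ProducerConfig.ACKS_CONFIG, "all",
                ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class,
                ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("topic", 86, null, "value")).get();
        } catch (ExecutionException e) {
            // NOT_ENOUGH_REPLICAS is retriable: depending on retries/delivery.timeout.ms it
            // surfaces as NotEnoughReplicasException or, after retries, as a TimeoutException.
            System.err.println("Send failed: " + e.getCause());
        }
    }
}
{code}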
But it seems that KIP-207 prevents answering ListOffsets requests here:
{code:scala}
// Only consider throwing an error if we get a client request (isolationLevel is defined)
// and the high watermark is lagging behind the start offset
val maybeOffsetsError: Option[ApiException] = leaderEpochStartOffsetOpt
  .filter(epochStart => isolationLevel.isDefined && epochStart > localLog.highWatermark)
  .map(epochStart => Errors.OFFSET_NOT_AVAILABLE.exception(s"Failed to fetch offsets for " +
    s"partition $topicPartition with leader $epochLogString as this partition's " +
    s"high watermark (${localLog.highWatermark}) is lagging behind the " +
    s"start offset from the beginning of this epoch ($epochStart)."))
{code}
It seems that the code path that leaves the HWM stuck for so long was
introduced in preparation for KIP-966, see this ticket and PR.
As a result:
* The stuck HWM in the above scenario also means that a small range of
messages (here, the ~2,163 offsets between the high watermark 3067191497 and
the epoch start offset 3067193660) is not readable by consumers even though it
was readable in the past (see the consumer-side sketch after this list).
* In case of truncation, the HWM might still go backwards. This is still
possible even with min.insync.replicas, although it should be rare.
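On the first point, a hedged consumer-side sketch (same placeholder names): fetches are capped at the high watermark, so records between the stuck HWM and the LEO stay invisible, and the offset lookups that lag monitoring relies on hit the same ListOffsets error:
{code:java}
import java.time.Duration;
import java.util.List;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ConsumerView {
    public static void main(String[] args) {
        TopicPartition tp = new TopicPartition("topic", 86);
        Map<String, Object> props = Map.of(
                ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092",
                ConsumerConfig.GROUP_ID_CONFIG, "hwm-probe",
                ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class,
                ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.assign(List.of(tp));
            // Fetch responses only contain records below the high watermark, so anything
            // between the stuck HWM and the LEO never shows up here.
            consumer.poll(Duration.ofSeconds(1));
            // endOffsets issues a ListOffsets request; while the HWM lags the epoch start
            // offset, this is expected to fail or time out because of the KIP-207 check.
            Long end = consumer.endOffsets(List.of(tp), Duration.ofSeconds(10)).get(tp);
            System.out.println("End offset (capped at the HWM): " + end);
        }
    }
}
{code}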