Guillaume Mallet created KAFKA-17212:
----------------------------------------
Summary: Segments containing a single message can be incorrectly marked as local only
Key: KAFKA-17212
URL: https://issues.apache.org/jira/browse/KAFKA-17212
Project: Kafka
Issue Type: Bug
Components: Tiered-Storage
Affects Versions: 3.7.1, 3.8.0, 3.9.0
Reporter: Guillaume Mallet
There is an edge case where a segment containing a single message is
considered local only, which skews the deletion process towards deleting more
data than intended.
*This is very unlikely to happen in a real scenario, but it can happen in tests
when segments are rolled manually.*
*It could possibly happen when segments are rolled based on time, but even then
the skew would be minimal.*
h2. What happens
In order to delete the right amount of data under the byte retention policy,
the
[buildRetentionSizeData|https://github.com/apache/kafka/blob/09be14bb09dc336f941a7859232094bfb3cb3b96/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L1335]
function first counts all the bytes that breach {{retention.bytes}}. To do
this, the size of each remote segment is added to the size of the segments
present only on local disk, {{onlyLocalLogSegmentsSize}}.
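A minimal sketch of this accounting, assuming both sizes are already known. {{RetentionSizeSketch}} and its method are hypothetical names that only model the arithmetic; they are not the actual {{RemoteLogManager}} code, and returning 0 when nothing is breached is a simplification.

```java
/**
 * Hypothetical model of the byte-retention accounting described above.
 * Not Kafka code: it only mirrors the arithmetic.
 */
public final class RetentionSizeSketch {

    private RetentionSizeSketch() {}

    /**
     * @param remoteSegmentsSize       total size of the segments already copied to remote storage
     * @param onlyLocalLogSegmentsSize total size of the segments considered present only on local disk
     * @param retentionBytes           the configured retention.bytes limit
     * @return bytes exceeding retention.bytes, or 0 if the log is within the limit (simplification)
     */
    public static long remainingBreachedSize(long remoteSegmentsSize,
                                             long onlyLocalLogSegmentsSize,
                                             long retentionBytes) {
        long totalSize = remoteSegmentsSize + onlyLocalLogSegmentsSize;
        return Math.max(0L, totalSize - retentionBytes);
    }

    public static void main(String[] args) {
        // 300 bytes in remote storage + 150 bytes only on disk, retention.bytes = 400
        System.out.println(remainingBreachedSize(300, 150, 400)); // prints 50
    }
}
```

Any bytes wrongly attributed to {{onlyLocalLogSegmentsSize}} flow straight into the breached total, which is why the classification of local-only segments matters.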
The size of the segments present only on disk is computed by the function
[onlyLocalLogSegmentSize|https://github.com/apache/kafka/blob/a0f6e6f816c6ac3fbbc4e0dc503dc43bfacfe6c7/core/src/main/scala/kafka/log/UnifiedLog.scala#L1618-L1619],
which adds up the size of every segment whose _baseOffset_ is greater than or
equal to {{highestOffsetInRemoteStorage}}.
{{highestOffsetInRemoteStorage}} is the highest offset that has been
successfully sent to the remote store.
The _baseOffset_ of a segment is “[a lower bound ({*}inclusive{*}) of the
offset in the segment|https://github.com/apache/kafka/blob/a0f6e6f816c6ac3fbbc4e0dc503dc43bfacfe6c7/storage/src/main/java/org/apache/kafka/storage/internals/log/LogSegment.java#L115]”.
In the case of a segment with a single message, the _baseOffset_ can be equal
to {{highestOffsetInRemoteStorage}}, which means that even though that offset
has already been offloaded to remote storage, the segment is counted as local
only.
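A minimal sketch of the {{>=}} filter described above, using a hypothetical {{Segment}} record (Java 16+) rather than Kafka's {{LogSegment}}. With offsets 0 to 100 already copied to remote storage, the single-message segment at offset 100 satisfies the comparison, so its bytes are included in the local-only total:

```java
import java.util.List;

public final class LocalOnlyFilterSketch {

    private LocalOnlyFilterSketch() {}

    /** Hypothetical segment model: first offset, last offset and size in bytes. */
    public record Segment(long baseOffset, long lastOffset, long sizeInBytes) {}

    /** Sums the size of every segment whose baseOffset is >= highestOffsetInRemoteStorage. */
    public static long onlyLocalLogSegmentsSize(List<Segment> segments, long highestOffsetInRemoteStorage) {
        return segments.stream()
                .filter(s -> s.baseOffset() >= highestOffsetInRemoteStorage)
                .mapToLong(Segment::sizeInBytes)
                .sum();
    }

    public static void main(String[] args) {
        // Offsets 0..100 have already been copied to remote storage.
        long highestOffsetInRemoteStorage = 100;
        List<Segment> segments = List.of(
                new Segment(0, 99, 1_000),   // copied: baseOffset < 100
                new Segment(100, 100, 10),   // single message, copied: baseOffset == 100
                new Segment(101, 150, 500)); // not copied yet: baseOffset > 100

        System.out.println(onlyLocalLogSegmentsSize(segments, highestOffsetInRemoteStorage)); // 510
    }
}
```

The result is 510 bytes, although only the 500-byte segment starting at offset 101 is actually absent from remote storage.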
This has consequences when counting the bytes to delete, because the size of
this segment is counted twice in
[buildRetentionSizeData|https://github.com/apache/kafka/blob/09be14bb09dc336f941a7859232094bfb3cb3b96/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L1155]:
once as a segment offloaded to remote storage, and once as a local segment
when
[onlyLocalSegmentSize|https://github.com/apache/kafka/blob/a0f6e6f816c6ac3fbbc4e0dc503dc43bfacfe6c7/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L1361-L1363]
is added.
As a result, {{remainingBreachedSize}} is higher than expected, which can lead
to more bytes being deleted than expected, up to the size of the
double-counted segment.
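Written out, with R the bytes already in remote storage (including the single-message segment s), L the bytes genuinely present only on disk, and B the value of {{retention.bytes}}, the double counting inflates the breached size by exactly the size of s:

```latex
\begin{aligned}
\text{expected:}\quad & \mathit{remainingBreachedSize} = R + L - B \\
\text{observed:}\quad & \mathit{remainingBreachedSize}' = R + (L + |s|) - B = \mathit{remainingBreachedSize} + |s|
\end{aligned}
```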
The issue comes from using a greater-than-or-equal comparison rather than a
strictly-greater one: a segment present only locally will have a
{{baseOffset}} strictly greater than {{highestOffsetInRemoteStorage}}.
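A minimal sketch contrasting the current comparison with a strict one, using a hypothetical {{Segment}} record (Java 16+); the method names are illustrative and not part of Kafka. Only the single-message segment that is already in remote storage changes classification:

```java
import java.util.List;

public final class StrictComparisonSketch {

    private StrictComparisonSketch() {}

    /** Hypothetical segment model: first offset, last offset and size in bytes. */
    public record Segment(long baseOffset, long lastOffset, long sizeInBytes) {}

    /** Current comparison: a segment starting at the highest remote offset is treated as local only. */
    public static boolean isLocalOnlyCurrent(Segment s, long highestOffsetInRemoteStorage) {
        return s.baseOffset() >= highestOffsetInRemoteStorage;
    }

    /** Strict comparison: only segments starting after the highest remote offset are local only. */
    public static boolean isLocalOnlyStrict(Segment s, long highestOffsetInRemoteStorage) {
        return s.baseOffset() > highestOffsetInRemoteStorage;
    }

    public static void main(String[] args) {
        long highestOffsetInRemoteStorage = 100;
        List<Segment> segments = List.of(
                new Segment(0, 99, 1_000),   // multi-message segment, already copied
                new Segment(100, 100, 10),   // single-message segment, already copied
                new Segment(101, 150, 500)); // not copied yet

        for (Segment s : segments) {
            // Prints whether each comparison classifies the segment as local only.
            System.out.printf("baseOffset=%d current=%b strict=%b%n",
                    s.baseOffset(),
                    isLocalOnlyCurrent(s, highestOffsetInRemoteStorage),
                    isLocalOnlyStrict(s, highestOffsetInRemoteStorage));
        }
    }
}
```

Multi-message segments are unaffected: once one is copied, its last offset (not its base offset) equals {{highestOffsetInRemoteStorage}}, so both comparisons exclude it.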
h2. Reproducing the issue
The problem is highlighted by the two tests added in this
[commit|https://github.com/apache/kafka/commit/97af351db517d69a2b37c92861e463a6d0c5cb8f].