junrao commented on code in PR #21379:
URL: https://github.com/apache/kafka/pull/21379#discussion_r2875399185
##########
storage/src/main/java/org/apache/kafka/storage/internals/log/Cleaner.java:
##########
@@ -302,19 +327,26 @@ public void cleanSegments(UnifiedLog log,
 * @param upperBoundOffsetOfCleaningRound Next offset of the last batch in the source segment
* @param stats Collector for cleaning statistics
* @param currentTime The time at which the clean was initiated
+ * @param log The log instance for creating new segments if overflow occurs
+ *
+ * @return The current active destination segment (may be different from the input dest if overflow occurred)
Review Comment:
This API seems awkward. An alternative is to instead pass a starting position in sourceRecords to cleanInto(). Initially, we pass in 0 as the position; if cleanInto() hits a size limit, it throws an exception carrying the current position in sourceRecords. The caller catches this exception, creates a new destination segment, and calls cleanInto() again with the position from the exception and the new destination segment.
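To make the shape of this alternative concrete, here is a minimal, self-contained sketch of the exception-based resumption loop. All names here (`SegmentFullException`, `Segment`, the toy retention predicate) are illustrative stand-ins, not Kafka's actual types or the real `cleanInto` signature:

```java
import java.util.ArrayList;
import java.util.List;

public class CleanIntoSketch {

    // Carries the position in sourceRecords at which cleaning must resume.
    static class SegmentFullException extends Exception {
        final int resumePosition;
        SegmentFullException(int resumePosition) { this.resumePosition = resumePosition; }
    }

    // Toy stand-in for a destination log segment with a fixed capacity.
    static class Segment {
        final List<Integer> records = new ArrayList<>();
        final int capacity;
        Segment(int capacity) { this.capacity = capacity; }
    }

    // Copies retained records from source into dest, starting at startPos.
    // On hitting the size limit, throws with the current position so the
    // caller can roll a new segment and resume exactly where it stopped.
    static void cleanInto(List<Integer> source, int startPos, Segment dest)
            throws SegmentFullException {
        for (int pos = startPos; pos < source.size(); pos++) {
            if (dest.records.size() >= dest.capacity)
                throw new SegmentFullException(pos);
            int record = source.get(pos);
            if (record % 2 == 0)  // toy retention predicate
                dest.records.add(record);
        }
    }

    // Caller: on overflow, creates a fresh destination segment and retries
    // cleanInto() from the position carried by the exception.
    static List<Segment> clean(List<Integer> source, int segmentCapacity) {
        List<Segment> segments = new ArrayList<>();
        int pos = 0;
        while (true) {
            Segment dest = new Segment(segmentCapacity);
            segments.add(dest);
            try {
                cleanInto(source, pos, dest);
                return segments;
            } catch (SegmentFullException e) {
                pos = e.resumePosition;
            }
        }
    }
}
```

With this shape, cleanInto() never needs to create segments itself or return "the current active destination", which is the awkwardness the comment points at.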
##########
storage/src/main/java/org/apache/kafka/storage/internals/log/Cleaner.java:
##########
@@ -400,6 +432,36 @@ public boolean shouldRetainRecord(RecordBatch batch, Record record) {
if (outputBuffer.position() > 0) {
outputBuffer.flip();
MemoryRecords retained = MemoryRecords.readableRecords(outputBuffer);
+
+ // Check for TWO types of overflow BEFORE appending:
+ // 1. Offset overflow: offset range exceeds Integer.MAX_VALUE
Review Comment:
The grouping of the segments takes offset overflow into consideration. So,
it seems that we can't hit this?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]