Re: [PR] KAFKA-16310 ListOffsets doesn't report the offset with maxTimestamp a… [kafka]

via GitHub Thu, 28 Mar 2024 12:10:53 -0700


chia7712 commented on code in PR #15618:
URL: https://github.com/apache/kafka/pull/15618#discussion_r1543505409



##########
core/src/main/scala/kafka/log/UnifiedLog.scala:
##########
@@ -1320,10 +1320,8 @@ class UnifiedLog(@volatile var logStartOffset: Long,
         // constant time access while being safe to use with concurrent collections unlike `toArray`.
         val segmentsCopy = logSegments.toBuffer
         val latestTimestampSegment = segmentsCopy.maxBy(_.maxTimestampSoFar)
-        val latestTimestampAndOffset = latestTimestampSegment.maxTimestampAndOffsetSoFar
-
-        Some(new TimestampAndOffset(latestTimestampAndOffset.timestamp,
-          latestTimestampAndOffset.offset,
+        val batch = latestTimestampSegment.log.batches().asScala.maxBy(_.maxTimestamp())

Review Comment:
   > latestTimestampSegment.log.batches() scans the whole log segment and could introduce unnecessary extra I/O. So, there could be performance degradation because of that.
   
   `batches()` returns an `Iterable`, and its implementation loads a batch only when `next` is called:
   https://github.com/apache/kafka/blob/3.6/clients/src/main/java/org/apache/kafka/common/record/FileLogInputStream.java#L63
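   
   To make the laziness point concrete, here is a minimal sketch. It is a toy in-memory model, not Kafka code: `Batch`, `LazyBatches`, and the `loads` counter are hypothetical names introduced only for illustration. It shows that creating the iterable loads nothing, while `maxBy` still has to call `next` for every batch.
   
   ```scala
   // Toy model only: these are hypothetical stand-ins, not Kafka classes.
   object LazyBatchesSketch {
     final case class Batch(baseOffset: Long, maxTimestamp: Long)

     // An Iterable whose iterator "loads" a batch only when next() is called.
     final class LazyBatches(stored: Vector[Batch]) extends Iterable[Batch] {
       var loads = 0 // number of batches loaded so far
       override def iterator: Iterator[Batch] = new Iterator[Batch] {
         private var i = 0
         override def hasNext: Boolean = i < stored.length
         override def next(): Batch = {
           loads += 1 // stands in for the I/O done per batch
           val b = stored(i)
           i += 1
           b
         }
       }
     }

     val batches = new LazyBatches(Vector(Batch(0L, 10L), Batch(5L, 30L), Batch(9L, 20L)))

     // Building the Iterable has not loaded anything yet.
     val loadsBeforeIteration: Int = batches.loads

     // maxBy has to visit every element, so every batch gets loaded.
     val newest: Batch = batches.maxBy(_.maxTimestamp)
     val loadsAfterMaxBy: Int = batches.loads
   }
   ```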
   
   Hence, the benefit of looking up the batch first (finding its position and then passing it to `batchesFrom`) is that we save some I/O by skipping the batches before that position. Please correct me if I have misunderstood anything.
   
   > I am not sure I understand this. Looking up for a batch with each baseOffset or lastOffset will locate the same batch using the offset index, right?
   
   Is the lookup implemented like this?
   ```scala
           val position = latestTimestampSegment.offsetIndex.lookup(latestTimestampSegment.offsetOfMaxTimestampSoFar)
           latestTimestampSegment.log.batchesFrom(position.position).asScala
   ```
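   
   For comparison, a rough sketch of the difference in batches read between a full scan and a lookup followed by `batchesFrom`. Everything here is a hypothetical in-memory model (`Segment`, `IndexEntry`, the `reads` counter, and the sample data are assumptions, and positions are plain integers rather than byte offsets in a file); only the shape of the calls mirrors the snippet above.
   
   ```scala
   // Toy model only: hypothetical stand-ins for a segment and its sparse offset index.
   object LookupSketch {
     final case class Batch(baseOffset: Long, lastOffset: Long, maxTimestamp: Long, position: Int)
     final case class IndexEntry(offset: Long, position: Int)

     final class Segment(stored: Vector[Batch], sparseIndex: Vector[IndexEntry]) {
       var reads = 0 // batches actually read, standing in for I/O

       // Last index entry whose offset is <= target (index assumed sorted by offset).
       def lookup(target: Long): IndexEntry =
         sparseIndex.takeWhile(_.offset <= target).lastOption.getOrElse(IndexEntry(0L, 0))

       // Lazily yields batches at or after `position`; earlier batches are skipped unread.
       def batchesFrom(position: Int): Iterator[Batch] =
         stored.iterator.dropWhile(_.position < position).map { b => reads += 1; b }
     }

     private val data = Vector(
       Batch(0L, 4L, 10L, 0),
       Batch(5L, 9L, 20L, 100),
       Batch(10L, 14L, 50L, 200),
       Batch(15L, 19L, 30L, 300))
     // Sparse: not every batch has an index entry.
     private val index = Vector(IndexEntry(0L, 0), IndexEntry(5L, 100))

     // Variant 1: scan the whole segment and pick the batch with the largest timestamp.
     val scanned = new Segment(data, index)
     val byScan: Batch = scanned.batchesFrom(0).maxBy(_.maxTimestamp)

     // Variant 2: start from the indexed position of the already-known offset.
     val offsetOfMaxTimestamp = 12L
     val sought = new Segment(data, index)
     val start: IndexEntry = sought.lookup(offsetOfMaxTimestamp)
     val byLookup: Option[Batch] =
       sought.batchesFrom(start.position).find(_.lastOffset >= offsetOfMaxTimestamp)
   }
   ```
   
   In this model both variants find the same batch; the scan reads every batch in the segment, while the lookup variant reads only from the nearest indexed position up to the matching batch.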



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
