dajac commented on code in PR #16498:
URL: https://github.com/apache/kafka/pull/16498#discussion_r1662541019


##########
group-coordinator/src/main/java/org/apache/kafka/coordinator/group/runtime/CoordinatorRuntime.java:
##########
@@ -936,62 +941,90 @@ private void append(
                     ));
                 }
 
-                // Compute the estimated size of the records.
-                int estimatedSize = AbstractRecords.estimateSizeInBytes(
-                    currentBatch.builder.magic(),
-                    compression.type(),
-                    recordsToAppend
-                );
+                if (isAtomic) {

Review Comment:
   > To confirm my understanding, we take the original path when the append 
should be atomic, which means we verify whether all records can fit in the 
current batch. If not, we allocate a new batch then append the records. If the 
append does not need to be atomic, we append individual records until we can 
fit no more, then allocate a new batch.
   > 
   > This will reduce the number of writes to the log as we will have more 
filled batches on average. In practice, only the unload partition and cleanup 
group metadata jobs will have non-atomic writes today so we should not expect 
much impact.
   > 
   > Is this correct?
   
   Yes. This is correct. However, I did not make this change to improve how 
batches are filled, but rather to ensure that we would not get stuck in the 
case where a large number of records cannot fit in one batch.
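   The two paths can be sketched as follows. This is a minimal illustration 
with hypothetical names and sizes, not the actual `CoordinatorRuntime` code; 
real batch allocation, size estimation, and flushing are far more involved.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the two append paths: atomic appends keep all records in one
// batch, non-atomic appends fill batches greedily, record by record.
public class BatchingSketch {
    static final int MAX_BATCH_BYTES = 100;

    // Returns the batches that would be written to the log; each inner
    // list holds the record sizes placed in one batch.
    static List<List<Integer>> append(List<Integer> recordSizes, boolean isAtomic) {
        List<List<Integer>> batches = new ArrayList<>();
        if (isAtomic) {
            // Atomic path: all records must land in the same batch. (In the
            // real code, the total size is estimated up front and a new batch
            // is allocated if the records do not fit in the current one.)
            batches.add(new ArrayList<>(recordSizes));
        } else {
            // Non-atomic path: append records one by one; when the next
            // record no longer fits, flush and allocate a new batch. This
            // yields fuller batches and avoids getting stuck when the whole
            // set of records cannot fit in a single batch.
            List<Integer> current = new ArrayList<>();
            int currentBytes = 0;
            for (int size : recordSizes) {
                if (!current.isEmpty() && currentBytes + size > MAX_BATCH_BYTES) {
                    batches.add(current);
                    current = new ArrayList<>();
                    currentBytes = 0;
                }
                current.add(size);
                currentBytes += size;
            }
            if (!current.isEmpty()) batches.add(current);
        }
        return batches;
    }
}
```

   With three records of 60 bytes and a 100-byte cap, the atomic path yields 
one batch while the non-atomic path yields three, which matches the behavior 
described above.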



##########
group-coordinator/src/main/java/org/apache/kafka/coordinator/group/runtime/CoordinatorRuntime.java:
##########
@@ -936,62 +941,90 @@ private void append(
                     ));
                 }
 
-                // Compute the estimated size of the records.
-                int estimatedSize = AbstractRecords.estimateSizeInBytes(
-                    currentBatch.builder.magic(),
-                    compression.type(),
-                    recordsToAppend
-                );
+                if (isAtomic) {

Review Comment:
   > One other thing I noticed is the ordering. For the atomic case, we create 
the batch first and then replay whereas non-atomic we do the replay then add to 
batch. Not sure if it makes a big difference though since we moved where we 
enqueue the event.
   
   This is a very good point, and the code was wrong. I changed it to check 
whether the record can fit in the current batch before replaying it, and I 
added a test for this too.
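   The ordering matters because a replay mutates the in-memory state machine: 
if we replayed first and only then discovered the record did not fit, the state 
could diverge from what was appended to the log. A minimal sketch of the 
check-then-replay-then-append ordering (hypothetical names, not the actual 
`CoordinatorRuntime` code):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the fixed ordering: ensure the record fits in the batch
// BEFORE replaying it into the state machine.
public class ReplayOrderSketch {
    static final int MAX_BATCH_BYTES = 100;

    final List<Integer> state = new ArrayList<>();       // in-memory state machine
    final List<List<Integer>> flushed = new ArrayList<>(); // batches written out
    List<Integer> batch = new ArrayList<>();             // current batch
    int batchBytes = 0;

    void appendNonAtomic(int recordSize) {
        // 1. Check whether the record fits in the current batch first,
        //    flushing the batch and allocating a new one if it does not.
        if (!batch.isEmpty() && batchBytes + recordSize > MAX_BATCH_BYTES) {
            flushed.add(batch);
            batch = new ArrayList<>();
            batchBytes = 0;
        }
        // 2. Only now replay the record into the state machine...
        state.add(recordSize);
        // 3. ...and add it to the batch, keeping state and log in sync.
        batch.add(recordSize);
        batchBytes += recordSize;
    }
}
```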



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
