Re: [PR] KAFKA-19429: Deflake streams_smoke_test, again [kafka]

via GitHub Tue, 01 Jul 2025 00:24:06 -0700


lucasbru commented on code in PR #20070:
URL: https://github.com/apache/kafka/pull/20070#discussion_r2176631895



##########
streams/src/test/java/org/apache/kafka/streams/tests/SmokeTestDriver.java:
##########
@@ -522,7 +522,11 @@ private static VerificationResult verifyAll(final Map<String, Set<Integer>> inpu
         }
         boolean pass;
         try (final PrintStream resultStream = new PrintStream(byteArrayOutputStream)) {
-            pass = verifyTAgg(resultStream, inputs, events.get("tagg"), validationPredicate, printResults);
+            pass = true;
+            if (eosEnabled) {
+                // TAGG is computing "Count-by-count", which may produce keys that are not in the input data in ALOS, so we skip validation in this case.
+                pass = verifyTAgg(resultStream, inputs, events.get("tagg"), printResults);

Review Comment:
   See the commit description. lessEqual was sufficient for sum and count, but
   it is not sufficient for count-by-count.
   
   The TAGG topic effectively contains count-by-count results. For example,
   given this input without duplication:
   
   0 -> 1,2,3 we get 3 -> 1 in TAGG, since 1 key had 3 values.
   
   With duplication:
   
   0 -> 1,1,2,3 we get 4 -> 1 in TAGG, since 1 key had 4 values.
   
   So, compared to the expectation, the count for key 3 decreased and the count
   for key 4 increased. lessEqual only tolerates a deviation in one direction,
   so it won't fix this.
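
A minimal sketch of the example above, in case it helps to see it run. This is not the code in SmokeTestDriver; the class name `CountByCountSketch`, the helper `countByCount`, and the use of in-memory maps in place of the TAGG topic are all assumptions made for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not part of SmokeTestDriver.
final class CountByCountSketch {

    // For each input key, count its values; then count how many keys share each count.
    static Map<Long, Long> countByCount(final Map<Integer, List<Integer>> input) {
        final Map<Long, Long> result = new TreeMap<>();
        for (final List<Integer> values : input.values()) {
            result.merge((long) values.size(), 1L, Long::sum);
        }
        return result;
    }

    public static void main(final String[] args) {
        // Input without duplication: key 0 has 3 values.
        final Map<Long, Long> expected = countByCount(Map.of(0, List.of(1, 2, 3)));
        // Same input, but the value 1 was delivered twice (possible under ALOS).
        final Map<Long, Long> withDuplicate = countByCount(Map.of(0, List.of(1, 1, 2, 3)));

        System.out.println("expected:       " + expected);      // {3=1}
        System.out.println("with duplicate: " + withDuplicate); // {4=1}
    }
}
```

The duplicate removes the entry under key 3 and creates one under key 4, so the two results share no keys at all, and a per-key comparison in a single direction cannot reconcile them.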
   
   We could consider opening a ticket to validate whatever can be validated on
   TAGG, but that would require modeling precisely what happens under
   duplication at various stages (before / after the repartitioning). Since we
   run the same test with EOSv2, I'm not sure it's worth it. I'm pretty sure
   it's not worth living with a flaky test.
   



