lucasbru commented on code in PR #20070:
URL: https://github.com/apache/kafka/pull/20070#discussion_r2176631895
##########
streams/src/test/java/org/apache/kafka/streams/tests/SmokeTestDriver.java:
##########
@@ -522,7 +522,11 @@ private static VerificationResult verifyAll(final Map<String, Set<Integer>> inpu
}
boolean pass;
try (final PrintStream resultStream = new PrintStream(byteArrayOutputStream)) {
- pass = verifyTAgg(resultStream, inputs, events.get("tagg"), validationPredicate, printResults);
+ pass = true;
+ if (eosEnabled) {
+ // TAGG is computing "Count-by-count", which may produce keys that are not in the input data in ALOS, so we skip validation in this case.
+ pass = verifyTAgg(resultStream, inputs, events.get("tagg"), printResults);
Review Comment:
See the commit description. LessEqual was sufficient for sum and count, but it
is not sufficient for count-by-count.
The TAGG topic effectively contains count-by-count results. For example,
given this input without duplication:
0 -> 1,2,3
we get 3 -> 1 in TAGG, since one key had 3 values.
With duplication:
0 -> 1,1,2,3
we get 4 -> 1 in TAGG, since one key had 4 values.
So, compared to the expectation, the count for key 3 decreased and the count
for key 4 increased. The result moves in both directions at once, so lessEqual
won't fix this.
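To make the effect concrete, here is a minimal standalone sketch. It is not code from SmokeTestDriver: the class `CountByCountSketch` and the helper `countByCount` are hypothetical names used only for illustration, and the sketch assumes count-by-count means "for each number of values, how many keys have that many values".

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CountByCountSketch {

    // Hypothetical helper: counts the values per input key, then counts
    // how many keys share each of those per-key counts.
    static Map<Long, Long> countByCount(final Map<Integer, List<Integer>> input) {
        final Map<Long, Long> result = new TreeMap<>();
        for (final List<Integer> values : input.values()) {
            result.merge((long) values.size(), 1L, Long::sum);
        }
        return result;
    }

    public static void main(final String[] args) {
        // Input without duplication: key 0 has three values.
        final Map<Integer, List<Integer>> clean = Map.of(0, List.of(1, 2, 3));
        // Input with one duplicated record, as can happen under ALOS.
        final Map<Integer, List<Integer>> duplicated = Map.of(0, List.of(1, 1, 2, 3));

        System.out.println(countByCount(clean));      // prints {3=1}
        System.out.println(countByCount(duplicated)); // prints {4=1}
    }
}
```

The duplicate does not merely inflate an existing result: the entry for key 3 disappears and an entry for key 4 appears, so a per-key comparison in a single direction cannot accept both changes.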
We could consider opening a ticket to validate whatever can be validated on
TAGG, but that would require modeling precisely what happens under duplication
at various stages (before and after the repartitioning). Since we run the same
test with EOSv2, I'm not sure it's worth it. I'm pretty sure it's not worth
living with a flaky test.