BiteTheDDDDt opened a new pull request, #62043:
URL: https://github.com/apache/doris/pull/62043
This pull request fixes a bug in the DEDUPLICATION mode of the
`window_funnel_v2` aggregate function: chains were incorrectly broken when a
single row matched multiple events (a multi-match row). With the fix, only a
true duplicate, meaning a repeated event that comes from a different row, breaks
the chain, which aligns the behavior with the previous version (V1). The change
also adds unit and regression tests that cover both multi-match rows and true
duplicates.
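To make "multi-match row" concrete, the sketch below assumes (the PR does not
spell this out) that V2 records one event entry per condition a row satisfies,
so a row that satisfies several conditions contributes several entries that
share the same row. `EventEntry`, its `row_id` field, and `expand_rows` are
hypothetical names used for illustration, not Doris code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of one matched event (not the Doris data structure):
// which funnel step (0-based) matched, and which input row produced it.
struct EventEntry {
    int64_t timestamp;
    int event_idx;
    std::size_t row_id;
};

// Hypothetical helper: expands each input row into one entry per condition
// it satisfies. conds[r][k] is true when row r satisfies the step-k condition.
std::vector<EventEntry> expand_rows(const std::vector<int64_t>& timestamps,
                                    const std::vector<std::vector<bool>>& conds) {
    std::vector<EventEntry> entries;
    for (std::size_t r = 0; r < conds.size(); ++r) {
        for (std::size_t k = 0; k < conds[r].size(); ++k) {
            if (conds[r][k]) {
                // A multi-match row reaches this line more than once,
                // producing several entries with the same row_id.
                entries.push_back({timestamps[r], static_cast<int>(k), r});
            }
        }
    }
    return entries;
}
```

For the timestamps `{100, 200, 300}` and conditions
`{{true, false, false}, {true, true, false}, {false, false, true}}`,
`expand_rows` returns four entries, and the two produced by row 1 both carry
`row_id == 1`. That shared row id is what lets a deduplication check tell a
multi-match row apart from a genuine repeat on another row.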
**Bug Fixes in Deduplication Logic:**
* Updated the deduplication logic in `WindowFunnelStateV2`
(`aggregate_function_window_funnel_v2.h`) to skip breaking the chain when a
"duplicate" event is actually from the same row as an event already in the
chain, preventing premature chain termination on multi-match rows.
[[1]](diffhunk://#diff-1a1c09dde1a5d97a9723ffebb33ddb27344131ac2031c986ce6866bf248c5971L426-R433)
[[2]](diffhunk://#diff-1a1c09dde1a5d97a9723ffebb33ddb27344131ac2031c986ce6866bf248c5971L437-R448)
* Added a new helper method `_is_same_row_as_chain` that checks whether an
event comes from the same row as any event in the current chain; the
deduplication logic uses it to distinguish true duplicates from multi-match rows.
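The rule in the two bullets above can be sketched as a walk over a single
chain. This is a simplified, hypothetical model rather than the
`WindowFunnelStateV2` code: `Entry`, `is_same_row_as_chain`, and
`max_funnel_level` are illustrative names (only `_is_same_row_as_chain` appears
in the PR), the time window is ignored, and entries are taken in the order
given. A step that is already in the chain is a candidate duplicate; it breaks
the chain only when no event in the chain comes from the same row.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of one matched event, listed in visiting order.
struct Entry {
    int event_idx;       // 0-based funnel step that matched
    std::size_t row_id;  // input row that produced the match
};

// Hypothetical counterpart of the PR's _is_same_row_as_chain: does any event
// already in the chain come from the input row `row_id`?
bool is_same_row_as_chain(const std::vector<Entry>& chain, std::size_t row_id) {
    return std::any_of(chain.begin(), chain.end(),
                       [row_id](const Entry& e) { return e.row_id == row_id; });
}

// Simplified DEDUPLICATION walk over a single chain, ignoring the time window.
// skip_same_row == false models the pre-fix behavior.
// Returns the longest chain length reached.
int max_funnel_level(const std::vector<Entry>& entries, bool skip_same_row) {
    std::vector<Entry> chain;
    std::size_t best = 0;
    for (const Entry& e : entries) {
        const std::size_t step = static_cast<std::size_t>(e.event_idx);
        if (step < chain.size()) {
            // This step is already in the chain: a candidate duplicate.
            if (skip_same_row && is_same_row_as_chain(chain, e.row_id)) {
                continue;  // multi-match row, not a real repeat: keep the chain
            }
            chain.clear();  // true duplicate: break the chain
        }
        if (step == chain.size()) {
            chain.push_back(e);  // extend the chain, or start one at step 0
            best = std::max(best, chain.size());
        }
        // Steps beyond the next expected one are ignored.
    }
    return static_cast<int>(best);
}
```

Feeding the `{event_idx, row_id}` pairs `{0, 0}, {1, 1}, {0, 1}, {2, 2}`
(row 1 matches steps 0 and 1, with its step-1 entry visited first) gives 2 with
`skip_same_row = false`, because row 1's step-0 entry resets the chain, and 3
with the fix. A genuine repeat such as `{0, 0}, {1, 1}, {1, 2}, {2, 3}` still
stops at level 2. Whether V2 actually visits a row's entries in that order is
an assumption of this sketch.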
**Testing Improvements:**
* Added two new unit tests in `vec_window_funnel_v2_test.cpp`:
  * `testDeduplicationSameRowMultiEvent` verifies that multi-match rows do
    not break the chain in DEDUPLICATION mode.
  * `testDeduplicationTrueDuplicateStillBreaks` ensures that a true
    duplicate on a different row still breaks the chain as expected.
**Regression Test Suite Updates:**
* Added regression tests in `window_funnel_v2.groovy` and updated expected
outputs in `window_funnel_v2.out` to cover the fixed scenarios for
DEDUPLICATION mode, ensuring both multi-match and true duplicate behaviors are
validated.
[[1]](diffhunk://#diff-4c2f6bf42109868e75fcf63937b90874666325cd9bfb291c30637f34c9b11575R489-R549)
[[2]](diffhunk://#diff-072918ec4eec0fc9fd2fc66dec06a9ebd56da09d086a690c89a1debedce110f7R89-R94)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.