github-actions[bot] commented on code in PR #64559:
URL: https://github.com/apache/doris/pull/64559#discussion_r3487837899
##########
fe/fe-core/src/main/java/org/apache/doris/nereids/stats/JoinEstimation.java:
##########
@@ -63,6 +74,22 @@ private static EqualPredicate normalizeEqualPredJoinCondition(EqualPredicate equ
         }
     }
+    static boolean hasTrustableEqualCondition(Statistics leftStats, Statistics rightStats, Join join) {
+        if (join.getEqualPredicates().isEmpty()) {
+            return false;
+        }
+        double rightStatsRowCount = StatsMathUtil.nonZeroDivisor(rightStats.getRowCount());
+        double leftStatsRowCount = StatsMathUtil.nonZeroDivisor(leftStats.getRowCount());
+        return join.getEqualPredicates().stream()
+                .map(expression -> normalizeEqualPredJoinCondition((EqualPredicate) expression, rightStats))
+                .anyMatch(equal -> {
+                    ColumnStatistic eqLeftColStats = ExpressionEstimation.estimate(equal.left(), leftStats);
+                    ColumnStatistic eqRightColStats = ExpressionEstimation.estimate(equal.right(), rightStats);
+                    return eqRightColStats.ndv / rightStatsRowCount > TRUSTABLE_UNIQ_THRESHOLD
Review Comment:
`hasTrustableEqualCondition()` should reject unknown column stats before
applying the NDV ratio. `ExpressionEstimation.visitSlotReference()` returns
`ColumnStatistic.UNKNOWN` when a slot has no stats, and unknown stats are built
with `ndv=1` and `isUnKnown=true`. For a DPHyp group such as:
```text
Group{A,B}
LogicalJoin(A.k = B.k)
A stats rowCount=1, A.k=UNKNOWN
B stats rowCount=N, B.k=UNKNOWN
```
for `A.k` (`ndv=1`, `rowCount=1`) this helper evaluates
`1 / nonZeroDivisor(1) > 0.9`, which is true, so
`MemoStatsAndCostRecomputer.isTrustJoin()` gives the candidate a trust-join
point even though `StatsCalculator` would mark the expression unreliable for
unknown input slots. With
`memo_logical_row_count_aggregation_policy=trust_join_count`,
`filterCandidateStatisticsByPolicy()` can then prefer a candidate because its
unknown equality was counted as trusted. Please check
`!eqLeftColStats.isUnKnown && !eqRightColStats.isUnKnown` (or reuse the
existing unknown-condition guard) before treating the equality as trustable.
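
A minimal, self-contained sketch of the requested guard follows. It is not the Doris code: `ColStats`, `ratioOnly`, and `guarded` are hypothetical stand-ins, `nonZeroDivisor` is assumed to map a zero divisor to 1, the threshold is taken as the `0.9` quoted above, and checking the ratio on either side of the equality is an assumption because the diff hunk is cut off after the right-side comparison.

```java
public class TrustableEqualitySketch {
    /** Hypothetical stand-in for ColumnStatistic: only ndv and the unknown flag. */
    static final class ColStats {
        // Mirrors the description above: unknown stats carry ndv=1 and isUnKnown=true.
        static final ColStats UNKNOWN = new ColStats(1.0, true);
        final double ndv;
        final boolean isUnKnown;

        ColStats(double ndv, boolean isUnKnown) {
            this.ndv = ndv;
            this.isUnKnown = isUnKnown;
        }
    }

    // Value taken from the "> 0.9" comparison quoted in the review comment.
    static final double TRUSTABLE_UNIQ_THRESHOLD = 0.9;

    // Assumed behavior of StatsMathUtil.nonZeroDivisor: avoid dividing by zero.
    static double nonZeroDivisor(double d) {
        return d == 0.0 ? 1.0 : d;
    }

    /** NDV-ratio check alone; the either-side comparison is an assumption. */
    static boolean ratioOnly(ColStats left, double leftRows, ColStats right, double rightRows) {
        return right.ndv / nonZeroDivisor(rightRows) > TRUSTABLE_UNIQ_THRESHOLD
                || left.ndv / nonZeroDivisor(leftRows) > TRUSTABLE_UNIQ_THRESHOLD;
    }

    /** Same check, but unknown column stats are rejected before the ratio is applied. */
    static boolean guarded(ColStats left, double leftRows, ColStats right, double rightRows) {
        if (left.isUnKnown || right.isUnKnown) {
            return false;
        }
        return ratioOnly(left, leftRows, right, rightRows);
    }

    public static void main(String[] args) {
        ColStats unknown = ColStats.UNKNOWN;
        // A has rowCount=1 and B has rowCount=1000; both join keys are UNKNOWN.
        System.out.println("ratio only: " + ratioOnly(unknown, 1, unknown, 1000));
        System.out.println("guarded:    " + guarded(unknown, 1, unknown, 1000));

        // Known stats: the right key is nearly unique (950 distinct values in 1000 rows).
        ColStats nearUnique = new ColStats(950, false);
        ColStats lowNdv = new ColStats(10, false);
        System.out.println("known key:  " + guarded(lowNdv, 1000, nearUnique, 1000));
    }
}
```

With the unknown inputs from the example group, the ratio-only check accepts the equality through the `1 / 1` side, while the guarded version rejects it and still accepts a genuinely near-unique key with known stats.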
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]