github-actions[bot] commented on code in PR #63763:
URL: https://github.com/apache/doris/pull/63763#discussion_r3449564519


##########
regression-test/data/nereids_rules_p0/subquery_to_window_function/correlated_scalar_subquery_to_window_function.out:
##########
@@ -0,0 +1,13 @@
+-- This file is automatically generated. You should know what you did if you 
want to edit this
+-- !d26072 --
+10     2       1       7
+11     2       1       7
+20     4       2       10
+30     5       3       8
+
+-- !d26072_no_change --
+2      1       7
+2      1       7
+4      2       10
+5      3       8
+

Review Comment:
   `git diff --check` fails on this generated output with `new blank line at 
EOF` because the file has an extra empty line after the last expected row. 
Please regenerate or trim the `.out` so it ends after the last expected row 
with only the normal final newline.



##########
fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/AggScalarSubQueryToWindowFunction.java:
##########
@@ -362,9 +445,66 @@ private Plan rewrite(LogicalFilter<? extends Plan> filter, 
LogicalApply<Plan, Pl
         windowFilterConjunct = ExpressionUtils.replace(windowFilterConjunct,
                 ImmutableMap.of(aggOut.toSlot(), aggOutExpr));
 
-        LogicalFilter<Plan> newFilter = 
filter.withConjunctsAndChild(conjuncts.get(true), apply.left());
+        // Split uncorrelated conjuncts: predicates that reference ONLY shared
+        // relation slots (tables appearing in both outer and inner plans) must
+        // stay ABOVE the window. Otherwise the window function would see a
+        // different set of rows than the original scalar subquery.
+        //
+        // For example, with fact(f) as shared table and dim(d) as outer-only:
+        //   f.v > 6        → shared-only → must stay above the window
+        //   d.tag > 0      → outer-only  → safe below the window
+        //   f.k = d.k      → join cond   → needed below the window
+        //
+        // We find shared tables by comparing table IDs that appear in both
+        // outer and inner plans, then collect ALL output slots of those
+        // tables (not just columns referenced in the inner query).
+        List<CatalogRelation> outerRels = outerPlans.stream()
+                .filter(CatalogRelation.class::isInstance)
+                .map(CatalogRelation.class::cast)
+                .collect(Collectors.toList());
+        List<CatalogRelation> innerRels = innerPlans.stream()
+                .filter(CatalogRelation.class::isInstance)
+                .map(CatalogRelation.class::cast)
+                .collect(Collectors.toList());
+        Set<Long> innerTableIds = innerRels.stream()
+                .map(r -> r.getTable().getId())
+                .collect(Collectors.toSet());
+        Set<ExprId> sharedOuterExprIds = outerRels.stream()
+                .filter(r -> innerTableIds.contains(r.getTable().getId()))
+                .flatMap(r -> r.getOutput().stream())
+                .map(Slot::getExprId)
+                .collect(Collectors.toSet());
+        Set<Expression> uncorrelatedConjuncts = conjuncts.get(true);
+        Set<Expression> belowWindowConjuncts = Sets.newHashSet();
+        Set<Expression> aboveWindowConjuncts = Sets.newHashSet();
+        if (uncorrelatedConjuncts != null) {
+            for (Expression conj : uncorrelatedConjuncts) {
+                boolean hasShared = false;
+                boolean hasNonShared = false;
+                for (ExprId id : conj.getInputSlotExprIds()) {
+                    if (sharedOuterExprIds.contains(id)) {
+                        hasShared = true;
+                    } else {
+                        hasNonShared = true;
+                    }
+                }
+                if (hasShared && !hasNonShared) {

Review Comment:
   This split also moves shared-table predicates that were part of the inner 
subquery filter above the window. `checkFilter` proves that every inner 
conjunct exists in the outer filter after slot replacement, but it does not 
remember which outer conjuncts were matched. For a matching shape 
like:\n\n```text\nFilter(f.k = d.k, f.v < 10, f.v * 2 > sum_alias)\n  
Apply(correlation: d.k)\n    CrossJoin\n      Scan fact f\n      Scan dim d  -- 
unique on k\n    Aggregate(sum(f2.v) AS sum_alias)\n      Filter(f2.k = d.k, 
f2.v < 10)\n        Scan fact f2\n```\n\n`f2.v < 10` is a real input filter for 
the scalar subquery aggregate. After replacement it is `f.v < 10`, but this 
code classifies it as shared-only and puts it in `aboveWindowConjuncts`, 
leaving the window to compute `SUM(f.v) OVER (PARTITION BY d.k)` over all fact 
rows. With fact values `5` and `100` for the same key, the original subquery 
sum for the surviving `v=5` row is `5`, while the rewrite uses `105` and can 
filter the row inc
 orrectly.\n\nPlease track the inner conjuncts matched by `checkFilter` and 
force those below the window, or reject cases where shared-table predicates 
cannot be separated into matched inner filters versus extra outer filters.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to