yashlimbad commented on code in PR #4840:
URL: https://github.com/apache/calcite/pull/4840#discussion_r3043422295
##########
core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java:
##########
@@ -198,14 +201,35 @@ protected void perform(RelOptRuleCall call, @Nullable Filter filter,
return;
}
+ Set<CorrelationId> leftVariablesSet = new LinkedHashSet<>();
+ Set<CorrelationId> rightVariablesSet = new LinkedHashSet<>();
+
+ for (RexNode condition : leftFilters) {
+ condition.accept(new RexVisitorImpl<Void>(true) {
+ @Override public Void visitSubQuery(RexSubQuery subQuery) {
+ leftVariablesSet.addAll(RelOptUtil.getVariablesUsed(subQuery.rel));
+ return super.visitSubQuery(subQuery);
+ }
+ });
+ }
+
+ for (RexNode condition : rightFilters) {
+ condition.accept(new RexVisitorImpl<Void>(true) {
+ @Override public Void visitSubQuery(RexSubQuery subQuery) {
+ rightVariablesSet.addAll(RelOptUtil.getVariablesUsed(subQuery.rel));
+ return super.visitSubQuery(subQuery);
+ }
+ });
+ }
+
// create Filters on top of the children if any filters were
// pushed to them
final RexBuilder rexBuilder = join.getCluster().getRexBuilder();
final RelBuilder relBuilder = call.builder();
final RelNode leftRel =
- relBuilder.push(join.getLeft()).filter(leftFilters).build();
+ relBuilder.push(join.getLeft()).filter(leftVariablesSet, leftFilters).build();
final RelNode rightRel =
- relBuilder.push(join.getRight()).filter(rightFilters).build();
+ relBuilder.push(join.getRight()).filter(rightVariablesSet, rightFilters).build();
Review Comment:
Thanks @silundong, valid concern. I explored the new-CorrelationId approach.
It would need `cluster.createCorrel()` for each conflicting id, a
`RexShuttle` overriding `visitFieldAccess` to swap `RexCorrelVariable($corOld)`
for `$corNew` via `rexBuilder.makeCorrel()`, and a `RelHomogeneousShuttle` to
propagate the new id through the inner `RelNode` trees of each `RexSubQuery`.
That is a fair amount of rewriting machinery, and any node missed in a nested
context would silently break the plan. The triggering scenario (the same
CorrelationId used in two join-condition subqueries, only one of which is
pushed down) is also quite narrow.

So I went with a simpler fix instead:
- A `collectCorrelationIds` helper that walks both direct
  `RexCorrelVariable` references and the inner plans of `RexSubQuery` nodes,
  across all predicate buckets (left, right, remaining join, above).
- The join's variablesSet is recomputed from what the surviving join
  condition actually references, so if `$cor0` is still needed by a remaining
  subquery, it stays on the join.
- The variablesSet of the filter above the join is now computed properly as
  well.

In the rare conflict case, both the join and the pushed-down filter would
share the same `$cor0`, which is semantically correct. If this causes
decorrelation issues in practice, we can add the id-generation layer as a
follow-up.
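
To make the bucket-by-bucket recomputation concrete, here is a minimal sketch of the idea behind `collectCorrelationIds`. It is written against a toy expression model, not Calcite's classes: `Expr`, `CorrelRef`, `Leaf`, `Call`, and `SubQuery` are hypothetical stand-ins for `RexNode` and friends, correlation ids are plain strings, and a subquery's inner plan is reduced to a list of predicates. It is not the code in this PR, and it ignores variables that an inner plan defines for itself.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model only: none of these types are Calcite classes. */
final class CorrelIdSketch {
  /** Hypothetical stand-in for a predicate expression. */
  interface Expr {
    void collect(Set<String> ids);
  }

  /** Direct reference to a correlation variable, e.g. $cor0. */
  static final class CorrelRef implements Expr {
    final String id;
    CorrelRef(String id) { this.id = id; }
    @Override public void collect(Set<String> ids) { ids.add(id); }
  }

  /** Column reference or literal: uses no correlation variable. */
  static final class Leaf implements Expr {
    @Override public void collect(Set<String> ids) { }
  }

  /** Operator call such as AND or =: recurses into its operands. */
  static final class Call implements Expr {
    final List<Expr> operands;
    Call(Expr... operands) { this.operands = Arrays.asList(operands); }
    @Override public void collect(Set<String> ids) {
      for (Expr operand : operands) {
        operand.collect(ids);
      }
    }
  }

  /** Subquery: its inner plan is modelled as a flat list of predicates. */
  static final class SubQuery implements Expr {
    final List<Expr> innerPredicates;
    SubQuery(Expr... innerPredicates) {
      this.innerPredicates = Arrays.asList(innerPredicates);
    }
    @Override public void collect(Set<String> ids) {
      for (Expr predicate : innerPredicates) {
        predicate.collect(ids);
      }
    }
  }

  /** Collects every correlation id used by one predicate bucket. */
  static Set<String> collectCorrelationIds(List<Expr> bucket) {
    Set<String> ids = new LinkedHashSet<>();
    for (Expr predicate : bucket) {
      predicate.collect(ids);
    }
    return ids;
  }

  public static void main(String[] args) {
    // Two join-condition subqueries use $cor0; only the first is pushed down.
    Expr pushed = new SubQuery(new Call(new CorrelRef("$cor0"), new Leaf()));
    Expr kept = new SubQuery(new CorrelRef("$cor0"));
    Expr plain = new Call(new Leaf(), new Leaf());

    List<Expr> leftFilters = Collections.singletonList(pushed);
    List<Expr> rightFilters = Collections.singletonList(plain);
    List<Expr> joinFilters = Collections.singletonList(kept);

    // Each bucket gets its own set, computed only from what it references.
    System.out.println("left filter: " + collectCorrelationIds(leftFilters));
    System.out.println("right filter: " + collectCorrelationIds(rightFilters));
    System.out.println("join: " + collectCorrelationIds(joinFilters));
  }
}
```

In this toy run the left filter and the join both end up with `$cor0` while the right filter gets an empty set, which mirrors the shared-id conflict case described above.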