Re: [PR] [CALCITE-7442] Getting Wrong index of Correlated variable inside Subquery after FilterJoinRule [calcite]

via GitHub Tue, 07 Apr 2026 00:17:20 -0700


yashlimbad commented on code in PR #4840:
URL: https://github.com/apache/calcite/pull/4840#discussion_r3043422295



##########
core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java:
##########
@@ -198,14 +201,35 @@ protected void perform(RelOptRuleCall call, @Nullable Filter filter,
       return;
     }
 
+    Set<CorrelationId> leftVariablesSet =  new LinkedHashSet<>();
+    Set<CorrelationId> rightVariablesSet = new LinkedHashSet<>();
+
+    for (RexNode condition : leftFilters) {
+      condition.accept(new RexVisitorImpl<Void>(true) {
+        @Override public Void visitSubQuery(RexSubQuery subQuery) {
+          leftVariablesSet.addAll(RelOptUtil.getVariablesUsed(subQuery.rel));
+          return super.visitSubQuery(subQuery);
+        }
+      });
+    }
+
+    for (RexNode condition : rightFilters) {
+      condition.accept(new RexVisitorImpl<Void>(true) {
+        @Override public Void visitSubQuery(RexSubQuery subQuery) {
+          rightVariablesSet.addAll(RelOptUtil.getVariablesUsed(subQuery.rel));
+          return super.visitSubQuery(subQuery);
+        }
+      });
+    }
+
     // create Filters on top of the children if any filters were
     // pushed to them
     final RexBuilder rexBuilder = join.getCluster().getRexBuilder();
     final RelBuilder relBuilder = call.builder();
     final RelNode leftRel =
-        relBuilder.push(join.getLeft()).filter(leftFilters).build();
+        relBuilder.push(join.getLeft()).filter(leftVariablesSet, leftFilters).build();
     final RelNode rightRel =
-        relBuilder.push(join.getRight()).filter(rightFilters).build();
+        relBuilder.push(join.getRight()).filter(rightVariablesSet, rightFilters).build();

Review Comment:
   Thanks @silundong, valid concern. I explored the new-CorrelationId approach. It would need `cluster.createCorrel()` for each conflicting id, a `RexShuttle` overriding `visitFieldAccess` to swap `RexCorrelVariable($corOld)` for `$corNew` via `rexBuilder.makeCorrel()`, and a `RelHomogeneousShuttle` to propagate the change through the inner `RelNode` trees of each `RexSubQuery`. That is a fair amount of rewriting machinery, and any node missed in a nested context would silently break the plan. The triggering scenario (the same CorrelationId used in two join-condition subqueries, only one of which is pushed down) is also quite narrow.
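
   To make the cost of that rewrite concrete, here is a minimal, self-contained sketch of the renaming pass on a toy expression tree. The types `Correl`, `FieldAccess`, `Call`, `SubQuery`, and `Leaf` are hypothetical stand-ins, not Calcite classes, and a subquery's inner plan is flattened to a list of expressions. It only illustrates why every nesting level has to be visited; it is not the code from this PR.

```java
import java.util.List;
import java.util.stream.Collectors;

// Toy model only: these records are hypothetical stand-ins, NOT Calcite's
// RexNode / RexCorrelVariable / RexSubQuery classes.
final class RenameSketch {
  interface Node {}
  /** A correlation variable reference such as $cor0. */
  record Correl(String id) implements Node {}
  /** A field access on some expression, e.g. $cor0.DEPTNO. */
  record FieldAccess(Node target, String field) implements Node {}
  /** Any operator call, e.g. =(a, b). */
  record Call(String op, List<Node> operands) implements Node {}
  /** A subquery; its inner plan is reduced to the expressions it contains. */
  record SubQuery(List<Node> innerExprs) implements Node {}
  /** Anything without a correlation reference (literal, input ref, ...). */
  record Leaf(String name) implements Node {}

  /** Returns a copy of {@code node} with every oldId reference replaced by newId. */
  static Node rename(Node node, String oldId, String newId) {
    if (node instanceof Correl c) {
      return c.id().equals(oldId) ? new Correl(newId) : c;
    } else if (node instanceof FieldAccess fa) {
      return new FieldAccess(rename(fa.target(), oldId, newId), fa.field());
    } else if (node instanceof Call call) {
      return new Call(call.op(), renameAll(call.operands(), oldId, newId));
    } else if (node instanceof SubQuery sq) {
      // Skipping this branch is the "missed node" failure mode: the outer
      // expression would be renamed while the inner plan keeps the old id.
      return new SubQuery(renameAll(sq.innerExprs(), oldId, newId));
    }
    return node; // Leaf: nothing to rewrite
  }

  private static List<Node> renameAll(List<Node> nodes, String oldId, String newId) {
    return nodes.stream()
        .map(n -> rename(n, oldId, newId))
        .collect(Collectors.toList());
  }
}
```

   In the real rule the same traversal would have to cover every `RelNode` and `RexNode` kind that can carry a correlation reference, which is where the risk of a silent miss comes from.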
   
   So I went with a simpler fix instead:
   
   - A `collectCorrelationIds` helper that walks both direct `RexCorrelVariable` references and the inner plans of `RexSubQuery` nodes across all predicate buckets (left, right, remaining join, above).
   - The join's `variablesSet` is recomputed from what the surviving join condition actually references, so if `$cor0` is still needed by a remaining subquery, it stays on the join.
   - The above-filter's `variablesSet` is now computed properly as well.
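
   As a rough illustration of the collection step, the sketch below runs a `collectCorrelationIds`-style walk over a toy expression tree. The `Correl`, `Call`, `SubQuery`, and `Leaf` records are hypothetical stand-ins for the Calcite node types, correlation ids are plain strings, and a subquery's inner plan is flattened to a list of expressions. The helper in the PR may well be shaped differently.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model only: these records are hypothetical stand-ins, NOT Calcite's
// RexNode / RexCorrelVariable / RexSubQuery classes.
final class CorrelSketch {
  interface Node {}
  /** A direct correlation variable reference such as $cor0. */
  record Correl(String id) implements Node {}
  /** Any operator call, e.g. =(a, b). */
  record Call(String op, List<Node> operands) implements Node {}
  /** A subquery; its inner plan is reduced to the expressions it contains. */
  record SubQuery(List<Node> innerExprs) implements Node {}
  /** Anything without a correlation reference (literal, input ref, ...). */
  record Leaf(String name) implements Node {}

  /** Collects every correlation id referenced by one bucket of predicates. */
  static Set<String> collectCorrelationIds(List<Node> predicates) {
    Set<String> ids = new LinkedHashSet<>(); // keeps first-seen order
    for (Node predicate : predicates) {
      collect(predicate, ids);
    }
    return ids;
  }

  private static void collect(Node node, Set<String> ids) {
    if (node instanceof Correl c) {
      ids.add(c.id());                                // direct reference
    } else if (node instanceof Call call) {
      call.operands().forEach(o -> collect(o, ids));  // descend into operands
    } else if (node instanceof SubQuery sq) {
      sq.innerExprs().forEach(e -> collect(e, ids));  // descend into inner plan
    }
    // Leaf: no correlation reference
  }
}
```

   Calling the helper once per bucket gives each operator only the ids its own predicates use. If a pushed-down left filter and the remaining join condition both contain a subquery on `$cor0`, both resulting sets contain `$cor0`, so the id stays on the join as well as on the new filter.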
   
   In the rare conflict case, both the join and the pushed-down filter would share the same `$cor0`, which is semantically correct. If this causes decorrelation issues in practice, we can add the id-generation layer as a follow-up.



