silundong commented on code in PR #4840:
URL: https://github.com/apache/calcite/pull/4840#discussion_r3006171862
##########
core/src/test/java/org/apache/calcite/sql2rel/RelDecorrelatorTest.java:
##########
@@ -1830,4 +1830,160 @@ public static Frameworks.ConfigBuilder config() {
+ " LogicalTableScan(table=[[scott, EMP]])\n";
assertThat(after, hasTree(planAfter));
}
+
+ /** Test case for <a
href="https://issues.apache.org/jira/browse/CALCITE-7442">[CALCITE-7442]
+ * Getting Wrong index of Correlated variable inside Subquery after
FilterJoinRule</a>. */
+ @Test void testCorrelatedVariableIndexForInClause() {
+ final FrameworkConfig frameworkConfig = config().build();
+ final RelBuilder builder = RelBuilder.create(frameworkConfig);
+ final RelOptCluster cluster = builder.getCluster();
+ final Planner planner = Frameworks.getPlanner(frameworkConfig);
+ final String sql = "select e.empno, d.dname, b.ename\n"
+ + "from emp e\n"
+ + "inner join dept d\n"
+ + " on d.deptno = e.deptno\n"
+ + "inner join bonus b\n"
+ + " on e.ename = b.ename\n"
+ + " and b.job in (\n"
+ + " select b2.job\n"
+ + " from bonus b2\n"
+ + " where b2.ename = b.ename)\n"
+ + "where e.sal > 1000 and d.dname = 'SALES'";
+
+ final RelNode originalRel;
+ try {
+ final SqlNode parse = planner.parse(sql);
+ final SqlNode validate = planner.validate(parse);
+ originalRel = planner.rel(validate).rel;
+ } catch (Exception e) {
+ throw TestUtil.rethrow(e);
+ }
+
+ final HepProgram hepProgram = HepProgram.builder()
+ .addRuleCollection(
+ ImmutableList.of(
+ CoreRules.FILTER_INTO_JOIN,
+ CoreRules.FILTER_SUB_QUERY_TO_CORRELATE))
+ .build();
+ final Program program =
+ Programs.of(hepProgram, true,
+ requireNonNull(cluster.getMetadataProvider()));
+ final RelNode before =
+ program.run(cluster.getPlanner(), originalRel, cluster.traitSet(),
+ Collections.emptyList(), Collections.emptyList());
+
+ final String planBefore = "LogicalProject(EMPNO=[$0], DNAME=[$9],
ENAME=[$11])\n"
+ + " LogicalJoin(condition=[=($1, $11)], joinType=[inner],
variablesSet=[[$cor0]])\n"
Review Comment:
The `Join` no longer has correlated subqueries, but `variablesSet=[[$cor0]]`
remains. The same applies to the second test case.
##########
core/src/main/java/org/apache/calcite/plan/RelOptUtil.java:
##########
@@ -4752,6 +4759,17 @@ public ImmutableBitSet build() {
}
return super.visitCall(call);
}
+
+ @Override public Void visitSubQuery(RexSubQuery subQuery) {
Review Comment:
If I understand correctly, the purpose here is to obtain the indices of the
free variables in the subquery. My concern is:
`InputFinder` analyzes the expressions of a `RelNode`, and there should be
at most one `CorrelationId` that belongs to that `RelNode` (at least during the
rule rewriting phase). Here, calling `RelOptUtil.getVariablesUsed` returns a
set of `CorrelationId`, which may include CorrelationIds that do not belong to
this `RelNode` (i.e., the free variables in the subquery come from an outer
scope). While it may be difficult to construct such a case,
`InputFinder.analyze` is a public method, and we don't know how users will call
it in their systems.
If we want InputFinder to consider the indices of free variables in the
subquery, perhaps we should add a field to record the variablesSet of the
`RelNode`.
##########
core/src/main/java/org/apache/calcite/plan/RelOptUtil.java:
##########
@@ -4818,22 +4849,61 @@ public RexInputConverter(
null,
leftDestFields,
rightDestFields,
- adjustments);
+ adjustments,
+ 0,
+ null);
}
public RexInputConverter(
RexBuilder rexBuilder,
@Nullable List<RelDataTypeField> srcFields,
@Nullable List<RelDataTypeField> destFields,
int[] adjustments) {
- this(rexBuilder, srcFields, destFields, null, null, adjustments);
+ this(rexBuilder, srcFields, destFields, null, null, adjustments, 0,
null);
}
public RexInputConverter(
RexBuilder rexBuilder,
@Nullable List<RelDataTypeField> srcFields,
int[] adjustments) {
- this(rexBuilder, srcFields, null, null, null, adjustments);
+ this(rexBuilder, srcFields, null, null, null, adjustments, 0, null);
+ }
+
+ public RexInputConverter(
+ RexBuilder rexBuilder,
+ @Nullable List<RelDataTypeField> srcFields,
+ @Nullable List<RelDataTypeField> destFields,
+ int[] adjustments,
+ int offset,
+ RelNode child) {
+ this(rexBuilder, srcFields, destFields, null, null, adjustments, offset,
child);
+ }
+
+ @Override public RexNode visitSubQuery(RexSubQuery subQuery) {
+ boolean[] update = {false};
+ List<RexNode> clonedOperands = visitList(subQuery.operands, update);
+ if (update[0]) {
+ subQuery = subQuery.clone(subQuery.getType(), clonedOperands);
+ }
+ final Set<CorrelationId> variablesSet =
+ RelOptUtil.getVariablesUsed(subQuery.rel);
Review Comment:
Similar to the concern at `InputFinder.visitSubQuery`, there might be a
`CorrelationId` belonging to an outer scope here.
##########
core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java:
##########
@@ -198,14 +201,35 @@ protected void perform(RelOptRuleCall call, @Nullable
Filter filter,
return;
}
+ Set<CorrelationId> leftVariablesSet = new LinkedHashSet<>();
+ Set<CorrelationId> rightVariablesSet = new LinkedHashSet<>();
+
+ for (RexNode condition : leftFilters) {
+ condition.accept(new RexVisitorImpl<Void>(true) {
+ @Override public Void visitSubQuery(RexSubQuery subQuery) {
+ leftVariablesSet.addAll(RelOptUtil.getVariablesUsed(subQuery.rel));
+ return super.visitSubQuery(subQuery);
+ }
+ });
+ }
+
+ for (RexNode condition : rightFilters) {
+ condition.accept(new RexVisitorImpl<Void>(true) {
+ @Override public Void visitSubQuery(RexSubQuery subQuery) {
+ rightVariablesSet.addAll(RelOptUtil.getVariablesUsed(subQuery.rel));
+ return super.visitSubQuery(subQuery);
+ }
+ });
+ }
+
// create Filters on top of the children if any filters were
// pushed to them
final RexBuilder rexBuilder = join.getCluster().getRexBuilder();
final RelBuilder relBuilder = call.builder();
final RelNode leftRel =
- relBuilder.push(join.getLeft()).filter(leftFilters).build();
+ relBuilder.push(join.getLeft()).filter(leftVariablesSet,
leftFilters).build();
final RelNode rightRel =
- relBuilder.push(join.getRight()).filter(rightFilters).build();
+ relBuilder.push(join.getRight()).filter(rightVariablesSet,
rightFilters).build();
Review Comment:
If the initial ON condition contains two correlated subqueries, one that
cannot be pushed down and the other is pushed down to the right, could the
following situation occur?
```
Join(variablesSet=cor1)
/ \
Filter(variablesSet=cor1)
// after removing subquery and applying other rules
Correlate(cor1)
/ \
......
\
Correlate(cor1)
```
Two `Correlate` with the same id, with one appearing in the RHS of the other
— this may cause serious decorrelation errors.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]