yashlimbad commented on PR #4840:
URL: https://github.com/apache/calcite/pull/4840#issuecomment-4160587931

   > > > > decorrelating join's condition is harder and can complicate 
resulting tree sometime. on the other hand decorrelating filter is logically 
better
   > > > 
   > > > 
   > > > Making `FilterJoinRule` handle correlated subqueries in JOIN ON 
condition would make the logic significantly more complex. In addition, the 
current subquery removal rule already provide a optimization, and their effect 
is essentially the same as pushing down correlated subqueries in ON condition. 
Considering the issues mentioned above, I would prefer to prohibit pushing down 
correlated subqueries. That said, if the issues above can all be addressed 
properly, that would also be good. But I suspect this would require more rounds 
of careful review.
   > > 
   > > 
   > > I agree that subquery remove rule will do the same, but what if someone 
invokes `FilterJoinRule` but not `SubqueryRemoveRule`? the index would remain 
broken at that point. `SubqueryRemoveRule` handles it, it doesn't mean we 
shouldn't fix pushed filter's broken index. And still `SubqueryRemoveRule` will 
not get correct variable set because we failed to extract variable set from 
subquery. Give me sometime, I will brainstorm on your outer scope review and 
comeback.
   > 
   > Most of Calcite is based on the assumption that subquery removal is always 
a good first step, and further rules don't have to deal with subqueries.
   > 
   > The assumption looks reasonable to me, and I don't see a valid reason to 
do otherwise here.
   > 
   > More code and more complex support for rules makes them harder to 
maintain, evolve and understand, without a compelling reason we should strive 
to keep things simple IMO.
   
   @asolimando 
   Thanks, that makes sense from a maintenance/simplicity perspective.
   
   I think the concern here is a bit narrower than "every rule should support 
subqueries". Once FilterJoinRule chooses to rewrite a filter that contains a 
correlated subquery, it is already operating on that structure. At that point, 
I think the rule should do one of two things:
   1. preserve a valid expression after the rewrite, or
   2. explicitly decline that transformation
   
   My concern with the current behavior is that it rewrites the predicate but 
can leave correlated field references inconsistent, which means correctness 
depends on a later SubQueryRemoveRule pass to repair the intermediate state. In 
other words, the issue is not only whether SubQueryRemoveRule can eventually 
handle it, but whether FilterJoinRule should emit an invalid expression for a 
shape it has already transformed. The PR description itself is fixing exactly 
that kind of planning/decorrelation correctness issue around correlated 
variable field indices after filter pushdown.
   I agree that adding broad subquery support to FilterJoinRule would be too 
much. So if the preference is to keep subquery handling centralized, I think a 
conservative alternative would be to skip pushdown when the predicate contains 
correlated subqueries, rather than allowing a rewrite that can leave broken 
correlated references.
   
   So from my perspective the acceptable outcomes are:
   * fix the correlated-reference shifting/variable propagation during 
pushdown, or
   * prohibit this specific pushdown for correlated subqueries.
   But relying on a later rule to repair an intermediate invalid state seems 
brittle.
   
   If the preference is to keep subquery handling centralized, I'm happy to 
revise this toward the conservative bail-out approach and avoid pushing 
conjuncts that contain correlated references.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to