Dandandan opened a new pull request, #21971:
URL: https://github.com/apache/datafusion/pull/21971

   ## Which issue does this PR close?
   
   N/A.
   
   ## Rationale for this change
   
   TPC-H Q2, Q17, and Q20 contain correlated scalar aggregate subqueries that are decorrelated into joins. In these cases the right side is unique on the correlated join keys, and no right-side columns are projected above the join. The inner join therefore only tests whether a matching scalar aggregate row exists, so a left semi join expresses the plan more directly and avoids carrying the inner join's output and cardinality machinery.
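
   The equivalence can be checked on toy data. This is a minimal sketch, not DataFusion code: it assumes rows are plain `(key, value)` tuples in slices, and the helper names `inner_join_project_left` and `left_semi_join` are hypothetical. It shows that an inner join projected down to left-side columns produces the same rows as a left semi join when the right side has at most one row per key, and that the two diverge once the right side has duplicate keys.

```rust
use std::collections::HashSet;

/// Hypothetical helper: nested-loop inner join of `left` against `right`,
/// followed by a projection that keeps only the left-side columns.
fn inner_join_project_left(
    left: &[(i32, &'static str)],
    right: &[(i32, f64)],
) -> Vec<(i32, &'static str)> {
    let mut out = Vec::new();
    for &(lk, lv) in left {
        for &(rk, _agg) in right {
            if lk == rk {
                // The right-side aggregate value is dropped by the projection.
                out.push((lk, lv));
            }
        }
    }
    out
}

/// Hypothetical helper: left semi join, emitting each left row at most once
/// if any right row has a matching key.
fn left_semi_join(left: &[(i32, &'static str)], right: &[(i32, f64)]) -> Vec<(i32, &'static str)> {
    let keys: HashSet<i32> = right.iter().map(|&(k, _)| k).collect();
    left.iter().copied().filter(|(k, _)| keys.contains(k)).collect()
}

fn main() {
    let left = [(1, "a"), (2, "b"), (2, "c"), (3, "d")];

    // One row per key, like the GROUP BY output of a decorrelated scalar aggregate.
    let unique_right = [(1, 10.0), (2, 20.0)];
    assert_eq!(
        inner_join_project_left(&left, &unique_right),
        left_semi_join(&left, &unique_right)
    );

    // With duplicate right keys, the inner join repeats left rows; the semi join does not.
    let dup_right = [(1, 10.0), (1, 11.0)];
    assert_ne!(
        inner_join_project_left(&left, &dup_right),
        left_semi_join(&left, &dup_right)
    );
}
```

The second assertion is why the rewrite has to prove right-side uniqueness first: without it, dropping the inner join's duplicate matches would change the result.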
   
   PR #21240 adds physical execution for uncorrelated scalar subqueries, but these cases are correlated and still need decorrelation into a set-at-a-time join plan.
   
   ## What changes are included in this PR?
   
   - Extend `EliminateJoin` to rewrite projected inner joins to left semi joins when the right input is provably unique on the right join keys.
   - Prove uniqueness through `SubqueryAlias` / `Projection` wrappers over `Aggregate` plans whose group expressions are covered by the join keys.
   - Keep the existing inner join when the projection references right columns or right-side uniqueness cannot be proven.
   - Update TPC-H plan snapshots for Q2, Q17, and Q20.
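
   The uniqueness check and the join-type decision described in the list can be sketched roughly as follows. This is written against a hypothetical, heavily simplified `Plan` enum rather than DataFusion's actual `LogicalPlan`, and `is_unique_on` / `choose_join_type` are illustrative names, not the PR's functions. It assumes projections pass columns through with unchanged names, so no alias remapping is needed.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified logical plan (not DataFusion's `LogicalPlan`).
#[allow(dead_code)] // the Aggregate input is carried for realism but never inspected
enum Plan {
    Scan,
    SubqueryAlias { input: Box<Plan> },
    /// Assumed to pass columns through with unchanged names.
    Projection { input: Box<Plan> },
    Aggregate { group_by: Vec<String>, input: Box<Plan> },
}

#[derive(Debug, PartialEq)]
enum JoinType {
    Inner,
    LeftSemi,
}

/// True if `plan` provably has at most one row per value of `keys`:
/// peel alias/projection wrappers, then require an Aggregate whose
/// group columns are all covered by `keys`.
fn is_unique_on(plan: &Plan, keys: &HashSet<String>) -> bool {
    match plan {
        Plan::SubqueryAlias { input } | Plan::Projection { input } => is_unique_on(input, keys),
        // An empty GROUP BY yields at most one row, and `all` returns true for it.
        Plan::Aggregate { group_by, .. } => group_by.iter().all(|g| keys.contains(g)),
        // No uniqueness information for a bare scan.
        Plan::Scan => false,
    }
}

/// Rewrite to a left semi join only if the projection above the join uses
/// no right-side columns and the right input is provably unique on its keys.
fn choose_join_type(right: &Plan, right_keys: &HashSet<String>, projection_uses_right_cols: bool) -> JoinType {
    if !projection_uses_right_cols && is_unique_on(right, right_keys) {
        JoinType::LeftSemi
    } else {
        JoinType::Inner
    }
}

fn main() {
    // Shape of a decorrelated scalar subquery: Alias(Projection(Aggregate GROUP BY key)).
    // The column name is illustrative only.
    let right = Plan::SubqueryAlias {
        input: Box::new(Plan::Projection {
            input: Box::new(Plan::Aggregate {
                group_by: vec!["l_partkey".to_string()],
                input: Box::new(Plan::Scan),
            }),
        }),
    };
    let keys: HashSet<String> = HashSet::from(["l_partkey".to_string()]);

    assert_eq!(choose_join_type(&right, &keys, false), JoinType::LeftSemi);
    // Right columns referenced above the join: keep the inner join.
    assert_eq!(choose_join_type(&right, &keys, true), JoinType::Inner);
    // Uniqueness cannot be proven for a bare scan: keep the inner join.
    assert_eq!(choose_join_type(&Plan::Scan, &keys, false), JoinType::Inner);
}
```

Falling back to `Inner` whenever the proof fails keeps the rewrite conservative: a missed semi-join opportunity costs performance, while a wrong one would change query results.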
   
   ## Are these changes tested?
   
   Yes.
   
   - `cargo test -p datafusion-optimizer eliminate_join --lib`
   - `INCLUDE_TPCH=true cargo test -p datafusion-sqllogictest --test sqllogictests -- tpch`
   - `cargo fmt --all`
   - `ci/scripts/rust_clippy.sh`
   - `./dev/rust_lint.sh`
   
   ## Are there any user-facing changes?
   
   No API changes. Eligible logical and physical plans can now show `LeftSemi` instead of `Inner` for projected joins against unique scalar aggregate subquery results.


-- 
This is an automated message from the Apache Git Service.