Dandandan opened a new pull request, #21971: URL: https://github.com/apache/datafusion/pull/21971
## Which issue does this PR close?

N/A.

## Rationale for this change

TPC-H Q2, Q17, and Q20 contain correlated scalar aggregate subqueries that are decorrelated into joins. In these cases the right side is unique on the correlated join keys, and no right-side columns are projected above the join. That means the inner join is only testing existence of a matching scalar aggregate row, so a left semi join expresses the plan more directly and avoids carrying inner-join output/cardinality machinery.

PR #21240 adds physical execution for uncorrelated scalar subqueries, but these cases are correlated and still need decorrelation into a set-at-a-time join plan.

## What changes are included in this PR?

- Extend `EliminateJoin` to rewrite projected inner joins to left semi joins when the right input is provably unique on the right join keys.
- Prove uniqueness through `SubqueryAlias` / `Projection` wrappers over `Aggregate` plans whose group expressions are covered by the join keys.
- Keep the existing inner join when the projection references right columns or right-side uniqueness cannot be proven.
- Update TPC-H plan snapshots for Q2, Q17, and Q20.

## Are these changes tested?

Yes.

- `cargo test -p datafusion-optimizer eliminate_join --lib`
- `INCLUDE_TPCH=true cargo test -p datafusion-sqllogictest --test sqllogictests -- tpch`
- `cargo fmt --all`
- `ci/scripts/rust_clippy.sh`
- `./dev/rust_lint.sh`

## Are there any user-facing changes?

No API changes. Eligible logical and physical plans can now show `LeftSemi` instead of `Inner` for projected joins against unique scalar aggregate subquery results.
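## Illustrative sketch of the rewrite

To make the rule described above concrete, here is a minimal, self-contained sketch on a toy plan type. It is not the code in this PR and does not use DataFusion's `LogicalPlan` API: the `Plan` and `JoinType` enums and the `output_columns`, `unique_on`, and `rewrite_to_left_semi` functions are all hypothetical names made up for illustration. The toy model also assumes unqualified, globally distinct column names and ignores join filters, aliases that rename columns, and nested rewrites. The idea it shows: if the right input has at most one row per join key and nothing above the join reads a right column, the inner join can neither duplicate left rows nor contribute output, so it behaves like a left semi join.

```rust
// Toy logical plan; a hypothetical stand-in, not DataFusion's LogicalPlan.
#[derive(Debug, Clone, PartialEq)]
enum JoinType {
    Inner,
    LeftSemi,
}

#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan { table: String, columns: Vec<String> },
    Aggregate { input: Box<Plan>, group_by: Vec<String>, aggregates: Vec<String> },
    Projection { input: Box<Plan>, columns: Vec<String> },
    SubqueryAlias { input: Box<Plan>, alias: String },
    // `on` holds (left column, right column) equality pairs.
    Join { left: Box<Plan>, right: Box<Plan>, join_type: JoinType, on: Vec<(String, String)> },
}

// Columns a plan node produces (assumption: names are unqualified and distinct).
fn output_columns(plan: &Plan) -> Vec<String> {
    match plan {
        Plan::Scan { columns, .. } | Plan::Projection { columns, .. } => columns.clone(),
        Plan::Aggregate { group_by, aggregates, .. } => {
            group_by.iter().chain(aggregates).cloned().collect()
        }
        Plan::SubqueryAlias { input, .. } => output_columns(input),
        Plan::Join { left, right, join_type, .. } => {
            let mut cols = output_columns(left);
            if *join_type == JoinType::Inner {
                cols.extend(output_columns(right));
            }
            cols
        }
    }
}

// True only when `plan` provably yields at most one row per combination of `keys`.
fn unique_on(plan: &Plan, keys: &[String]) -> bool {
    match plan {
        // An aggregate emits one row per group, so it is unique on any key
        // set that covers all of its group columns.
        Plan::Aggregate { group_by, .. } => group_by.iter().all(|g| keys.contains(g)),
        // Wrappers keep the row count; look through them.
        Plan::SubqueryAlias { input, .. } => unique_on(input, keys),
        Plan::Projection { input, columns } => {
            // Only keys that the projection passes through can be traced further down.
            let visible: Vec<String> =
                keys.iter().filter(|k| columns.contains(*k)).cloned().collect();
            unique_on(input, &visible)
        }
        // Anything else: uniqueness is not proven, so answer conservatively.
        _ => false,
    }
}

// Rewrite Projection(InnerJoin) into Projection(LeftSemiJoin) when that is safe.
fn rewrite_to_left_semi(plan: Plan) -> Plan {
    match plan {
        Plan::Projection { input, columns } => match *input {
            Plan::Join { left, right, join_type: JoinType::Inner, on } => {
                let right_keys: Vec<String> = on.iter().map(|(_, r)| r.clone()).collect();
                let right_cols = output_columns(&right);
                let uses_right = columns.iter().any(|c| right_cols.contains(c));
                let join_type = if !uses_right && unique_on(&right, &right_keys) {
                    JoinType::LeftSemi
                } else {
                    JoinType::Inner // keep the original join when unsure
                };
                Plan::Projection {
                    input: Box::new(Plan::Join { left, right, join_type, on }),
                    columns,
                }
            }
            other => Plan::Projection { input: Box::new(other), columns },
        },
        other => other,
    }
}
```

In this toy model, a projection of `p_brand` over `part JOIN (lineitem aggregated by l_partkey) ON p_partkey = l_partkey` becomes a `LeftSemi` join. The same plan stays `Inner` if the projection also reads the aggregate column, if the right side is a plain scan, or if the aggregate groups by a column that is not a join key.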
