Dandandan opened a new pull request, #21809: URL: https://github.com/apache/datafusion/pull/21809
## Which issue does this PR close?

- Closes #.

## Rationale for this change

The `DecorrelatePredicateSubquery` optimizer rule currently rewrites `IN`/`EXISTS` subquery predicates to a `LeftSemi` join with the outer query on the left and the decorrelated subquery on the right.

Switching the default to `RightSemi` (with the subquery on the left and the outer query on the right) changes which side is the build side in downstream join implementations. In the typical shape for subquery decorrelation (a smaller subquery result probed against a larger outer relation), this puts the smaller side on the build side, which tends to be a better default.

The two plans are semantically equivalent: `LeftSemi(outer, sub)` returns outer rows that match the subquery, and `RightSemi(sub, outer)` returns the same outer rows via the right input.

## What changes are included in this PR?

- `DecorrelatePredicateSubquery::build_join_top` now picks `JoinType::RightSemi` for non-negated `IN`/`EXISTS`.
- `build_join` swaps left/right inputs when emitting a `RightSemi` join, so the resulting plan still produces outer rows.
- The `NOT IN` / `NOT EXISTS` path continues to use `LeftAnti` (including the existing null-aware handling); it is unchanged in this PR.
- Doc-comment examples, unit-test snapshots, and `sqllogictest` expected plans are updated to match the new output shape.

## Are these changes tested?

Yes. Existing tests are updated to the new plan shape:

- `cargo test -p datafusion-optimizer` (unit + integration snapshots regenerated via `cargo insta`)
- `cargo test --test sqllogictests -p datafusion-sqllogictest` (all 463 SLT files)
- `INCLUDE_TPCH=true cargo test --test sqllogictests -p datafusion-sqllogictest` (all 464 SLT files, including TPC-H plan snapshots)
- `cargo fmt --all` / `cargo clippy -p datafusion-optimizer --all-targets -- -D warnings`

## Are there any user-facing changes?

The optimized logical/physical plan for `IN`/`EXISTS` subqueries now shows `RightSemi` with the subquery as the left child. Query results are unchanged.
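
The equivalence claimed in the rationale, and the effect on the build side, can be illustrated with a toy model. The sketch below is not DataFusion code: `SemiJoinOutput`, `left_semi`, and `right_semi` are hypothetical names, rows are reduced to a single non-null `i64` key, and it assumes (as the rationale implies) that the left input is always the one that gets hashed.

```rust
use std::collections::{HashMap, HashSet};

/// Output of a toy semi join: the emitted keys and how many rows were hashed.
/// Hypothetical type for illustration only.
#[derive(Debug, PartialEq)]
pub struct SemiJoinOutput {
    pub rows: Vec<i64>,
    pub build_rows: usize,
}

/// Toy LeftSemi(left, right): hash the left input, stream the right input to
/// mark matches, then emit the matched left rows in their original order.
pub fn left_semi(left: &[i64], right: &[i64]) -> SemiJoinOutput {
    // Build phase: key -> positions of the left rows carrying that key.
    let mut index: HashMap<i64, Vec<usize>> = HashMap::new();
    for (i, key) in left.iter().enumerate() {
        index.entry(*key).or_default().push(i);
    }

    // Probe phase: every right row marks the left rows it matches.
    let mut matched = vec![false; left.len()];
    for key in right {
        if let Some(positions) = index.get(key) {
            for &i in positions {
                matched[i] = true;
            }
        }
    }

    let rows = left
        .iter()
        .zip(&matched)
        .filter(|(_, m)| **m)
        .map(|(key, _)| *key)
        .collect();
    SemiJoinOutput { rows, build_rows: left.len() }
}

/// Toy RightSemi(left, right): hash the left input, stream the right input,
/// and emit each right row whose key is present on the build side.
pub fn right_semi(left: &[i64], right: &[i64]) -> SemiJoinOutput {
    // Build phase: only key membership is needed, so a set is enough.
    let build: HashSet<i64> = left.iter().copied().collect();

    // Probe phase: right rows are emitted as they stream past.
    let rows = right
        .iter()
        .copied()
        .filter(|key| build.contains(key))
        .collect();
    SemiJoinOutput { rows, build_rows: left.len() }
}
```

Calling `left_semi(&outer, &sub)` and `right_semi(&sub, &outer)` on the same inputs yields the same outer rows in this model, but the second form hashes `sub.len()` rows instead of `outer.len()`. NULL handling and multi-column keys are deliberately left out, so this says nothing about the `LeftAnti` path.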
