viirya opened a new pull request, #4073: URL: https://github.com/apache/datafusion-comet/pull/4073
## Which issue does this PR close?

Closes #457.

## Rationale for this change

DataFusion 53 introduced null-aware anti-join support in HashJoinExec (fixing apache/datafusion#10583), so Comet can now offload Spark's NOT IN subquery pattern (BroadcastHashJoinExec with isNullAwareAntiJoin=true) to the native layer instead of falling back to Spark.

The original BuildRight + LeftAnti rejection (issue #457) was added as a workaround for that null-aware bug; with the bug fixed upstream, the rejection can be removed entirely, enabling all BuildRight + LeftAnti cases (BHJ null-aware, BHJ regular, SHJ regular) to run natively.

## What changes are included in this PR?

- Add null_aware_anti_join field to HashJoin proto and pass it to DataFusion's HashJoinExec::try_new()
- Set NullAwareAntiJoin from BroadcastHashJoinExec.isNullAwareAntiJoin in the Scala serializer
- Skip swap_inputs() for null-aware anti-join (DataFusion only allows null_aware=true with LeftAnti; swap would turn it into RightAnti)
- Remove the blanket "BuildRight + LeftAnti is not supported" rejection in CometHashJoin and RewriteJoin
- Add BroadcastHashJoin tests for both null-aware (NOT IN subquery) and non-null-aware (BROADCAST hint with LEFT ANTI JOIN) cases; re-enable the previously-disabled SHJ BuildRight + LeftAnti test

## How are these changes tested?

Unit tests
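For reviewers less familiar with why NOT IN needs a dedicated join flag, here is a minimal sketch of how a regular LEFT ANTI join and a null-aware one differ on a single nullable key. It is plain Rust with no DataFusion or Comet dependency; the function names `left_anti` and `null_aware_left_anti` are hypothetical and exist only for this illustration. It models standard SQL NOT IN semantics, not the actual HashJoinExec implementation.

```rust
use std::collections::HashSet;

/// Regular LEFT ANTI join on one nullable key (illustrative only).
/// A left row is dropped only when its key equals some right key.
/// A NULL key never compares equal, so NULL left rows are kept.
fn left_anti(left: &[Option<i64>], right: &[Option<i64>]) -> Vec<Option<i64>> {
    // Build side: only non-NULL keys can ever match.
    let build: HashSet<i64> = right.iter().flatten().copied().collect();
    left.iter()
        .copied()
        .filter(|k| match k {
            Some(v) => !build.contains(v),
            None => true,
        })
        .collect()
}

/// Null-aware LEFT ANTI join, i.e. `left.key NOT IN (SELECT key FROM right)`
/// (illustrative only, following SQL three-valued logic).
fn null_aware_left_anti(left: &[Option<i64>], right: &[Option<i64>]) -> Vec<Option<i64>> {
    // Empty subquery: NOT IN is TRUE for every left row, NULL keys included.
    if right.is_empty() {
        return left.to_vec();
    }
    // Any NULL on the right: NOT IN is never TRUE, so no row survives.
    if right.iter().any(|k| k.is_none()) {
        return Vec::new();
    }
    // Otherwise keep only non-NULL left keys that have no match.
    let build: HashSet<i64> = right.iter().flatten().copied().collect();
    left.iter()
        .copied()
        .filter(|k| matches!(k, Some(v) if !build.contains(v)))
        .collect()
}
```

With `left = [1, 2, NULL]` and `right = [2]`, the regular anti-join keeps `1` and `NULL`, while the null-aware variant keeps only `1`. These rules are tied to the left (probe) side of a LeftAnti join, which is consistent with the PR skipping swap_inputs() whenever the null-aware flag is set.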
