mbutrovich opened a new pull request, #21484: URL: https://github.com/apache/datafusion/pull/21484
## Which issue does this PR close?

N/A.

## Rationale for this change

Sort merge join comparisons (`compare_join_arrays`, `is_join_arrays_equal`) do a `match DataType` + `downcast_ref` on every call, per column. These are called per-row in hot join loops across SMJ, semi/anti/mark SMJ, and piecewise merge join.

`arrow_ord::ord::make_comparator` does the type dispatch once at construction and returns a `DynComparator` closure that goes straight to typed value comparison. Arrow's own `LexicographicalComparator` uses this pattern for sorting; we should use it for joins too.

## What changes are included in this PR?

Adds `JoinKeyComparator` to `joins/utils.rs`: a thin wrapper around `Vec<DynComparator>` built once per batch pair. Null handling (`NullEqualsNothing` both-null -> `Less` override) is baked into the closures at construction time so `compare()` is a branchless loop.

Integrated into all hot-path call sites:

- `materializing_stream.rs`: `streamed_buffered_cmp` (streamed vs buffered) and `buffered_equality_cmp` (head vs tail equality)
- `bitwise_stream.rs`: `outer_inner_cmp`, `outer_self_cmp`, `inner_self_cmp`; simplified `find_key_group_end` signature (takes `&JoinKeyComparator`, returns `usize` instead of `Result<usize>` since type errors are now caught at construction)
- `piecewise_merge_join/classic_join.rs`: single comparator built per batch pair

`compare_join_arrays` is kept for the one-off `keys_match` call (once per batch boundary). Deleted `is_join_arrays_equal` (75-line per-row type dispatch function), replaced by `JoinKeyComparator::is_equal`.

## Are these changes tested?

- 4 unit tests for `JoinKeyComparator`: multi-column mixed types, `NullEqualsNull`, `NullEqualsNothing`, `nulls_first` ordering
- Existing SMJ test suites pass
- Existing sqllogictest join tests pass

## Are there any user-facing changes?

No.
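## Illustrative sketch of the dispatch-once pattern

The idea in the rationale (resolve column types once per batch pair, then compare rows through prebuilt closures) can be modeled with the standard library alone. The sketch below is not the PR's code and does not use Arrow: `Column`, `KeyComparator`, `RowCmp`, and `nullable_cmp` are hypothetical stand-ins for Arrow arrays, `JoinKeyComparator`, `DynComparator`, and `make_comparator`. It also assumes nulls sort first and ignores sort direction, which the real implementation would take from the join's sort options.

```rust
use std::cmp::Ordering;

/// Simplified stand-in for a typed, nullable column (not an Arrow array).
#[derive(Debug, Clone)]
pub enum Column {
    Int(Vec<Option<i64>>),
    Str(Vec<Option<String>>),
}

/// How two null keys compare, mirroring the two modes named in the PR text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NullEquality {
    NullEqualsNull,
    NullEqualsNothing,
}

/// Per-column comparator over (left row, right row). By the time one of
/// these exists, the column type has already been resolved.
type RowCmp<'a> = Box<dyn Fn(usize, usize) -> Ordering + 'a>;

/// Build a typed closure; the both-null result is fixed at construction.
fn nullable_cmp<'a, T: Ord + 'a>(
    left: &'a [Option<T>],
    right: &'a [Option<T>],
    both_null: Ordering,
) -> RowCmp<'a> {
    Box::new(move |i, j| match (&left[i], &right[j]) {
        (Some(a), Some(b)) => a.cmp(b),
        (None, None) => both_null,
        // Assumption for this sketch: nulls sort before non-null values.
        (None, Some(_)) => Ordering::Less,
        (Some(_), None) => Ordering::Greater,
    })
}

/// Hypothetical analogue of a join key comparator: one closure per key column.
pub struct KeyComparator<'a> {
    cmps: Vec<RowCmp<'a>>,
}

impl<'a> KeyComparator<'a> {
    /// Type dispatch happens here, once per batch pair. A type mismatch is
    /// reported at construction, so per-row methods need no `Result`.
    pub fn try_new(
        left: &'a [Column],
        right: &'a [Column],
        null_equality: NullEquality,
    ) -> Result<Self, String> {
        if left.len() != right.len() {
            return Err("join key column counts do not match".to_string());
        }
        let both_null = match null_equality {
            NullEquality::NullEqualsNull => Ordering::Equal,
            NullEquality::NullEqualsNothing => Ordering::Less,
        };
        let mut cmps = Vec::with_capacity(left.len());
        for (l, r) in left.iter().zip(right) {
            let cmp = match (l, r) {
                (Column::Int(a), Column::Int(b)) => nullable_cmp(a, b, both_null),
                (Column::Str(a), Column::Str(b)) => nullable_cmp(a, b, both_null),
                _ => return Err("join key column types do not match".to_string()),
            };
            cmps.push(cmp);
        }
        Ok(Self { cmps })
    }

    /// Lexicographic comparison of left row `i` against right row `j`.
    pub fn compare(&self, i: usize, j: usize) -> Ordering {
        for cmp in &self.cmps {
            let ord = cmp(i, j);
            if ord != Ordering::Equal {
                return ord;
            }
        }
        Ordering::Equal
    }

    /// True when every key column compares equal for the two rows.
    pub fn is_equal(&self, i: usize, j: usize) -> bool {
        self.compare(i, j) == Ordering::Equal
    }
}
```

A caller would build one `KeyComparator` per batch pair with `KeyComparator::try_new(&left_keys, &right_keys, NullEquality::NullEqualsNothing)?` and then call `compare(i, j)` or `is_equal(i, j)` inside the row loop. Because the `match` on column type runs only in `try_new`, the per-row path contains no type dispatch and cannot fail, which is the same reasoning the PR gives for returning `usize` rather than `Result<usize>` from `find_key_group_end`.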
