mbutrovich opened a new pull request, #21484:
URL: https://github.com/apache/datafusion/pull/21484

   ## Which issue does this PR close?
   N/A.
   
   ## Rationale for this change
   The sort merge join comparison functions (`compare_join_arrays`, `is_join_arrays_equal`) run a `match` on `DataType` plus a `downcast_ref` for every column on every call. They are called once per row in the hot join loops of SMJ, semi/anti/mark SMJ, and piecewise merge join, so the type dispatch is repeated for every row.
   
   `arrow_ord::ord::make_comparator` does the type dispatch once at 
construction and returns a `DynComparator` closure that goes straight to typed 
value comparison. Arrow's own `LexicographicalComparator` uses this pattern for 
sorting — we should use it for joins too.
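
To illustrate the dispatch-once pattern described above, here is a minimal std-only sketch. It does not use Arrow: `Column`, `RowComparator`, and `build_comparator` are hypothetical stand-ins for Arrow arrays, `DynComparator`, and `make_comparator`, and only two column types are modeled.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for an Arrow array; only two column types are modeled.
enum Column {
    Int(Vec<i64>),
    Text(Vec<String>),
}

/// Compares row `i` of the left column with row `j` of the right column.
type RowComparator<'a> = Box<dyn Fn(usize, usize) -> Ordering + 'a>;

/// Matches on the column types once and returns a closure that goes straight
/// to the typed comparison. A type mismatch is reported here, at construction,
/// so the per-row call cannot fail on types.
fn build_comparator<'a>(
    left: &'a Column,
    right: &'a Column,
) -> Result<RowComparator<'a>, String> {
    let cmp: RowComparator<'a> = match (left, right) {
        (Column::Int(l), Column::Int(r)) => {
            Box::new(move |i: usize, j: usize| l[i].cmp(&r[j]))
        }
        (Column::Text(l), Column::Text(r)) => {
            Box::new(move |i: usize, j: usize| l[i].cmp(&r[j]))
        }
        _ => return Err("mismatched column types".to_string()),
    };
    Ok(cmp)
}
```

In a join loop the closure would be built once per batch pair and then called once per row, so the `match` above is paid once rather than on every comparison.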
   
   ## What changes are included in this PR?
   
   Adds `JoinKeyComparator` to `joins/utils.rs`: a thin wrapper around `Vec<DynComparator>` built once per batch pair. Null handling (under `NullEqualsNothing`, a both-null pair is overridden to compare as `Less`) is baked into the closures at construction time, so `compare()` is a plain loop over the key columns with no per-row type or null-mode dispatch.
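
The idea of baking null handling into the closures can be sketched roughly as follows. This is a simplified std-only model, not the PR's code: `KeyComparator`, `RowCmp`, and the local `NullEquality` enum are illustrative names, key columns are plain `Vec<Option<i64>>`, and a nulls-first ordering is assumed when only one side is null.

```rust
use std::cmp::Ordering;

/// Local stand-in for the null equality mode (illustrative only).
#[derive(Clone, Copy)]
enum NullEquality {
    NullEqualsNothing,
    NullEqualsNull,
}

type RowCmp<'a> = Box<dyn Fn(usize, usize) -> Ordering + 'a>;

/// Hypothetical wrapper holding one pre-built comparator per key column.
struct KeyComparator<'a> {
    columns: Vec<RowCmp<'a>>,
}

impl<'a> KeyComparator<'a> {
    /// Builds one closure per column pair. Assumes both sides have the same
    /// number of key columns (extra columns on either side are ignored).
    fn new(
        left: &'a [Vec<Option<i64>>],
        right: &'a [Vec<Option<i64>>],
        null_equality: NullEquality,
    ) -> Self {
        // Decide the both-null result once, outside the per-row closure.
        let both_null = match null_equality {
            NullEquality::NullEqualsNull => Ordering::Equal,
            NullEquality::NullEqualsNothing => Ordering::Less,
        };
        let columns = left
            .iter()
            .zip(right.iter())
            .map(|(l, r)| {
                let cmp: RowCmp<'a> =
                    Box::new(move |i: usize, j: usize| match (l[i], r[j]) {
                        (Some(a), Some(b)) => a.cmp(&b),
                        (None, None) => both_null,
                        // Assumption: nulls sort first.
                        (None, Some(_)) => Ordering::Less,
                        (Some(_), None) => Ordering::Greater,
                    });
                cmp
            })
            .collect();
        Self { columns }
    }

    /// Lexicographic comparison: the first non-equal column decides.
    fn compare(&self, i: usize, j: usize) -> Ordering {
        for cmp in &self.columns {
            let ord = cmp(i, j);
            if ord != Ordering::Equal {
                return ord;
            }
        }
        Ordering::Equal
    }

    fn is_equal(&self, i: usize, j: usize) -> bool {
        self.compare(i, j) == Ordering::Equal
    }
}
```

Because the both-null result is chosen at construction, `compare` never inspects the null equality mode. Under `NullEqualsNothing`, two rows with a null key never report `Equal`, so they never join.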
   
   Integrated into all hot-path call sites:
   - `materializing_stream.rs`: `streamed_buffered_cmp` (streamed vs buffered) 
and `buffered_equality_cmp` (head vs tail equality)
   - `bitwise_stream.rs`: `outer_inner_cmp`, `outer_self_cmp`, `inner_self_cmp`; also simplifies the `find_key_group_end` signature, which now takes `&JoinKeyComparator` and returns `usize` instead of `Result<usize>`, since type errors are caught at construction
   - `piecewise_merge_join/classic_join.rs`: single comparator built per batch 
pair
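
To show why a group-end search no longer needs to return `Result` once type errors are caught at construction, here is a hedged sketch. The function name is borrowed from the list above, but the generic signature and the linear scan are assumptions for illustration; the real function takes a `&JoinKeyComparator` and may search differently.

```rust
/// Returns the exclusive end index of the run of rows whose key equals the
/// key at `start`. `is_equal` is assumed to be an infallible, pre-built
/// same-batch comparator, so there is no error path to propagate.
fn find_key_group_end<F>(is_equal: &F, start: usize, len: usize) -> usize
where
    F: Fn(usize, usize) -> bool,
{
    if start >= len {
        return len;
    }
    let mut end = start + 1;
    // Linear scan (an assumption of this sketch): advance while the key matches.
    while end < len && is_equal(start, end) {
        end += 1;
    }
    end
}
```

With sorted keys `[1, 1, 1, 2, 2, 3]`, calling this with `start = 0` yields `3`, and with `start = 3` yields `5`.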
   
   `compare_join_arrays` is kept for the one-off `keys_match` call (once per 
batch boundary).                                                
   
   Deletes `is_join_arrays_equal` (a 75-line per-row type dispatch function) and replaces it with `JoinKeyComparator::is_equal`.
   
   ## Are these changes tested?
   - 4 unit tests for `JoinKeyComparator`: multi-column mixed types, 
`NullEqualsNull`, `NullEqualsNothing`, `nulls_first` ordering
   - Existing SMJ test suites pass
   - Existing sqllogictest join tests pass
   
   ## Are there any user-facing changes?
   No. 

