neilconway opened a new pull request, #21464:
URL: https://github.com/apache/datafusion/pull/21464

   ## Which issue does this PR close?
   
   - Closes #21463.
   
   ## Rationale for this change
   
   `find_in_set` uses `PrimitiveArray::<T>::builder` to construct its results, 
which means the internal null buffer is built one row at a time (via 
repeated `append_null` calls). It is more efficient to construct the null 
buffer directly, either via `NullBuffer::union` (when multiple arguments might 
be NULL) or by simply cloning the input null buffer (when passed a single argument).
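
To illustrate why the bulk approach is cheaper, here is a minimal, std-only sketch. It is not the code in this PR and does not use the arrow crates. `Validity`, `combine_per_row` and `combine_bulk` are hypothetical names that model a bit-packed validity bitmap (bit set = non-NULL). The per-row version does one check and one bit write per row, while the bulk version ANDs whole 64-bit words, or clones the bitmap when only one input can contain NULLs.

```rust
/// Simplified stand-in for a null buffer (hypothetical, not arrow's type).
/// Bit i is set when row i is valid (non-NULL).
#[derive(Debug, Clone, PartialEq)]
pub struct Validity {
    words: Vec<u64>,
    len: usize,
}

impl Validity {
    /// Build a bitmap from per-row flags (true = valid).
    pub fn from_bools(flags: &[bool]) -> Self {
        let mut words = vec![0u64; (flags.len() + 63) / 64];
        for (i, &valid) in flags.iter().enumerate() {
            if valid {
                words[i / 64] |= 1u64 << (i % 64);
            }
        }
        Validity { words, len: flags.len() }
    }

    pub fn is_valid(&self, i: usize) -> bool {
        assert!(i < self.len);
        (self.words[i / 64] >> (i % 64)) & 1 == 1
    }

    /// Unused trailing bits are always zero, so counting set bits is safe.
    pub fn null_count(&self) -> usize {
        let valid: usize = self.words.iter().map(|w| w.count_ones() as usize).sum();
        self.len - valid
    }
}

/// Row-at-a-time combination, similar in spirit to appending through a builder.
pub fn combine_per_row(a: &Validity, b: &Validity) -> Validity {
    assert_eq!(a.len, b.len);
    let flags: Vec<bool> = (0..a.len)
        .map(|i| a.is_valid(i) && b.is_valid(i))
        .collect();
    Validity::from_bools(&flags)
}

/// Bulk combination: a row is valid only if it is valid in every input,
/// so AND the words (64 rows per operation). `None` means "no NULLs".
pub fn combine_bulk(a: Option<&Validity>, b: Option<&Validity>) -> Option<Validity> {
    match (a, b) {
        (None, None) => None,
        // Only one input can be NULL: reuse its bitmap as-is.
        (Some(v), None) | (None, Some(v)) => Some(v.clone()),
        (Some(a), Some(b)) => {
            assert_eq!(a.len, b.len);
            let words = a.words.iter().zip(&b.words).map(|(x, y)| x & y).collect();
            Some(Validity { words, len: a.len })
        }
    }
}
```

Both functions produce the same bitmap; the bulk form just avoids per-row branching. In arrow-rs, cloning a real null buffer is cheaper still than the `Vec` clone in this model, because the underlying buffer is reference-counted.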
   
   Benchmarks (ARM64):
   
   ```
     - find_in_set/string_len_8: 589.0 µs → 529.0 µs (-10.2%)
     - find_in_set/string_len_32: 736.2 µs → 660.5 µs (-10.3%)
     - find_in_set/string_len_1024: 7.5 ms → 7.4 ms (-1.3%)
     - find_in_set/string_view_len_8: 616.0 µs → 579.9 µs (-5.9%)
     - find_in_set/string_view_len_32: 748.0 µs → 701.9 µs (-6.2%)
     - find_in_set/string_view_len_1024: 7.6 ms → 7.6 ms (0.0%)
     - find_in_set_scalar/string_len_8: 76.7 µs → 48.0 µs (-37.4%)
     - find_in_set_scalar/string_len_32: 76.5 µs → 47.6 µs (-37.8%)
     - find_in_set_scalar/string_len_1024: 76.2 µs → 48.0 µs (-37.0%)
     - find_in_set_scalar/string_view_len_8: 81.9 µs → 55.7 µs (-32.0%)
     - find_in_set_scalar/string_view_len_32: 85.5 µs → 56.8 µs (-33.6%)
     - find_in_set_scalar/string_view_len_1024: 85.2 µs → 57.4 µs (-32.6%)
   ```
   
   The change should be an improvement for both the scalar and array cases. The 
relative improvement is larger in the scalar case because that path does less 
work overall, so NULL handling accounted for a larger fraction of its total runtime.
   
   ## What changes are included in this PR?
   
   * Optimize NULL handling for both scalar and array arg cases
   
   ## Are these changes tested?
   
   Yes.
   
   ## Are there any user-facing changes?
   
   No.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

