neilconway opened a new pull request, #21464:
URL: https://github.com/apache/datafusion/pull/21464
## Which issue does this PR close?
- Closes #21463.
## Rationale for this change
`find_in_set` uses `PrimitiveArray::<T>::builder` to construct its results,
which means the internal null buffer is built iteratively (via repeated
`append_null` calls). It is more efficient to construct the null buffer
directly, either via `NullBuffer::union` (when multiple arguments might be
NULL) or by cloning the input null buffer (when passed a single argument).
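To illustrate why combining null buffers directly is cheaper than appending row by row, here is a minimal sketch using only the Rust standard library. `Validity` and `union_nulls` are hypothetical names that model a packed validity bitmap; they are not Arrow's `NullBuffer` type or the code in this PR. The sketch assumes the usual convention that a result row is NULL if either input row is NULL, so the combined bitmap is a word-wise AND of the two inputs.

```rust
/// Validity bitmap modelled as packed 64-bit words: bit i set means row i is non-NULL.
/// Simplified stand-in for a null buffer (hypothetical type, not Arrow's API).
#[derive(Debug, Clone, PartialEq)]
pub struct Validity {
    words: Vec<u64>,
    len: usize,
}

impl Validity {
    /// Build from one bool per row (true = valid). This is the per-row path,
    /// analogous in spirit to calling an append method once per value.
    pub fn from_bools(rows: &[bool]) -> Self {
        let mut words = vec![0u64; (rows.len() + 63) / 64];
        for (i, &valid) in rows.iter().enumerate() {
            if valid {
                words[i / 64] |= 1u64 << (i % 64);
            }
        }
        Validity { words, len: rows.len() }
    }

    /// Returns true if row `i` is non-NULL.
    pub fn is_valid(&self, i: usize) -> bool {
        assert!(i < self.len);
        (self.words[i / 64] >> (i % 64)) & 1 == 1
    }

    /// Number of NULL rows (bits past `len` are always zero, so they are not counted as valid).
    pub fn null_count(&self) -> usize {
        self.len - self.words.iter().map(|w| w.count_ones() as usize).sum::<usize>()
    }
}

/// Combine the nulls of two inputs, 64 rows per AND instruction.
/// `None` means "this input has no nulls".
pub fn union_nulls(a: Option<&Validity>, b: Option<&Validity>) -> Option<Validity> {
    match (a, b) {
        (None, None) => None,
        // Only one side has nulls: reuse its bitmap as-is.
        (Some(v), None) | (None, Some(v)) => Some(v.clone()),
        (Some(x), Some(y)) => {
            assert_eq!(x.len, y.len, "inputs must have the same row count");
            let words = x.words.iter().zip(&y.words).map(|(l, r)| l & r).collect();
            Some(Validity { words, len: x.len })
        }
    }
}
```

In this model the two-input case touches one word per 64 rows, and the single-input case is just a clone of the existing bitmap, with no per-row branching in either path.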
Benchmarks (ARM64):
```
- find_in_set/string_len_8: 589.0 µs → 529.0 µs (-10.2%)
- find_in_set/string_len_32: 736.2 µs → 660.5 µs (-10.3%)
- find_in_set/string_len_1024: 7.5 ms → 7.4 ms (-1.3%)
- find_in_set/string_view_len_8: 616.0 µs → 579.9 µs (-5.9%)
- find_in_set/string_view_len_32: 748.0 µs → 701.9 µs (-6.2%)
- find_in_set/string_view_len_1024: 7.6 ms → 7.6 ms (0.0%)
- find_in_set_scalar/string_len_8: 76.7 µs → 48.0 µs (-37.4%)
- find_in_set_scalar/string_len_32: 76.5 µs → 47.6 µs (-37.8%)
- find_in_set_scalar/string_len_1024: 76.2 µs → 48.0 µs (-37.0%)
- find_in_set_scalar/string_view_len_8: 81.9 µs → 55.7 µs (-32.0%)
- find_in_set_scalar/string_view_len_32: 85.5 µs → 56.8 µs (-33.6%)
- find_in_set_scalar/string_view_len_1024: 85.2 µs → 57.4 µs (-32.6%)
```
The change should be an improvement for both scalar and array cases. The
relative improvement is larger in the scalar case because the scalar case
does less work overall, so NULL handling accounted for a larger fraction of
its total runtime.
## What changes are included in this PR?
* Optimize NULL handling for both the scalar and array argument cases
## Are these changes tested?
Yes.
## Are there any user-facing changes?
No.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]