neilconway opened a new pull request, #21980: URL: https://github.com/apache/datafusion/pull/21980
## Which issue does this PR close?

- Closes #21813.

## Rationale for this change

This PR implements two optimizations for `lower` and `upper` on ASCII strings:

1. For the `Utf8`/`LargeUtf8` code path, case conversion was previously done via `str::to_uppercase` or `str::to_lowercase`. For ASCII inputs, it is slightly faster to apply `map(u8::to_ascii_lowercase).collect()` directly to the string's bytes. Although the stdlib functions are well-optimized, they must re-check whether each string is ASCII; since we already know the input is all-ASCII, that check can be skipped.
2. The `Utf8View` code path previously had no ASCII-specific fast path; this PR adds one.

## What changes are included in this PR?

* Implement the optimizations above
* Share the `StringViewArray` buffer size constants with the bulk-NULL builders

## Are these changes tested?

Covered by existing tests.

## Are there any user-facing changes?

No.
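To illustrate the first optimization, here is a minimal std-only sketch; it is not DataFusion's implementation. `Utf8Like` is a hypothetical stand-in for an Arrow `Utf8` array (one contiguous value buffer plus offsets), and `convert` is a hypothetical helper. The sketch assumes ASCII-ness is established once for the whole batch up front; how DataFusion actually determines that is not shown here. Once the batch is known to be ASCII, each string's bytes go through `u8::to_ascii_lowercase` (or `u8::to_ascii_uppercase`) with no further per-string checks. Otherwise it falls back to the general `str::to_lowercase`/`str::to_uppercase`, which can change byte lengths (e.g. `ß` uppercases to `SS`).

```rust
/// Hypothetical stand-in for an Arrow `Utf8` array: value `i` lives in
/// `values[offsets[i]..offsets[i + 1]]`.
pub struct Utf8Like {
    pub offsets: Vec<usize>,
    pub values: Vec<u8>,
}

impl Utf8Like {
    /// Returns value `i` as a `&str` (re-validated here only for convenience).
    pub fn value(&self, i: usize) -> &str {
        std::str::from_utf8(&self.values[self.offsets[i]..self.offsets[i + 1]])
            .expect("case conversion produces valid UTF-8")
    }
}

/// Hypothetical helper: case-converts every string, taking a byte-wise fast
/// path when the whole batch is ASCII.
pub fn convert(
    strings: &[&str],
    ascii_fn: fn(&u8) -> u8,         // e.g. u8::to_ascii_lowercase
    unicode_fn: fn(&str) -> String, // e.g. str::to_lowercase
) -> Utf8Like {
    // Assumption: one ASCII check for the whole batch, done up front.
    let all_ascii = strings.iter().all(|s| s.is_ascii());

    let mut offsets = Vec::with_capacity(strings.len() + 1);
    offsets.push(0);
    let mut values = Vec::new();

    for s in strings {
        if all_ascii {
            // Fast path: map each byte directly. ASCII case mapping never
            // changes the byte length or produces non-ASCII bytes.
            values.extend(s.as_bytes().iter().map(ascii_fn));
        } else {
            // General path: full Unicode case mapping; length may change.
            values.extend_from_slice(unicode_fn(s).as_bytes());
        }
        offsets.push(values.len());
    }
    Utf8Like { offsets, values }
}
```

Usage: `convert(&["Hello", "WORLD"], u8::to_ascii_lowercase, str::to_lowercase)` yields the values `"hello"` and `"world"`. Because ASCII case mapping preserves byte length, the output offsets on the fast path match the input offsets.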
