neilconway opened a new pull request, #21980:
URL: https://github.com/apache/datafusion/pull/21980

   ## Which issue does this PR close?
   
   - Closes #21813.
   
   ## Rationale for this change
   
   This PR implements two optimizations for `lower` and `upper` on ASCII strings:
   
   1. In the `Utf8`/`LargeUtf8` code path, we previously did the case conversion via `str::to_uppercase` or `str::to_lowercase`. For ASCII inputs, it is a bit faster to operate on the bytes of the string directly, e.g. `map(u8::to_ascii_lowercase).collect()`: although the stdlib functions are well-optimized, they must re-check every string to see whether it is ASCII. Since we already know the input is all-ASCII, we can skip that check.
   2. The `Utf8View` code path previously had no ASCII-specific optimization; this PR adds one.
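
The byte-wise fast path from item 1 can be sketched roughly as follows. This is an illustration only, not the code in this PR: the function names `ascii_lower`, `ascii_upper`, and `lower`, and the `known_ascii` flag, are hypothetical, and the sketch uses the safe `String::from_utf8`, which re-validates the bytes, where a real implementation might avoid that cost.

```rust
/// Hypothetical helper (not DataFusion's actual code): lowercase a string
/// that the caller has already verified to be all-ASCII.
fn ascii_lower(s: &str) -> String {
    debug_assert!(s.is_ascii());
    // Convert byte by byte; no per-string ASCII re-check, no Unicode tables.
    let bytes: Vec<u8> = s.as_bytes().iter().map(u8::to_ascii_lowercase).collect();
    // ASCII case conversion only yields ASCII bytes, so this is valid UTF-8.
    String::from_utf8(bytes).expect("ASCII bytes are valid UTF-8")
}

/// Hypothetical counterpart for uppercasing all-ASCII input.
fn ascii_upper(s: &str) -> String {
    debug_assert!(s.is_ascii());
    let bytes: Vec<u8> = s.as_bytes().iter().map(u8::to_ascii_uppercase).collect();
    String::from_utf8(bytes).expect("ASCII bytes are valid UTF-8")
}

/// Hypothetical dispatch: `known_ascii` stands in for an array-level
/// ASCII check performed once, outside the per-string loop.
fn lower(s: &str, known_ascii: bool) -> String {
    if known_ascii {
        ascii_lower(s)
    } else {
        // General Unicode-aware path for non-ASCII input.
        s.to_lowercase()
    }
}
```

The point of the split is that the ASCII check is paid once per batch rather than once per string, so the per-string work in the fast path is a plain byte map.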
   
   ## What changes are included in this PR?
   
   * Implement the two ASCII optimizations for `lower` and `upper` described above
   * Share StringViewArray buffer size constants with the bulk-NULL builders
   
   ## Are these changes tested?
   
   Covered by existing tests.
   
   ## Are there any user-facing changes?
   
   No.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

