neilconway opened a new pull request, #21789:
URL: https://github.com/apache/datafusion/pull/21789

   ## Which issue does this PR close?
   
   - Part of #21684
   
   ## Rationale for this change
   
   Introduce three new string array builders with bulk null tracking:
   
   - `StringArrayBuilder` (Utf8)
   - `LargeStringArrayBuilder` (LargeUtf8)
   - `StringViewArrayBuilder` (Utf8View)
   
   Each builder has the following API:
   
   - `append_value(&str)` -- add a non-NULL value (row)
   - `append_placeholder()` -- add a placeholder for a NULL row
   - `finish(Option<NullBuffer>)` -- finish the build and supply the NULL buffer
   
   These are the counterparts of Arrow's `GenericStringBuilder` /
   `StringViewBuilder`, but they skip per-row NULL buffer maintenance, which
   lets callers compute the NULL buffer in bulk when possible.
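   
   To illustrate the idea, here is a minimal, std-only sketch of the
   builder shape described above. It is not the code in this PR: the names
   `SketchStringBuilder`, `SketchStringArray`, `new`, `get`, and `upper_all`
   are hypothetical, and a plain `Option<Vec<bool>>` stands in for Arrow's
   `NullBuffer`. The point is only that the per-row loop never touches
   validity, and the validity data is attached once at `finish`.
   
   ```rust
   /// Hypothetical, simplified stand-in for the new builders (not Arrow types).
   struct SketchStringBuilder {
       offsets: Vec<i32>, // offsets[i]..offsets[i + 1] delimits row i in `data`
       data: Vec<u8>,     // concatenated UTF-8 bytes of all rows
   }
   
   /// Hypothetical finished array: offsets + data + optional validity.
   struct SketchStringArray {
       offsets: Vec<i32>,
       data: Vec<u8>,
       validity: Option<Vec<bool>>, // assumption: stands in for `NullBuffer`
   }
   
   impl SketchStringBuilder {
       fn new() -> Self {
           Self { offsets: vec![0], data: Vec::new() }
       }
   
       /// Add a non-NULL row. No validity bookkeeping happens here.
       fn append_value(&mut self, value: &str) {
           self.data.extend_from_slice(value.as_bytes());
           self.offsets.push(self.data.len() as i32);
       }
   
       /// Add a zero-length slot for a row that will be marked NULL at `finish`.
       fn append_placeholder(&mut self) {
           self.offsets.push(self.data.len() as i32);
       }
   
       /// Finish the build; the caller supplies validity for all rows at once.
       fn finish(self, validity: Option<Vec<bool>>) -> SketchStringArray {
           if let Some(v) = &validity {
               assert_eq!(v.len(), self.offsets.len() - 1, "one flag per row");
           }
           SketchStringArray { offsets: self.offsets, data: self.data, validity }
       }
   }
   
   impl SketchStringArray {
       fn len(&self) -> usize {
           self.offsets.len() - 1
       }
   
       fn get(&self, i: usize) -> Option<&str> {
           if let Some(v) = &self.validity {
               if !v[i] {
                   return None;
               }
           }
           let (start, end) = (self.offsets[i] as usize, self.offsets[i + 1] as usize);
           Some(std::str::from_utf8(&self.data[start..end]).expect("valid UTF-8"))
       }
   }
   
   /// Hypothetical `upper`-style kernel using the sketch builder.
   fn upper_all(input: &[Option<&str>]) -> SketchStringArray {
       let mut builder = SketchStringBuilder::new();
       for row in input {
           match row {
               Some(s) => builder.append_value(&s.to_uppercase()),
               None => builder.append_placeholder(),
           }
       }
       // Validity is derived from the input in one pass, outside the row loop.
       let flags: Vec<bool> = input.iter().map(|r| r.is_some()).collect();
       let validity = if flags.iter().all(|f| *f) { None } else { Some(flags) };
       builder.finish(validity)
   }
   ```
   
   In a real kernel the input would presumably already carry a NULL buffer,
   so the output validity could be reused from it instead of being rebuilt.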
   
   This PR also switches `case_conversion`, which implements `lower`,
   `upper`, and the Spark equivalents, to use the new builders. This
   improves `lower` / `upper` performance by 3-15% on microbenchmarks. More
   UDFs (~10) will be converted to use this API in future PRs.
   
   ## What changes are included in this PR?
   
   * Add new builders
   * Add unit tests
   * Adopt builders in `case_conversion`
   
   ## Are these changes tested?
   
   Yes.
   
   ## Are there any user-facing changes?
   
   No.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

