neilconway opened a new pull request, #21789: URL: https://github.com/apache/datafusion/pull/21789
## Which issue does this PR close?

- Part of #21684

## Rationale for this change

Introduce three new string array builders with bulk null tracking:

- `StringArrayBuilder` (Utf8)
- `LargeStringArrayBuilder` (LargeUtf8)
- `StringViewArrayBuilder` (Utf8View)

Each builder has the following API:

- `append_value(&str)` -- add a non-NULL value (row)
- `append_placeholder()` -- add a NULL row placeholder
- `finish(Option<NullBuffer>)` -- finish the build, specify NULLs

These are the counterparts of Arrow's `GenericStringBuilder` / `StringViewBuilder`, but they skip per-row NULL buffer maintenance, which lets callers compute the NULL buffer in bulk when possible.

This PR also switches `case_conversion` to use the new APIs; `case_conversion` is used to implement `lower`, `upper`, and the Spark equivalents. This improves `lower` / `upper` performance by 3-15% on microbenchmarks. More UDFs (~10) will be converted to use this API in future PRs.

## What changes are included in this PR?

* Add new builders
* Add unit tests
* Adopt builders in `case_conversion`

## Are these changes tested?

Yes.

## Are there any user-facing changes?

No.
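For readers unfamiliar with the pattern, the builder API described above can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the code in this PR: `SketchStringArray`, `SketchStringBuilder`, and `upper_all` are hypothetical names, and a plain `Vec<bool>` stands in for Arrow's `NullBuffer`. The point it tries to show is that the builder only tracks offsets and value bytes, while the caller supplies validity once at `finish`.

```rust
/// Hypothetical stand-in for a Utf8 array: offsets, value bytes, optional validity.
pub struct SketchStringArray {
    offsets: Vec<i32>,
    values: Vec<u8>,
    /// `true` marks a non-NULL row; `None` means the array has no NULLs.
    /// (Assumption: the real builders take an Arrow `NullBuffer` instead.)
    validity: Option<Vec<bool>>,
}

impl SketchStringArray {
    pub fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    pub fn get(&self, i: usize) -> Option<&str> {
        if let Some(v) = &self.validity {
            if !v[i] {
                return None;
            }
        }
        let start = self.offsets[i] as usize;
        let end = self.offsets[i + 1] as usize;
        Some(std::str::from_utf8(&self.values[start..end]).expect("only valid UTF-8 is appended"))
    }
}

/// Hypothetical builder that records offsets and bytes but no per-row NULL state.
pub struct SketchStringBuilder {
    offsets: Vec<i32>,
    values: Vec<u8>,
}

impl SketchStringBuilder {
    pub fn with_capacity(rows: usize, bytes: usize) -> Self {
        let mut offsets = Vec::with_capacity(rows + 1);
        offsets.push(0); // offsets always start at 0
        SketchStringBuilder { offsets, values: Vec::with_capacity(bytes) }
    }

    /// Add a non-NULL row.
    pub fn append_value(&mut self, s: &str) {
        self.values.extend_from_slice(s.as_bytes());
        assert!(self.values.len() <= i32::MAX as usize, "Utf8 offsets must fit in i32");
        self.offsets.push(self.values.len() as i32);
    }

    /// Add a zero-length slot for a row the caller will mark NULL at `finish`.
    pub fn append_placeholder(&mut self) {
        let last = *self.offsets.last().expect("offsets always holds the leading 0");
        self.offsets.push(last);
    }

    /// Finish the build; validity is supplied once, in bulk, by the caller.
    pub fn finish(self, validity: Option<Vec<bool>>) -> SketchStringArray {
        if let Some(v) = &validity {
            assert_eq!(v.len(), self.offsets.len() - 1, "validity length must match row count");
        }
        SketchStringArray { offsets: self.offsets, values: self.values, validity }
    }
}

/// Hypothetical caller in the spirit of `upper`: values are appended row by row,
/// while validity is derived from the input in a single pass.
pub fn upper_all(input: &[Option<&str>]) -> SketchStringArray {
    // Capacity hint only; uppercasing can change the byte length of a string.
    let bytes: usize = input.iter().flatten().map(|s| s.len()).sum();
    let mut builder = SketchStringBuilder::with_capacity(input.len(), bytes);
    for row in input {
        match row {
            Some(s) => builder.append_value(&s.to_uppercase()),
            None => builder.append_placeholder(),
        }
    }
    let validity: Vec<bool> = input.iter().map(|r| r.is_some()).collect();
    let validity = if validity.iter().all(|&v| v) { None } else { Some(validity) };
    builder.finish(validity)
}
```

In a function like `lower` or `upper`, the output is NULL exactly where the input is NULL, so a real caller could presumably reuse the input array's null buffer rather than rebuilding it as this sketch does.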
