neilconway commented on code in PR #21877:
URL: https://github.com/apache/datafusion/pull/21877#discussion_r3149575463


##########
datafusion/functions/src/unicode/substrindex.rs:
##########
@@ -261,30 +261,37 @@ fn visible_string_bytes<T: OffsetSizeTrait>(
     offsets[offsets.len() - 1].as_usize() - offsets[0].as_usize()
 }
 
-fn substr_index_general<'a, S, B>(
+fn substr_index_general<'a, S, O>(
     string_array: S,
     delimiter_array: S,
     count_array: &PrimitiveArray<Int64Type>,
-    mut builder: B,
+    mut builder: GenericStringArrayBuilder<O>,
 ) -> Result<ArrayRef>
 where
     S: StringArrayType<'a> + Copy,
-    B: StringLikeArrayBuilder,
+    O: OffsetSizeTrait,
 {
-    for ((string, delimiter), n) in string_array
-        .iter()
-        .zip(delimiter_array.iter())
-        .zip(count_array.iter())
-    {
-        match (string, delimiter, n) {
-            (Some(string), Some(delimiter), Some(n)) => {
-                builder.append_value(substr_index_slice(string, delimiter, n));
-            }
-            _ => builder.append_null(),
+    let num_rows = string_array.len();
+    // Output is null IFF any input is null.
+    let nulls = NullBuffer::union(

Review Comment:
   Good question, @RatulDawar !
   
   Empirically, taking this approach appears to be a significant performance 
win (although in some cases the overhead of NULL handling might not be huge to 
begin with, depending on the UDF). Intuitively, it makes sense that this would 
be faster:
   
   * Computing the union of two bitmaps can be accelerated with SIMD, and the 
Arrow primitives are well-optimized. It's also an entirely sequential memory 
access pattern.
   * `append_value` in the bulk-NULL builder does not need to touch the NULL 
bitmap for every row, whereas in the Arrow builder it does. That means fewer 
per-row branches and also less data cache pressure.
   * Similarly, we can now just check a single bit to look for NULLs, rather 
than doing three conditionals. It's also easy for the compiler to hoist the 
is-the-NULL-bitmap-None check outside of the loop and elide it entirely (I 
haven't checked if LLVM is actually doing this but it could).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to