andygrove opened a new issue, #4626:
URL: https://github.com/apache/datafusion-comet/issues/4626

   ## What
   
   Pick up the work started in #3659 (which I am closing). The goal is to speed 
up the nullable bulk-append paths in `native/shuffle/src/spark_unsafe/list.rs` 
(`impl_append_to_builder` macro, `append_booleans`, `append_timestamps`, 
`append_dates`) by using slice-based access instead of per-element 
`read_unaligned()` + manual pointer arithmetic.
   
   ## Where
   
   `native/shuffle/src/spark_unsafe/list.rs`
   
   Microbenchmark: `native/core/benches/array_element_append.rs` exercises i32, 
i64, f64, date32, and timestamp with and without nulls at 10k elements.
   
   ## Background
   
   #3659 had a draft that converted nullable paths to slice access guarded by a 
runtime alignment check. Review feedback from @mbutrovich on that PR (worth 
reading the inline comments before starting):
   
   1. **Alignment is NOT guaranteed** — the PR body claimed otherwise, but 
`unsafe_object.rs:49` and the existing comment at `list.rs:105` document that 
the array base can be unaligned when nested in a row's variable-length region. 
The runtime alignment check is load-bearing for soundness; 
`std::slice::from_raw_parts::<T>` on a misaligned pointer is UB. Keep the 
check, and make the PR description accurately describe the unaligned fallback.
   
   2. **The bigger win is removing the per-element null branch** — in the 
nullable path, the loop still branches on `is_null_in_bitset` per element. 
Reviewer suggested either:
      - `PrimitiveBuilder::append_values(values: &[T::Native], is_valid: 
&[bool])` after materializing a `&[bool]` from the null bitset
      - Build a `BooleanBuffer` from the Spark null bitmap and feed validity to 
the builder directly (skips the bool materialization)
   
      For booleans (alignment 1, no alignment fallback needed), 
`BooleanBuilder` may be able to absorb the validity in bulk too.
   
   3. **Post benchmark numbers from `array_element_append.rs`** — the 
non-nullable aligned branch is structurally unchanged, so the interesting 
comparison is the nullable rows, especially if the bulk-validity change is 
included.
   
   ## Suggested approach
   
   - Land the loop reshape + bulk validity together (rather than splitting), so 
the benchmark numbers tell a complete story.
   - Update the PR description to reflect that the unaligned fallback exists 
and explain *why* (nested-array case) so a future reader does not delete the 
runtime check.
   
   ## References
   
   - Closed PR: #3659
   - Inline review comments: 
https://github.com/apache/datafusion-comet/pull/3659#pullrequestreview-4320281277


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to