EeshanBembi opened a new pull request, #21794:

URL: https://github.com/apache/datafusion/pull/21794
## Which issue does this PR close?

Closes #21568.

## Rationale for this change

`ByteViewGroupValueBuilder::vectorized_append` was doing unnecessary work for short strings (≤12 bytes). For each row it called `array.value(row)` to decode the u128 view into a `&[u8]`, then called `make_view` to re-encode it back into a u128. The input `GenericByteViewArray` already stores inline values in exactly that u128 format, so the round-trip is redundant.

The fix mirrors the existing `HAS_BUFFERS` specialisation in `vectorized_equal_to_inner`. That code uses the same `data_buffers().is_empty()` guard to take a direct view-compare fast path for inline strings.

## What changes are included in this PR?

In `vectorized_append_inner`, the `Nulls::None` branch now dispatches on `arr.data_buffers().is_empty()`:

- **Fast path** (no data buffers, so every value is ≤12 bytes and stored inline): copies the u128 views directly via `self.views.extend(rows.iter().map(|&row| arr.views()[row]))`. Arrow's validity invariant guarantees that inline views are zero-padded, so a direct copy is semantically identical to `value()` → `make_view()`.
- **Slow path** (the array has non-inline strings): adds `self.views.reserve(rows.len())` before the existing loop to avoid repeated reallocation.

## Are these changes tested?

They are covered by the 6 existing unit tests in `bytes_view::tests`, all of which pass unchanged. `test_byte_view_vectorized_operation_special_case` exercises the fast path directly (11-byte strings, no data buffers).

## Are there any user-facing changes?

No. This is an internal performance improvement only.

## Benchmark

`inline_null_0.0_size_1000/vectorized_append` (8-byte strings, no nulls, 1,000 rows):

| | Time |
|---|---|
| Before | 3.37 µs |
| After | 495 ns |
| Change | **−85.3% (6.8× faster)** |
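The fast-path/slow-path dispatch in `vectorized_append_inner` (the `Nulls::None` branch) can be sketched with a simplified, std-only model of the Arrow view layout. Everything below is hypothetical: `ViewArraySketch`, `BuilderSketch`, `make_inline_view`, `value` and `MAX_INLINE_LEN` only approximate arrow-rs's `GenericByteViewArray` and `make_view` and DataFusion's `ByteViewGroupValueBuilder`. The sketch assumes the Arrow columnar view layout. In that layout, the low 4 bytes hold the length. Inline data sits zero-padded in the remaining 12 bytes. Longer values instead store a 4-byte prefix, a buffer index and an offset.

```rust
/// Longest value that fits inline in a 16-byte view (4 bytes hold the length).
const MAX_INLINE_LEN: usize = 12;

/// Hypothetical stand-in for a string-view array: one u128 view per row plus
/// zero or more data buffers for values longer than 12 bytes.
pub struct ViewArraySketch {
    pub views: Vec<u128>,
    pub data_buffers: Vec<Vec<u8>>,
}

/// Encode an inline value: bytes 0..4 = little-endian length,
/// bytes 4..4+len = data, all remaining bytes zero-padded.
pub fn make_inline_view(data: &[u8]) -> u128 {
    assert!(data.len() <= MAX_INLINE_LEN, "value does not fit inline");
    let mut bytes = [0u8; 16];
    bytes[0..4].copy_from_slice(&(data.len() as u32).to_le_bytes());
    bytes[4..4 + data.len()].copy_from_slice(data);
    u128::from_le_bytes(bytes)
}

/// Decode a row's bytes; long values are read from a data buffer using the
/// buffer index (bits 64..96) and offset (bits 96..128) stored in the view.
pub fn value(arr: &ViewArraySketch, row: usize) -> Vec<u8> {
    let view = arr.views[row];
    let len = view as u32 as usize; // low 32 bits hold the length
    if len <= MAX_INLINE_LEN {
        view.to_le_bytes()[4..4 + len].to_vec()
    } else {
        let buf = (view >> 64) as u32 as usize;
        let off = (view >> 96) as u32 as usize;
        arr.data_buffers[buf][off..off + len].to_vec()
    }
}

/// Hypothetical builder owning its views plus a single buffer (index 0).
#[derive(Default)]
pub struct BuilderSketch {
    pub views: Vec<u128>,
    pub buffer: Vec<u8>,
}

impl BuilderSketch {
    /// Per-row re-encoding: inline values get a fresh view; long values are
    /// copied into this builder's buffer and get a new index/offset view.
    pub fn append_value(&mut self, data: &[u8]) {
        if data.len() <= MAX_INLINE_LEN {
            self.views.push(make_inline_view(data));
        } else {
            let offset = self.buffer.len() as u32;
            self.buffer.extend_from_slice(data);
            let mut bytes = [0u8; 16];
            bytes[0..4].copy_from_slice(&(data.len() as u32).to_le_bytes());
            bytes[4..8].copy_from_slice(&data[0..4]); // 4-byte prefix
            bytes[8..12].copy_from_slice(&0u32.to_le_bytes()); // buffer index 0
            bytes[12..16].copy_from_slice(&offset.to_le_bytes());
            self.views.push(u128::from_le_bytes(bytes));
        }
    }

    /// Append `rows` from a null-free array, dispatching like the PR does.
    pub fn append_rows(&mut self, arr: &ViewArraySketch, rows: &[usize]) {
        if arr.data_buffers.is_empty() {
            // Fast path: no buffers means every view is inline, so copy as-is.
            self.views.extend(rows.iter().map(|&row| arr.views[row]));
        } else {
            // Slow path: reserve once, then decode and re-encode each row.
            self.views.reserve(rows.len());
            for &row in rows {
                let v = value(arr, row);
                self.append_value(&v);
            }
        }
    }
}
```

In this model, an inline view is fully determined by its length and zero-padded bytes, so `make_inline_view(&value(arr, row))` equals `arr.views[row]`. That identity is why the fast path can skip the decode/encode round-trip. Long values cannot take the shortcut here: their views point into the source array's buffers, so the sketch copies their bytes into the builder's own buffer under a new index and offset.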
