Scolliq opened a new pull request, #21836:
URL: https://github.com/apache/datafusion/pull/21836
Refs #15986.
**Why:** `spark_hex` walked one nibble at a time: two `HEX_CHARS[i]`
lookups and two `Vec::push` calls per input byte. With a precomputed table,
the hot loop flattens into one indexed load and one `extend_from_slice` per byte.
**What changed:** added `HEX_LOOKUP_LOWER` / `HEX_LOOKUP_UPPER` as `[[u8;
2]; 256]` const tables built at compile time. Bytes path now does a single
lookup + 2-byte extend per input byte. The int64 path consumes two nibbles per
iteration via the same table, with a fall-through for the high nibble.
Behaviour for `0`, `i64::MAX`, `i64::MIN`, `-1` preserved.
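A sketch of the approach described above, using the table name from this PR (`HEX_LOOKUP_UPPER`); the exact code in the diff may differ:

```rust
// Build a [[u8; 2]; 256] table at compile time: entry i holds the two
// ASCII hex digits of byte i.
const fn build_hex_table(chars: &[u8; 16]) -> [[u8; 2]; 256] {
    let mut table = [[0u8; 2]; 256];
    let mut i = 0;
    while i < 256 {
        table[i] = [chars[i >> 4], chars[i & 0x0F]];
        i += 1;
    }
    table
}

const HEX_LOOKUP_UPPER: [[u8; 2]; 256] = build_hex_table(b"0123456789ABCDEF");

// Bytes path: one indexed load and one 2-byte extend per input byte.
fn hex_encode_upper(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(input.len() * 2);
    for &b in input {
        out.extend_from_slice(&HEX_LOOKUP_UPPER[b as usize]);
    }
    out
}

// Int64 path: consume two nibbles (one byte) per iteration through the
// same table, skipping leading zeros; the "fall-through" handles the case
// where the first significant nibble is the low half of a byte.
fn hex_int64_upper(v: i64) -> Vec<u8> {
    let u = v as u64; // two's-complement view, so -1 -> "FFFFFFFFFFFFFFFF"
    if u == 0 {
        return vec![b'0'];
    }
    let mut out = Vec::with_capacity(16);
    let mut started = false;
    let mut shift = 56i32;
    while shift >= 0 {
        let byte = ((u >> shift) & 0xFF) as usize;
        let pair = HEX_LOOKUP_UPPER[byte];
        if started {
            out.extend_from_slice(&pair);
        } else if byte != 0 {
            started = true;
            if pair[0] != b'0' {
                out.extend_from_slice(&pair);
            } else {
                out.push(pair[1]); // fall-through: drop the leading zero nibble
            }
        }
        shift -= 8;
    }
    out
}
```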
**Tests:** extended `test_hex_int64` to cover edge values; new
`test_hex_lookup_table_covers_all_bytes` cross-checks every entry against
`format!("{:02x}")` / `format!("{:02X}")`; new `test_spark_hex_binary_round_trip_all_bytes` feeds
all 256 byte values through `spark_hex` and verifies the result.
`cargo test -p datafusion-spark --lib hex` → 8 pass. `cargo clippy
--all-features --all-targets` clean. `cargo bench --no-run` builds — existing
`benches/hex.rs` already covers
Int64/Utf8/Utf8View/LargeUtf8/Binary/LargeBinary plus dict paths.
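A minimal version of the exhaustive table cross-check described above (hypothetical reconstruction; the actual test lives in the PR diff):

```rust
// Same shape as the PR's table: entry i holds the two uppercase hex
// digits of byte i, built in a const initializer.
const HEX_LOOKUP_UPPER: [[u8; 2]; 256] = {
    let chars = *b"0123456789ABCDEF";
    let mut table = [[0u8; 2]; 256];
    let mut i = 0;
    while i < 256 {
        table[i] = [chars[i >> 4], chars[i & 0x0F]];
        i += 1;
    }
    table
};

// Cross-check every entry against the standard formatter, as the new
// test_hex_lookup_table_covers_all_bytes test does.
fn check_hex_lookup_table() {
    for b in 0usize..256 {
        assert_eq!(
            std::str::from_utf8(&HEX_LOOKUP_UPPER[b]).unwrap(),
            format!("{:02X}", b),
            "table entry mismatch at byte {b}"
        );
    }
}
```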
**Not in this PR:** the #15947 review also flagged Utf8View output and
dictionary-key reuse; those felt worth their own PRs, keeping this one
focused on the per-byte hot path.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
For additional commands, e-mail: [email protected]