wombatu-kun opened a new pull request, #16349: URL: https://github.com/apache/iceberg/pull/16349
## What & why

**This implements the existing `// TODO: direct conversion from string to byte buffer` in `SparkValueWriters`.**

The Spark data layer converted UUID values through an intermediate `String` and a `java.util.UUID` object on every value, in both directions:

- write did `UUID.fromString(s.toString())`, then re-serialized the two longs to 16 bytes;
- read did `UUIDUtil.convert(buf).toString()`, then wrapped that as `UTF8String`.

The UUID arrives as the ASCII bytes of its canonical string and leaves as 16 raw bytes (and vice versa), so the `String`/`UUID` objects are pure per-row allocation overhead.

## Changes

- Add `UUIDUtil.convertToByteBuffer(byte[] uuidStringBytes, ByteBuffer reuse)`: parses the 36 ASCII bytes of a canonical UUID string directly into the 16-byte big-endian form.
- Add `UUIDUtil.convertToStringBytes(ByteBuffer uuidBytes, byte[] reuse)`: renders 16 bytes back to the 36 ASCII bytes of the canonical string.
- Rewire all UUID read/write sites in Avro/Parquet/ORC `Spark*Readers`/`Spark*Writers` for Spark 3.4, 3.5, 4.0, 4.1 to use these helpers.
- Add `TestUUIDUtil` coverage for the new methods.

## Correctness

Both helpers pivot on the `(mostSigBits, leastSigBits)` long pair:

- The parser reproduces `java.util.UUID.fromString` (parse `[0,8)` → `<<16 | [9,13)` → `<<16 | [14,18)` for MSB; `[19,23)` → `<<48 | [24,36)` for LSB) and then does `putLong(0, msb); putLong(8, lsb)`, exactly as the previous `convertToByteBuffer(UUID, reuse)` did.
- The formatter is the inverse and matches `UUID.toString()`.

The output is therefore byte-for-byte identical to the previous code.

The write side keeps the reusable thread-local buffer; the read side must allocate a fresh array because `UTF8String.fromBytes` wraps without copying (a reused buffer would alias across rows).
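For readers who want to see the index arithmetic from the Correctness section spelled out, here is a minimal, self-contained sketch. It is not the code in this PR: the class name `UuidBytesSketch` and the methods `toByteBuffer` and `toStringBytes` are hypothetical stand-ins for the new `UUIDUtil` helpers, the validation is deliberately simple, and the formatter always allocates a fresh array instead of taking a `reuse` argument. It assumes the buffer uses the default big-endian byte order.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.UUID;

// Hypothetical illustration only; names and validation are assumptions, not the PR's code.
final class UuidBytesSketch {
  private static final byte[] HEX = "0123456789abcdef".getBytes(StandardCharsets.US_ASCII);

  // Decodes one ASCII hex digit, rejecting anything else.
  private static int hexValue(byte b) {
    if (b >= '0' && b <= '9') return b - '0';
    if (b >= 'a' && b <= 'f') return b - 'a' + 10;
    if (b >= 'A' && b <= 'F') return b - 'A' + 10;
    throw new IllegalArgumentException("Not a hex digit: " + b);
  }

  // Accumulates the hex digits in [start, end) into a long.
  private static long parseHex(byte[] ascii, int start, int end) {
    long value = 0L;
    for (int i = start; i < end; i++) {
      value = (value << 4) | hexValue(ascii[i]);
    }
    return value;
  }

  // 36 ASCII bytes of a canonical UUID string -> 16 big-endian bytes.
  static ByteBuffer toByteBuffer(byte[] ascii, ByteBuffer reuse) {
    if (ascii.length != 36 || ascii[8] != '-' || ascii[13] != '-'
        || ascii[18] != '-' || ascii[23] != '-') {
      throw new IllegalArgumentException("Not a canonical UUID string");
    }
    long msb = parseHex(ascii, 0, 8);
    msb = (msb << 16) | parseHex(ascii, 9, 13);
    msb = (msb << 16) | parseHex(ascii, 14, 18);
    long lsb = parseHex(ascii, 19, 23);
    lsb = (lsb << 48) | parseHex(ascii, 24, 36);
    ByteBuffer buf = reuse != null ? reuse : ByteBuffer.allocate(16);
    buf.putLong(0, msb); // absolute puts leave the buffer position untouched
    buf.putLong(8, lsb);
    return buf;
  }

  // Writes the low `digits` hex digits of value into out, starting at offset.
  private static void writeHex(byte[] out, int offset, long value, int digits) {
    for (int i = digits - 1; i >= 0; i--) {
      out[offset + i] = HEX[(int) (value & 0xF)];
      value >>>= 4;
    }
  }

  // 16 big-endian bytes -> 36 ASCII bytes, always in a freshly allocated array.
  static byte[] toStringBytes(ByteBuffer uuidBytes) {
    int pos = uuidBytes.position();
    long msb = uuidBytes.getLong(pos);
    long lsb = uuidBytes.getLong(pos + 8);
    byte[] out = new byte[36];
    writeHex(out, 0, msb >>> 32, 8);
    out[8] = '-';
    writeHex(out, 9, msb >>> 16, 4);
    out[13] = '-';
    writeHex(out, 14, msb, 4);
    out[18] = '-';
    writeHex(out, 19, lsb >>> 48, 4);
    out[23] = '-';
    writeHex(out, 24, lsb, 12);
    return out;
  }

  public static void main(String[] args) {
    UUID uuid = UUID.randomUUID();
    byte[] ascii = uuid.toString().getBytes(StandardCharsets.US_ASCII);
    ByteBuffer buf = toByteBuffer(ascii, null);
    // Both comparisons should agree with java.util.UUID.
    System.out.println(buf.getLong(0) == uuid.getMostSignificantBits()
        && buf.getLong(8) == uuid.getLeastSignificantBits());
    System.out.println(Arrays.equals(ascii, toStringBytes(buf)));
  }
}
```

Returning a new array from the formatter mirrors the aliasing concern noted above: if the result is wrapped without copying, a shared output array would be overwritten by the next row.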
