wombatu-kun opened a new pull request, #16789:
URL: https://github.com/apache/iceberg/pull/16789

   `RowDataWrapper` adapts a Flink `RowData` to an Iceberg `StructLike` and 
sits on the hot path of every partitioned write (`PartitionKey.partition`), 
equality-delete key extraction, and range-shuffle sort-key computation. Its 
constructor pre-builds a getter for most types, but `buildGetter` returns 
`null` for the `default` case, which covers the most common primitive types: 
`INT`, `BIGINT`, `BOOLEAN`, `FLOAT`, `DOUBLE`, and `DATE`. For those fields, 
`get(pos, javaClass)` falls back to constructing a fresh Flink field getter on 
every access:
   
   ```java
   Object value = FlinkRowData.createFieldGetter(types[pos], pos).getFieldOrNull(rowData);
   ```
   
   `FlinkRowData.createFieldGetter` allocates two lambdas per call (Flink's 
`RowData.createFieldGetter` plus the null-checking wrapper around it), so every 
row that is partitioned, keyed, or shuffled on a primitive column allocates two 
short-lived objects for each such field.
   
   This PR pre-builds the fallback getter once in the constructor, the same way 
the wrapper already pre-builds getters for the handled types, so `get` no 
longer allocates. Behavior is unchanged: the raw Flink value is already the 
correct Iceberg representation for every default-case type, which is exactly 
what the previous fallback returned.
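
The difference between the two access patterns can be sketched without any Flink or Iceberg dependency. This is a simplified illustration, not the code in this PR: `GetterSketch`, `PerAccessWrapper`, `PrebuiltWrapper`, and the `gettersCreated` counter are hypothetical names, an `Object[]` stands in for `RowData`, and the factory only mimics the "raw getter plus null-checking wrapper" shape described above.

```java
import java.util.function.Function;

/** Illustrative stand-in only; an Object[] plays the role of a row. */
final class GetterSketch {
  /** Hypothetical counter of lambda objects built, used as a proxy for allocations. */
  static int gettersCreated = 0;

  /** Mimics a factory that wraps a raw getter in a null-checking one: two lambdas per call. */
  static Function<Object[], Object> createFieldGetter(int pos) {
    Function<Object[], Object> raw = row -> row[pos];
    gettersCreated += 2;
    return row -> row[pos] == null ? null : raw.apply(row);
  }

  /** "Before" pattern: a fresh getter is built on every read. */
  static final class PerAccessWrapper {
    private Object[] row;

    PerAccessWrapper wrap(Object[] data) {
      this.row = data;
      return this;
    }

    Object get(int pos) {
      return createFieldGetter(pos).apply(row);
    }
  }

  /** "After" pattern: one getter per field is built in the constructor and reused. */
  static final class PrebuiltWrapper {
    private final Function<Object[], Object>[] getters;
    private Object[] row;

    @SuppressWarnings("unchecked")
    PrebuiltWrapper(int fieldCount) {
      this.getters = new Function[fieldCount];
      for (int pos = 0; pos < fieldCount; pos++) {
        getters[pos] = createFieldGetter(pos);
      }
    }

    PrebuiltWrapper wrap(Object[] data) {
      this.row = data;
      return this;
    }

    Object get(int pos) {
      return getters[pos].apply(row);
    }
  }
}
```

In this sketch, both wrappers return the same values for the same row. Only the number of getter objects differs: it grows with every read in `PerAccessWrapper` and stays fixed at construction time in `PrebuiltWrapper`. The counter is a stand-in for the allocation behavior and is not a measurement.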
   
   The change is identical across the supported Flink versions, so it is 
applied to v1.20, v2.0, and v2.1 in this PR.
   
   ### Benchmark
   
   Results are from a JMH microbenchmark (JDK 17, `-prof gc`) that wraps a row and reads its 
fields. The schema has seven columns; `readPrimitiveKey` reads only the five 
primitive columns, representative of a typical partition or equality key.
   
   | Benchmark | Metric | Before | After | Delta |
   | --- | --- | --- | --- | --- |
   | readPrimitiveKey | time | 64.04 ns/op | 45.31 ns/op | -29.3% |
   | readPrimitiveKey | alloc | 275.0 B/op | 75.0 B/op | -72.7% |
   | readAllFields | time | 84.25 ns/op | 81.37 ns/op | -3.4% |
   | readAllFields | alloc | 275.0 B/op | 75.0 B/op | -72.7% |
   
   The remaining 75 B/op is return-value boxing that this change does not touch.
   
   Existing `TestRowDataWrapper` coverage passes unchanged.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]