wombatu-kun opened a new pull request, #16789: URL: https://github.com/apache/iceberg/pull/16789
`RowDataWrapper` adapts a Flink `RowData` to an Iceberg `StructLike` and sits on the hot path of every partitioned write (`PartitionKey.partition`), equality-delete key extraction, and range-shuffle sort-key computation. Its constructor pre-builds a getter for most types, but `buildGetter` returns `null` for the `default` case, which covers the most common primitive types: `INT`, `BIGINT`, `BOOLEAN`, `FLOAT`, `DOUBLE`, and `DATE`. For those fields, `get(pos, javaClass)` falls back to constructing a fresh Flink field getter on every access:

```java
Object value = FlinkRowData.createFieldGetter(types[pos], pos).getFieldOrNull(rowData);
```

`FlinkRowData.createFieldGetter` allocates two lambdas per call (Flink's `RowData.createFieldGetter` plus the null-checking wrapper around it), so every row that is partitioned, keyed, or shuffled on a primitive column allocates two short-lived objects for each such field.

This pre-builds the fallback getter once in the constructor, the same way the wrapper already pre-builds getters for the handled types, so `get` no longer allocates. Behavior is unchanged: the raw Flink value is already the correct Iceberg representation for every default-case type, which is exactly what the previous fallback returned.

The change is identical across the supported Flink versions, so it is applied to v1.20, v2.0, and v2.1 in this PR.

### Benchmark

JMH microbenchmark (JDK 17, `-prof gc`) that wraps a row and reads its fields. The schema has seven columns; `readPrimitiveKey` reads only the five primitive columns, representative of a typical partition or equality key.

| Benchmark | Metric | Before | After | Delta |
| --- | --- | --- | --- | --- |
| readPrimitiveKey | time | 64.04 ns/op | 45.31 ns/op | -29.3% |
| readPrimitiveKey | alloc | 275.0 B/op | 75.0 B/op | -72.7% |
| readAllFields | time | 84.25 ns/op | 81.37 ns/op | -3.4% |
| readAllFields | alloc | 275.0 B/op | 75.0 B/op | -72.7% |

The remaining 75 B/op is return-value boxing that this change does not touch.

Existing `TestRowDataWrapper` coverage passes unchanged.
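
For readers who want to see the pattern in isolation, the following is a minimal sketch of building a getter per access versus building it once in the constructor. It is not the `RowDataWrapper` code from this PR: `Row`, `createFieldGetter`, `PerAccessWrapper`, `CachedWrapper`, and the `gettersCreated` counter are hypothetical stand-ins that use only the JDK, and the factory here allocates one lambda per call rather than two.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class CachedGetterSketch {

  /** Hypothetical stand-in for a row type: positional values only. */
  static final class Row {
    private final Object[] values;

    Row(Object... values) {
      this.values = values;
    }

    Object valueAt(int pos) {
      return values[pos];
    }
  }

  /** Counts factory calls so the difference between the two wrappers is observable. */
  static int gettersCreated = 0;

  /** Hypothetical getter factory: every call returns a new lambda that captures pos. */
  static Function<Row, Object> createFieldGetter(int pos) {
    gettersCreated++;
    return row -> row.valueAt(pos);
  }

  /** "Before" shape: a getter is constructed on every field access. */
  static final class PerAccessWrapper {
    private Row row;

    PerAccessWrapper wrap(Row newRow) {
      this.row = newRow;
      return this;
    }

    Object get(int pos) {
      return createFieldGetter(pos).apply(row);
    }
  }

  /** "After" shape: getters are built once in the constructor and reused for every row. */
  static final class CachedWrapper {
    private final List<Function<Row, Object>> getters;
    private Row row;

    CachedWrapper(int fieldCount) {
      this.getters = new ArrayList<>(fieldCount);
      for (int pos = 0; pos < fieldCount; pos++) {
        getters.add(createFieldGetter(pos));
      }
    }

    CachedWrapper wrap(Row newRow) {
      this.row = newRow;
      return this;
    }

    Object get(int pos) {
      return getters.get(pos).apply(row);
    }
  }

  public static void main(String[] args) {
    Row[] rows = {new Row(1, 10L), new Row(2, 20L), new Row(3, 30L)};

    gettersCreated = 0;
    PerAccessWrapper perAccess = new PerAccessWrapper();
    for (Row r : rows) {
      perAccess.wrap(r);
      perAccess.get(0);
      perAccess.get(1);
    }
    // One factory call per field read: 3 rows x 2 fields.
    System.out.println("per-access getters created: " + gettersCreated);

    gettersCreated = 0;
    CachedWrapper cached = new CachedWrapper(2);
    for (Row r : rows) {
      cached.wrap(r);
      cached.get(0);
      cached.get(1);
    }
    // One factory call per field, independent of the number of rows.
    System.out.println("cached getters created: " + gettersCreated);
  }
}
```

Both wrappers return the same values for the same row; only the number of getter objects created differs, which is the effect the allocation numbers above measure for the real class.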
