wombatu-kun opened a new pull request, #16722: URL: https://github.com/apache/iceberg/pull/16722

`IntegerAsDecimalReader` and `LongAsDecimalReader` build each value with `new BigDecimal(BigInteger.valueOf(unscaled), scale)`. Because `new BigDecimal(BigInteger, scale)` keeps the `BigInteger` in the result's `intVal` field, that intermediate `BigInteger` escapes and is a real per-value heap allocation for any value outside `BigInteger`'s small-value cache.

`BigDecimal.valueOf(long, scale)` builds the decimal directly from the compact long and allocates no `BigInteger`, producing a value-equal result. This is the same idiom already used in `ParquetConversions.convertValue` and `ParquetConversions.converterFromParquet`; this change just brings the readers in line.

These readers are the dispatch target of `ParquetValueReaders.bigDecimals(...)`, which backs `GenericParquetReaders` and `InternalReader` (via `BaseParquetReaders`) and `ParquetAvroValueReaders`, so the change covers the module's engine-agnostic read paths for every INT32/INT64-backed decimal value.

Benchmarked with JMH and the gc profiler.

Isolated per-value construction (one op builds 1024 values):

| Reader | Alloc before | Alloc after | Time before | Time after |
|---|---|---|---|---|
| int (INT32) | 112 B/value | 48 B/value | 10.9 us/op | 4.6 us/op |
| long (INT64) | 112 B/value | 48 B/value | 11.5 us/op | 5.1 us/op |

End-to-end read of 1M rows with one INT32 and one INT64 decimal column (`gc.alloc.rate.norm`, SingleShotTime):

| Metric | Before | After | Delta |
|---|---|---|---|
| Allocation | 361 MB/op | 233 MB/op | -128 MB (-35%), exactly 64 B x 2M values |
| GC count | 10 | 5 | -5 |
| Time | 201 +/- 14 ms/op | 239 +/- 72 ms/op | within noise (overlapping CIs) |

Wall-clock at the full-read level is within noise because per-value decoding is a small fraction of total read time; the CPU win is visible in the isolated benchmark.

Correctness is covered by the existing `TestGenericData` and `TestGenericReadProjection` decimal round-trips (INT32 `decimal(9,2)` and INT64 `decimal(18,2)`).
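
For readers who want to check the value-equality claim locally, the two constructions described above can be compared in a small standalone sketch. The class and method names here (`DecimalConstruction`, `viaBigInteger`, `viaValueOf`) are hypothetical and are not part of Iceberg; the sketch only checks that both forms produce equal `BigDecimal` values and does not measure allocation.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical helper class for illustration only; not Iceberg code.
final class DecimalConstruction {

  // The form described for the readers before the change:
  // an intermediate BigInteger is created and handed to the constructor.
  static BigDecimal viaBigInteger(long unscaled, int scale) {
    return new BigDecimal(BigInteger.valueOf(unscaled), scale);
  }

  // The form described for the readers after the change:
  // the decimal is built directly from the long unscaled value.
  static BigDecimal viaValueOf(long unscaled, int scale) {
    return BigDecimal.valueOf(unscaled, scale);
  }

  public static void main(String[] args) {
    // Sample unscaled values chosen for this sketch, including the long extremes.
    long[] samples = {0L, 1L, -1L, 12345L, -999_999_999L, Long.MAX_VALUE, Long.MIN_VALUE};
    int scale = 2;

    for (long unscaled : samples) {
      BigDecimal before = viaBigInteger(unscaled, scale);
      BigDecimal after = viaValueOf(unscaled, scale);

      // BigDecimal.equals compares both unscaled value and scale,
      // so this is stricter than compareTo(...) == 0.
      if (!before.equals(after)) {
        throw new AssertionError("Mismatch for unscaled value " + unscaled);
      }
    }
    System.out.println("Both constructions agree for all samples");
  }
}
```

Using `equals` rather than `compareTo` matters here because the readers must preserve the column's scale, not just the numeric value.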
