wombatu-kun opened a new pull request, #16627: URL: https://github.com/apache/iceberg/pull/16627
## Problem

Reading a decimal column through the vectorized Arrow reader silently corrupts values whose unscaled magnitude exceeds `Long.MAX_VALUE`. This affects any decimal with precision larger than 18 (for example `decimal(38, 0)`) holding a sufficiently large value. No error is raised; the returned `BigDecimal` is simply wrong, and often negative.

## Root cause

Decimals with precision larger than 18 are stored as a binary / `FIXED_LEN_BYTE_ARRAY` and read into a `FixedSizeBinaryVector`. The binary-backed decimal accessors decode the bytes into the correct `BigDecimal` and then hand it to `JavaDecimalFactory.ofBigDecimal`, which rebuilds it as `BigDecimal.valueOf(value.unscaledValue().longValue(), scale)`. `BigInteger.longValue()` keeps only the low 64 bits, so any unscaled value beyond `Long` range is truncated.

The incoming `value` is already the correct `BigDecimal` (it carries the right unscaled value and scale), so this round-trip is both unnecessary and lossy. The `ofLong` path used for INT32/INT64-backed decimals (precision up to 18) is unaffected, which is why only high-precision decimals are corrupted and the existing tests, which use `decimal(9, 2)`, never caught it.

## Fix

Return `value` unchanged. It already represents the decimal with the correct unscaled value and scale, matching how the Spark accessor factory preserves the full value.

## Tests

Added `TestArrowReader.testHighPrecisionDecimalIsReadCorrectly`, which writes a `decimal(38, 0)` Parquet file with values larger than `Long.MAX_VALUE` and asserts they round-trip through the vectorized reader. It fails before the fix (`expected 12345678901234567890 but was -6101065172474983726`) and passes after.
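The truncation described under "Root cause" can be reproduced with plain `java.math`, without Iceberg or Arrow. The sketch below is illustrative only: `DecimalTruncationDemo`, `lossyRebuild`, and `passThrough` are hypothetical names, not the actual `JavaDecimalFactory` code, and the scale is taken from `value.scale()` as an assumption about where the real `scale` comes from.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical stand-alone demo; not the Iceberg implementation.
class DecimalTruncationDemo {

  // Mirrors the lossy pattern: the unscaled BigInteger is narrowed to a long,
  // which keeps only its low 64 bits.
  static BigDecimal lossyRebuild(BigDecimal value) {
    return BigDecimal.valueOf(value.unscaledValue().longValue(), value.scale());
  }

  // Mirrors the fix: the already-correct BigDecimal is returned untouched.
  static BigDecimal passThrough(BigDecimal value) {
    return value;
  }

  public static void main(String[] args) {
    // The value quoted in the test failure message; it exceeds Long.MAX_VALUE.
    BigDecimal large = new BigDecimal(new BigInteger("12345678901234567890"), 0);
    System.out.println("lossy:       " + lossyRebuild(large));
    System.out.println("passThrough: " + passThrough(large));

    // A decimal(9, 2)-sized value fits in a long, so the lossy path looks fine.
    BigDecimal small = new BigDecimal("1234567.89");
    System.out.println("small lossy: " + lossyRebuild(small));
  }
}
```

For `12345678901234567890` the lossy path yields `-6101065172474983726`, the wrong value reported by the failing test, while a value that fits in a `long` survives the same round-trip. That is consistent with tests limited to `decimal(9, 2)` never exposing the bug.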
