wombatu-kun opened a new pull request, #16627:
URL: https://github.com/apache/iceberg/pull/16627

   ## Problem
   
   Reading a decimal column through the vectorized Arrow reader silently 
corrupts values whose unscaled magnitude exceeds `Long.MAX_VALUE`. This affects 
any decimal with precision larger than 18 (for example `decimal(38, 0)`) 
holding a sufficiently large value. No error is raised; the returned 
`BigDecimal` is simply wrong, and often negative.
   
   ## Root cause
   
   Decimals with precision larger than 18 are stored as a binary / 
`FIXED_LEN_BYTE_ARRAY` and read into a `FixedSizeBinaryVector`. The 
binary-backed decimal accessors decode the bytes into the correct `BigDecimal` 
and then hand it to `JavaDecimalFactory.ofBigDecimal`, which rebuilds it as 
`BigDecimal.valueOf(value.unscaledValue().longValue(), scale)`. 
`BigInteger.longValue()` keeps only the low 64 bits, so any unscaled value 
outside the `long` range is truncated and reinterpreted as a signed `long`, 
which is why the result is often negative. The incoming `value` is already the 
correct `BigDecimal` (it carries the right unscaled value and scale), so this 
round-trip is both unnecessary and lossy.
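
The truncation can be reproduced with plain `java.math` types, without any Iceberg or Arrow classes. The following is a minimal sketch: the class `DecimalTruncationDemo` and the helper `lossyRoundTrip` are hypothetical names used only for illustration, and the helper mirrors the expression quoted above rather than the actual factory code.

```java
import java.math.BigDecimal;

class DecimalTruncationDemo {
  // Hypothetical helper mirroring the lossy expression quoted above:
  // the unscaled BigInteger is narrowed to a long and then rebuilt.
  static BigDecimal lossyRoundTrip(BigDecimal value) {
    return BigDecimal.valueOf(value.unscaledValue().longValue(), value.scale());
  }

  public static void main(String[] args) {
    // Unscaled value is larger than Long.MAX_VALUE (9223372036854775807).
    BigDecimal original = new BigDecimal("12345678901234567890");
    BigDecimal truncated = lossyRoundTrip(original);

    System.out.println(original);  // 12345678901234567890
    System.out.println(truncated); // -6101065172474983726

    // A value whose unscaled form fits in a long survives the round-trip,
    // which is consistent with low-precision tests not catching the bug.
    BigDecimal small = new BigDecimal("1234567.89");
    System.out.println(lossyRoundTrip(small).equals(small)); // true
  }
}
```

The negative result is the original value minus 2^64, which is what keeping only the low 64 bits of the two's-complement representation produces.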
   
   The `ofLong` path used for INT32/INT64-backed decimals (precision up to 18) 
is unaffected, which is why only high-precision decimals are corrupted and the 
existing tests, which use `decimal(9, 2)`, never caught it.
   
   ## Fix
   
   Return `value` unchanged. It already represents the decimal with the correct 
unscaled value and scale, matching how the Spark accessor factory preserves the 
full value.
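
To illustrate why passing the value through is sufficient, here is a rough sketch of the decode-then-return flow using only `java.math`. It assumes the Parquet convention that binary-backed decimals hold the unscaled value as big-endian two's-complement bytes. The class `PassThroughDecimalSketch` and the methods `toFixed`, `decode`, and `ofBigDecimal` are hypothetical stand-ins, not the Iceberg accessor or factory code.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Arrays;

class PassThroughDecimalSketch {
  // Encode an unscaled value as fixed-length big-endian two's complement,
  // sign-extending on the left (assumed layout for binary-backed decimals).
  static byte[] toFixed(BigInteger unscaled, int length) {
    byte[] minimal = unscaled.toByteArray();
    if (minimal.length > length) {
      throw new IllegalArgumentException("value does not fit in " + length + " bytes");
    }
    byte[] out = new byte[length];
    byte pad = (byte) (unscaled.signum() < 0 ? 0xFF : 0x00);
    Arrays.fill(out, 0, length - minimal.length, pad);
    System.arraycopy(minimal, 0, out, length - minimal.length, minimal.length);
    return out;
  }

  // Decode the bytes into a BigDecimal; BigInteger(byte[]) reads
  // big-endian two's complement, so no precision is lost here.
  static BigDecimal decode(byte[] bytes, int scale) {
    return new BigDecimal(new BigInteger(bytes), scale);
  }

  // Stand-in for the fixed factory method: the decoded value is already
  // correct, so it is returned unchanged.
  static BigDecimal ofBigDecimal(BigDecimal value) {
    return value;
  }

  public static void main(String[] args) {
    BigDecimal original = new BigDecimal("12345678901234567890");
    byte[] stored = toFixed(original.unscaledValue(), 16);
    BigDecimal read = ofBigDecimal(decode(stored, original.scale()));
    System.out.println(read.equals(original)); // true
  }
}
```

Because the decoded `BigDecimal` already carries both the full unscaled value and the scale, any further conversion can only lose information.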
   
   ## Tests
   
   Added `TestArrowReader.testHighPrecisionDecimalIsReadCorrectly`, which 
writes a `decimal(38, 0)` Parquet file with values larger than `Long.MAX_VALUE` 
and asserts they round-trip through the vectorized reader. It fails before the 
fix (`expected 12345678901234567890 but was -6101065172474983726`) and passes 
after.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

