zeroshade commented on code in PR #1355:
URL: https://github.com/apache/iceberg-go/pull/1355#discussion_r3521437645
##########
manifest.go:
##########
@@ -1848,18 +1864,28 @@ func convertTimestampMicrosValue(v any) any {
return v
}
-func convertDecimalValue(v any) any {
- if dec, ok := v.(Decimal); ok {
- fixedSize := internal.DecimalRequiredBytes(len(dec.String()))
- bytes, err := DecimalLiteral(dec).MarshalBinary()
- if err != nil {
- return v
- }
+func convertDecimalValue(v any, fixedSize int) (any, error) {
+ switch dec := v.(type) {
+ case Decimal:
+ return encodeDecimalBytes(DecimalLiteral(dec), fixedSize)
+ case DecimalLiteral:
+ return encodeDecimalBytes(dec, fixedSize)
+ default:
+ return v, nil
+ }
+}
- return padOrTruncateBytes(bytes, fixedSize)
+func encodeDecimalBytes(dec DecimalLiteral, fixedSize int) ([]byte, error) {
+ if fixedSize <= 0 {
+ return nil, fmt.Errorf("missing decimal fixed size")
}
- return v
+ bytes, err := dec.MarshalBinary()
+ if err != nil {
+ return nil, err
+ }
+
+ return padOrTruncateBytes(bytes, fixedSize), nil
Review Comment:
[BLOCKER] This call still routes through `padOrTruncateBytes`, whose
implementation at lines 1899-1905 left-pads short decimal byte slices with
`0x00` unconditionally. That is only correct for non-negative values; a
negative unscaled value such as -1 for precision 10 (5 bytes) becomes `00 00 00
00 ff` and decodes as +255. Fix with sign-aware fixed-width encoding: when
`len(bytes) < fixedSize`, prepend `0xff` if `bytes[0]&0x80 != 0`, otherwise
`0x00` (two's-complement sign extension), and add
negative/zero/max-precision-38 vectors.
##########
manifest_test.go:
##########
@@ -2044,6 +2046,25 @@ func
TestReadManifestDecodesNilLogicalPartitionValueFromNullableUnion(t *testing
}
}
+func TestAvroEncodePartitionDataUsesDeclaredDecimalFixedSize(t *testing.T) {
Review Comment:
[MINOR] This test only covers one positive decimal, so it misses the
negative sign-extension bug and oversized-value truncation. Please make this
table-driven with positive, zero, negative, max-width, and too-large vectors,
and include a `ManifestWriter`/`ReadManifest` round-trip.
##########
manifest.go:
##########
@@ -1848,18 +1864,28 @@ func convertTimestampMicrosValue(v any) any {
return v
}
-func convertDecimalValue(v any) any {
- if dec, ok := v.(Decimal); ok {
- fixedSize := internal.DecimalRequiredBytes(len(dec.String()))
- bytes, err := DecimalLiteral(dec).MarshalBinary()
- if err != nil {
- return v
- }
+func convertDecimalValue(v any, fixedSize int) (any, error) {
+ switch dec := v.(type) {
+ case Decimal:
+ return encodeDecimalBytes(DecimalLiteral(dec), fixedSize)
+ case DecimalLiteral:
+ return encodeDecimalBytes(dec, fixedSize)
+ default:
+ return v, nil
+ }
+}
- return padOrTruncateBytes(bytes, fixedSize)
+func encodeDecimalBytes(dec DecimalLiteral, fixedSize int) ([]byte, error) {
+ if fixedSize <= 0 {
+ return nil, fmt.Errorf("missing decimal fixed size")
}
- return v
+ bytes, err := dec.MarshalBinary()
+ if err != nil {
+ return nil, err
+ }
+
+ return padOrTruncateBytes(bytes, fixedSize), nil
Review Comment:
[MAJOR] `padOrTruncateBytes` also silently truncates oversized decimals to
the low-order bytes (see real lines 1899-1901), which can change the value
instead of reporting that it does not fit the declared fixed size. Since this
path already returns an error, make `len(bytes) > fixedSize` verify the dropped
prefix is redundant sign extension and otherwise return an overflow/precision
error.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]