mzzz-zzm opened a new issue, #1027:
URL: https://github.com/apache/iceberg-go/issues/1027
### Apache Iceberg version
main (development)
### Please describe the bug 🐞
**Component**: `manifest.go`
**Affected version**: `main` (v0.5.0)
**Severity**: Data corruption / Avro encode error for decimal partition columns
---
`convertDecimalValue` computes the Avro `fixed[N]` byte size by passing
`len(dec.String())` to `internal.DecimalRequiredBytes`, but that function
expects the column's declared **precision**. These two quantities are
unrelated, causing the wrong number of bytes to be allocated and the Avro
encoder to reject the value.
---
## Affected code
`manifest.go`, `convertDecimalValue`:
```go
func convertDecimalValue(v any) any {
	if dec, ok := v.(Decimal); ok {
		fixedSize := internal.DecimalRequiredBytes(len(dec.String())) // ← BUG
		bytes, err := DecimalLiteral(dec).MarshalBinary()
		...
		return padOrTruncateBytes(bytes, fixedSize)
	}
	return v
}
```
`internal.DecimalRequiredBytes` signature (`internal/avro_schemas.go`):
```go
// DecimalRequiredBytes returns the required number of bytes to store a
// decimal value with the given precision.
func DecimalRequiredBytes(precision int) int { ... }
```
---
## Root cause
`dec.String()` returns the human-readable decimal string (e.g. `"1.00"`).
Its character length is not the declared precision of the column — it varies
with the specific value being encoded.
| Column type | Value | `dec.String()` | `len(...)` | `DecimalRequiredBytes(len)` | Correct `DecimalRequiredBytes(precision)` |
|---------------|---------|----------------|------------|-----------------------------|-------------------------------------------|
| `decimal(10,2)` | `10.00` | `"10.00"` | 5 | 3 bytes | 5 bytes |
| `decimal(10,2)` | `1.00` | `"1.00"` | 4 | 2 bytes | 5 bytes ← **wrong** |
| `decimal(18,0)` | `1` | `"1"` | 1 | 1 byte | 8 bytes ← **wrong** |
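For `decimal(10,2)` / `10.00`, the string length happens to equal the correct
byte count (`len("10.00") == 5 == DecimalRequiredBytes(10)`), which can make
this value look correctly sized at a glance. That is a coincidence: the code
passes 5 as the precision, so it allocates 3 bytes rather than 5.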
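
The sizes in the table can be checked with a small standalone program. This is only a sketch: `requiredBytes` is a hypothetical stand-in for `internal.DecimalRequiredBytes`, not the library's implementation, and it assumes the required size is the smallest signed two's-complement width that can hold `10^precision - 1`.

```go
package main

import (
	"fmt"
	"math/big"
)

// requiredBytes is a hypothetical stand-in for internal.DecimalRequiredBytes.
// Assumption: the size is the smallest number of bytes whose signed
// two's-complement range can hold 10^precision - 1.
func requiredBytes(precision int) int {
	maxUnscaled := new(big.Int).Exp(big.NewInt(10), big.NewInt(int64(precision)), nil)
	maxUnscaled.Sub(maxUnscaled, big.NewInt(1))
	// One extra bit for the sign, then round up to whole bytes.
	return (maxUnscaled.BitLen() + 1 + 7) / 8
}

func main() {
	cases := []struct {
		precision int    // declared precision of the column
		value     string // what dec.String() would return
	}{
		{10, "10.00"},
		{10, "1.00"},
		{18, "1"},
	}
	for _, c := range cases {
		fmt.Printf("precision %d, value %q: %d bytes from len, %d bytes from precision\n",
			c.precision, c.value,
			requiredBytes(len(c.value)), requiredBytes(c.precision))
	}
}
```

Under that assumption, the size derived from the string length changes with the value, while the size derived from the precision stays fixed per column, which is what an Avro `fixed[N]` schema requires.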
---
## Symptom
Any write of a manifest with a decimal partition column where the value's
string representation length differs from the column's precision produces:
```
avro: field data_file.partition.<field>: cannot use []uint8 with Avro type fixed
```
---
## Reproduction
1. Create a table with partition spec `identity(price)` where `price` is
`decimal(10, 2)`.
2. Write a `RowDelta` that adds a data file with partition value `1.00`
(string length 4, but declared precision is 10).
3. Observe the Avro encode error when the manifest is flushed.
---
## Proposed fix
Thread the column's declared precision from the `DecimalType` of the
partition field through to `convertDecimalValue`:
```go
// Pass precision from the field's type, not from the value's string length.
func convertDecimalValue(v any, precision int) any {
	if dec, ok := v.(Decimal); ok {
		fixedSize := internal.DecimalRequiredBytes(precision)
		bytes, err := DecimalLiteral(dec).MarshalBinary()
		if err != nil {
			return v
		}
		return padOrTruncateBytes(bytes, fixedSize)
	}
	return v
}
```
The call site in the partition value serialization loop already has access
to the `PartitionField` → `DecimalType` → `Precision`.