mzzz-zzm opened a new issue, #1027:
URL: https://github.com/apache/iceberg-go/issues/1027

   ### Apache Iceberg version
   
   main (development)
   
   ### Please describe the bug 🐞
   
   **Component**: `manifest.go`
   **Affected version**: `main` (v0.5.0)
   **Severity**: Data corruption / Avro encode error for decimal partition columns
   
   ---
   
   `convertDecimalValue` computes the Avro `fixed[N]` byte size by passing
   `len(dec.String())` to `internal.DecimalRequiredBytes`, but that function
   expects the column's declared **precision**. These two quantities are
   unrelated, so the wrong number of bytes is allocated and the Avro encoder
   rejects the value.
   
   ---
   
   ## Affected code
   
   `manifest.go`, `convertDecimalValue`:
   
   ```go
   func convertDecimalValue(v any) any {
       if dec, ok := v.(Decimal); ok {
           fixedSize := internal.DecimalRequiredBytes(len(dec.String())) // ← BUG
           bytes, err := DecimalLiteral(dec).MarshalBinary()
           ...
           return padOrTruncateBytes(bytes, fixedSize)
       }
       return v
   }
   ```
   
   `internal.DecimalRequiredBytes` signature (`internal/avro_schemas.go`):
   
   ```go
   // DecimalRequiredBytes returns the required number of bytes to store a
   // decimal value with the given precision.
   func DecimalRequiredBytes(precision int) int { ... }
   ```
   
   ---
   
   ## Root cause
   
   `dec.String()` returns the human-readable decimal string (e.g. `"1.00"`).
   Its character length is not the declared precision of the column — it varies
   with the specific value being encoded.
   
   | Column type   | Value   | `dec.String()` | `len(...)` | `DecimalRequiredBytes(len)` | Correct `DecimalRequiredBytes(precision)` |
   |---------------|---------|----------------|------------|-----------------------------|-------------------------------------------|
   | `decimal(10,2)` | `10.00` | `"10.00"` | 5 | 3 bytes | 5 bytes |
   | `decimal(10,2)` | `1.00`  | `"1.00"`  | 4 | 2 bytes | 5 bytes ← **wrong** |
   | `decimal(18,0)` | `1`     | `"1"`     | 1 | 1 byte  | 8 bytes ← **wrong** |
   
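   Note that `len("10.00") == 5` happens to equal `DecimalRequiredBytes(10)`,
   but the code feeds that length back into `DecimalRequiredBytes`, so the
   `decimal(10,2)` / `10.00` row still allocates 3 bytes instead of 5. The
   numeric coincidence does not mask the bug for that value.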
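
The size mismatch in the table can be checked with a small standalone sketch. `requiredBytes` below is a hypothetical stand-in for `internal.DecimalRequiredBytes`, written on the assumption that the real function returns the minimum two's-complement width for a given precision; it is not the library's code.

```go
package main

import (
	"fmt"
	"math/big"
)

// requiredBytes is a hypothetical stand-in for internal.DecimalRequiredBytes:
// the smallest two's-complement width, in bytes, that can hold any unscaled
// value with the given number of decimal digits.
func requiredBytes(precision int) int {
	// The largest magnitude with `precision` digits is 10^precision - 1.
	limit := new(big.Int).Exp(big.NewInt(10), big.NewInt(int64(precision)), nil)
	limit.Sub(limit, big.NewInt(1))
	bits := limit.BitLen() + 1 // one extra bit for the sign
	return (bits + 7) / 8
}

func main() {
	cases := []struct {
		text      string // what dec.String() would return
		precision int    // declared precision of the column
	}{
		{"10.00", 10},
		{"1.00", 10},
		{"1", 18},
	}
	for _, c := range cases {
		fmt.Printf("%-6s size from len: %d, size from precision: %d\n",
			c.text, requiredBytes(len(c.text)), requiredBytes(c.precision))
	}
}
```

Under that assumption, the sizes derived from the string length come out as 3, 2, and 1 bytes, against 5, 5, and 8 bytes from the declared precision, matching the table.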
   
   ---
   
   ## Symptom
   
   Any write of a manifest with a decimal partition column where the value's
   string representation length differs from the column's precision produces:
   
   ```
   avro: field data_file.partition.<field>: cannot use []uint8 with Avro type fixed
   ```
   
   ---
   
   ## Reproduction
   
   1. Create a table with partition spec `identity(price)` where `price` is `decimal(10, 2)`.
   2. Write a `RowDelta` that adds a data file with partition value `1.00`
      (string length 4, but declared precision is 10).
   3. Observe the Avro encode error when the manifest is flushed.
   
   ---
   
   ## Proposed fix
   
   Thread the column's declared precision from the `DecimalType` of the
   partition field through to `convertDecimalValue`:
   
   ```go
   // Pass precision from the field's type, not from the value's string length.
   func convertDecimalValue(v any, precision int) any {
       if dec, ok := v.(Decimal); ok {
           fixedSize := internal.DecimalRequiredBytes(precision)
           bytes, err := DecimalLiteral(dec).MarshalBinary()
           if err != nil {
               return v
           }
   
           return padOrTruncateBytes(bytes, fixedSize)
       }
   
       return v
   }
   ```
   
   The call site in the partition value serialization loop already has access
   to the precision via `PartitionField` → `DecimalType` → `Precision`.
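
Once the size comes from the declared precision, the value still has to be written as exactly that many bytes. The sketch below shows one way to produce a sign-extended, big-endian two's-complement encoding of a fixed width. `encodeFixed` and `encodeFixedInt64` are hypothetical helpers for illustration only; they are not `padOrTruncateBytes`, whose actual behavior is not shown in this report.

```go
package main

import (
	"fmt"
	"math/big"
)

// encodeFixed is a hypothetical helper: it writes unscaled as a big-endian
// two's-complement value of exactly size bytes, or returns an error if the
// value does not fit.
func encodeFixed(unscaled *big.Int, size int) ([]byte, error) {
	if size <= 0 {
		return nil, fmt.Errorf("invalid fixed size %d", size)
	}
	// Representable range for size bytes: [-2^(8*size-1), 2^(8*size-1) - 1].
	limit := new(big.Int).Lsh(big.NewInt(1), uint(8*size-1))
	if unscaled.Cmp(limit) >= 0 || unscaled.Cmp(new(big.Int).Neg(limit)) < 0 {
		return nil, fmt.Errorf("value %v does not fit in fixed[%d]", unscaled, size)
	}
	v := new(big.Int).Set(unscaled)
	if v.Sign() < 0 {
		// Two's complement: adding 2^(8*size) maps negatives into the unsigned range.
		v.Add(v, new(big.Int).Lsh(big.NewInt(1), uint(8*size)))
	}
	// FillBytes left-pads with zeros to the full width.
	return v.FillBytes(make([]byte, size)), nil
}

// encodeFixedInt64 is a convenience wrapper for values that fit in an int64.
func encodeFixedInt64(unscaled int64, size int) ([]byte, error) {
	return encodeFixed(big.NewInt(unscaled), size)
}

func main() {
	// decimal(10,2) is assumed to map to fixed[5]; 1.00 has unscaled value 100.
	for _, unscaled := range []int64{100, -100} {
		b, err := encodeFixedInt64(unscaled, 5)
		fmt.Println(unscaled, b, err)
	}
}
```

Returning an error for out-of-range values, rather than truncating, is a design choice of this sketch: silent truncation would turn an oversized value into a different, valid-looking decimal.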
   

