HonahX opened a new issue, #14:
URL: https://github.com/apache/iceberg-python/issues/14

   ### Apache Iceberg version
   
   main (development)
   
   ### Please describe the bug 🐞
   
   ## DecimalType
   Currently, if we use `AvroOutputFile` to write decimal values to an Avro 
file, the resulting file sometimes cannot be read by other Avro readers 
such as `fastavro`. 
   
   The cause is that when encoding and writing a decimal value, we follow the 
Iceberg specification and write it as a fixed. However, in the Avro file's 
schema, the current implementation declares it as variable-length binary:
   
https://github.com/apache/iceberg-python/blob/553695eab40a43f3216d127c03ea1bfda26b935f/pyiceberg/utils/schema_conversion.py#L569-L572
   I think this should be changed to:
   ```python
       def visit_decimal(self, decimal_type: DecimalType) -> AvroType:
           return {
               "type": "fixed",
               "size": decimal_required_bytes(decimal_type.precision),
               "logicalType": "decimal",
               "precision": decimal_type.precision,
               "scale": decimal_type.scale,
               "name": f"decimal_{decimal_type.precision}_{decimal_type.scale}",
           }
   ```
   This way, other Avro readers can correctly interpret the encoded value as 
fixed-length bytes instead of trying to read a length prefix first.
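
To illustrate the mismatch, here is a minimal stdlib-only sketch. In Avro's binary encoding, `bytes` is a zigzag varint length followed by the payload, while `fixed` is just the raw bytes. The helper names (`required_bytes`, `encode_fixed_decimal`, `read_bytes_length`) are made up for this example and are not pyiceberg functions; `required_bytes` is only a stand-in for `decimal_required_bytes`.

```python
from decimal import Decimal


def required_bytes(precision: int) -> int:
    """Smallest signed byte width that can hold 10**precision - 1 (stand-in helper)."""
    size = 1
    while (1 << (8 * size - 1)) - 1 < 10**precision - 1:
        size += 1
    return size


def encode_fixed_decimal(value: Decimal, precision: int, scale: int) -> bytes:
    """Encode the unscaled value as fixed-width big-endian two's complement."""
    unscaled = int(value.scaleb(scale))
    return unscaled.to_bytes(required_bytes(precision), "big", signed=True)


def read_bytes_length(buf: bytes) -> int:
    """Decode the zigzag varint that a reader expects in front of an Avro `bytes` value."""
    raw = 0
    shift = 0
    for byte in buf:
        raw |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of the varint
            break
        shift += 7
    return (raw >> 1) ^ -(raw & 1)


# decimal(9, 2) value written as a 4-byte fixed, with no length prefix
encoded = encode_fixed_decimal(Decimal("123.45"), precision=9, scale=2)

# A reader that trusts a `bytes` schema consumes the first payload byte as the length
misread_length = read_bytes_length(encoded)
```

Here the reader takes the leading `0x00` of the payload as a length of zero, so the remaining three bytes are left in the stream and every following field is decoded from the wrong offset.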
   
   I think this is also the root cause of the failure I observed when using 
`ManifestWriter` to write manifest entries for a table partitioned by a 
`DecimalType` column. I ran some local tests and verified that the above 
change fixes this issue. 
   
   ## FixedType and UUIDType
   For `FixedType` and `UUIDType`, I think the current conversion is missing 
the required `name` field: https://avro.apache.org/docs/1.11.1/specification/#fixed
   
https://github.com/apache/iceberg-python/blob/553695eab40a43f3216d127c03ea1bfda26b935f/pyiceberg/utils/schema_conversion.py#L567-L568
   
https://github.com/apache/iceberg-python/blob/553695eab40a43f3216d127c03ea1bfda26b935f/pyiceberg/utils/schema_conversion.py#L605-L606
   
   `fastavro` rejects the resulting schema with:
   ```python
   fastavro._schema_common.SchemaParseException: "name" is a required field missing from the schema: {'type': 'fixed', 'size': 16}
   ```
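
As a rough sketch of what a conforming schema could look like, the snippet below builds a named `fixed` schema and checks for the attributes the Avro specification requires on `fixed` (`name` and `size`). The helper names and the `fixed_16` naming scheme are assumptions made for illustration, not the names pyiceberg uses.

```python
def named_fixed(size: int, name: str) -> dict:
    """Build an Avro `fixed` schema that carries the required `name` (hypothetical helper)."""
    return {"type": "fixed", "size": size, "name": name}


def missing_fixed_attributes(schema: dict) -> list:
    """List the attributes the Avro spec requires on `fixed` that are absent."""
    return [key for key in ("name", "size") if key not in schema]


# The schema from the error message above lacks `name`
current = {"type": "fixed", "size": 16}
problems = missing_fixed_attributes(current)

# Assumed naming scheme: any name works as long as it is a valid Avro name
fixed = named_fixed(16, "fixed_16")
```

Since `fixed` is a named type in Avro, one thing to watch when picking names is that two different `fixed` definitions in the same schema should not share a name.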
   
   Once we fix these, I think `ManifestWriter` should work with all types of 
partition values.
   
   



