HonahX opened a new issue, #14: URL: https://github.com/apache/iceberg-python/issues/14
### Apache Iceberg version

main (development)

### Please describe the bug 🐞

## DecimalType

Currently, if we use `AvroOutputFile` to write decimal values to an Avro file, the resulting file sometimes cannot be read by other Avro readers such as `fastavro`. The cause is that when encoding and writing a decimal value, we follow the Iceberg specification and write it as a fixed. However, in the Avro file's schema, the current implementation specifies it as variable-length binary:

https://github.com/apache/iceberg-python/blob/553695eab40a43f3216d127c03ea1bfda26b935f/pyiceberg/utils/schema_conversion.py#L569-L572

I think this should be changed to

```python
def visit_decimal(self, decimal_type: DecimalType) -> AvroType:
    return {
        "type": "fixed",
        "size": decimal_required_bytes(decimal_type.precision),
        "logicalType": "decimal",
        "precision": decimal_type.precision,
        "scale": decimal_type.scale,
        "name": f"decimal_{decimal_type.precision}_{decimal_type.scale}",
    }
```

so that other Avro readers can correctly interpret the encoded value as fixed-length bytes instead of trying to read a length prefix first.

I think this is also the root cause of the failure I observed when using `ManifestWriter` to write a manifest entry for a table partitioned by a `DecimalType` column. I ran some local tests and verified that the above change fixes this issue.
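To make the mismatch concrete, here is a minimal standalone sketch of the fixed-length layout described above. It does not import pyiceberg: `required_bytes`, `decimal_fixed_schema`, and `encode_fixed_decimal` are hypothetical helpers written for illustration, and `required_bytes` is only assumed to compute the same size as `decimal_required_bytes`.

```python
from decimal import Decimal


def required_bytes(precision: int) -> int:
    """Smallest number of bytes holding any signed unscaled value of this precision.

    Hypothetical stand-in for pyiceberg's decimal_required_bytes (assumption).
    """
    max_unscaled = 10**precision - 1
    # bit_length() excludes the sign bit, so add one bit, then round up to bytes.
    return (max_unscaled.bit_length() + 1 + 7) // 8


def decimal_fixed_schema(precision: int, scale: int) -> dict:
    """Avro schema fragment declaring the decimal as a named fixed type."""
    return {
        "type": "fixed",
        "size": required_bytes(precision),
        "logicalType": "decimal",
        "precision": precision,
        "scale": scale,
        "name": f"decimal_{precision}_{scale}",
    }


def encode_fixed_decimal(value: Decimal, precision: int, scale: int) -> bytes:
    """Encode the unscaled value as big-endian two's complement, padded to the fixed size."""
    sign, digits, exponent = value.as_tuple()
    shift = exponent + scale
    if shift < 0:
        raise ValueError(f"{value} has more fractional digits than scale {scale}")
    unscaled = int("".join(map(str, digits))) * 10**shift
    if sign:
        unscaled = -unscaled
    # No length prefix is written: the reader must learn the size from the schema.
    return unscaled.to_bytes(required_bytes(precision), "big", signed=True)
```

Because `encode_fixed_decimal` emits no length prefix, a reader that sees `"type": "bytes"` in the schema would treat the first encoded byte(s) as a length and misread the rest, which is consistent with the failure described above.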
## FixedType and UUIDType

For Fixed and UUID, I think the current conversion is missing the required `name` field:

https://avro.apache.org/docs/1.11.1/specification/#fixed

https://github.com/apache/iceberg-python/blob/553695eab40a43f3216d127c03ea1bfda26b935f/pyiceberg/utils/schema_conversion.py#L567-L568

https://github.com/apache/iceberg-python/blob/553695eab40a43f3216d127c03ea1bfda26b935f/pyiceberg/utils/schema_conversion.py#L605-L606

`fastavro` complains:

```python
fastavro._schema_common.SchemaParseException: "name" is a required field missing from the schema: {'type': 'fixed', 'size': 16}
```

Once we fix these, I think the `ManifestWriter` should work with all types of partition values.
