DinGo4DEV commented on PR #2003:
URL: https://github.com/apache/iceberg-python/pull/2003#issuecomment-2888304797
@Fokko TBR, after running the test case, I found that the identity transform
on a uuid column is not supported for writing, because the value is bytes.
So I tried rewriting the Avro writer and other related components.
- The uuid is still stored as bytes in Parquet.
- The identity partition value is now written as its hex string
`ec9b663b-062f-4200-a130-8de19c21b800` instead of the raw bytes value
`b'\xec\x9bf;\x06/B\x00\xa10\x8d\xe1\x9c!\xb8\x00'`
``` bash
data/uuid_bucket=0/uuid_identity=ec9b663b-062f-4200-a130-8de19c21b800
|- xxxxx.parquet
data/uuid_bucket=1/uuid_identity=5f473c64-dbeb-449b-bdfa-b6b4185b1bde
|- xxxxx.parquet
```
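For reference, here is a minimal sketch of the conversion described above, using only the Python standard library. The helper names `uuid_partition_value` and `partition_path` are hypothetical and only for illustration; they are not the functions changed in this PR, and the real path construction in pyiceberg may differ.

```python
import uuid


def uuid_partition_value(raw: bytes) -> str:
    """Render a 16-byte UUID as its canonical hyphenated hex string."""
    # uuid.UUID(bytes=...) raises ValueError if raw is not exactly 16 bytes.
    return str(uuid.UUID(bytes=raw))


def partition_path(fields: dict) -> str:
    """Join name=value pairs into a Hive-style partition path (hypothetical helper)."""
    return "/".join(f"{name}={value}" for name, value in fields.items())


# The bytes value quoted in the comment above.
raw = b'\xec\x9bf;\x06/B\x00\xa10\x8d\xe1\x9c!\xb8\x00'

identity = uuid_partition_value(raw)
path = "data/" + partition_path({"uuid_bucket": 0, "uuid_identity": identity})

print(identity)  # ec9b663b-062f-4200-a130-8de19c21b800
print(path)      # data/uuid_bucket=0/uuid_identity=ec9b663b-062f-4200-a130-8de19c21b800
```

The string form round-trips back to the same 16 bytes via `uuid.UUID(identity).bytes`, so no information is lost in the path. The bucket number `0` is copied from the listing above, not computed here.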
I am not sure whether this is correct and compatible with other integrations,
as I haven't tried partitioning on a uuid column with the identity transform
in other projects before.
If this PR is accepted, we will still need to rewrite the other test cases
related to UUID.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]