thswlsqls opened a new issue, #17076:
URL: https://github.com/apache/iceberg/issues/17076

   **Apache Iceberg version**
   main @ 49b89a8c5
   
   **Query engine**
   Kafka Connect (N/A — not Spark/Flink)
   
   **Please describe the bug**
   `RecordConverter.convertUUID()`
(`kafka-connect/kafka-connect/src/main/java/org/apache/iceberg/connect/data/RecordConverter.java`,
line 508) converts UUID values to `byte[]` when the target file format is
Parquet. The Parquet UUID writer, `ParquetValueWriters.uuids()` (via
`BaseParquetWriter`'s default visitor), expects a `java.util.UUID` and converts
it to bytes internally, so a `byte[]` fails the cast during write. ORC
(`GenericOrcWriters.uuids()`) and Avro both write `UUID` directly; only the
Parquet branch in `convertUUID` diverges from this.
   
   **Steps to reproduce**
   Sink a record with a UUID field to an Iceberg table using the default file 
format (Parquet). `IcebergWriter.convertToRow()` calls `convertUUID()`, which 
returns `byte[]`; the Parquet writer stack then throws on write.
   Expected: the record writes successfully.
   Actual: `ClassCastException: class [B cannot be cast to class java.util.UUID`.
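
   The type mismatch can be illustrated without any Iceberg dependency. The
sketch below is a simplified model, not the actual connector or writer code:
`UuidWriteMismatch`, `writeUuid`, `toBytes`, and `convert` are hypothetical
names, and the `preSerialize` flag stands in for the Parquet branch in
`convertUUID`. It assumes only what is described above: the writer casts its
input to `java.util.UUID` and serializes it itself, while the converter hands
over a `byte[]`.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical stand-ins for illustration; none of these are Iceberg APIs.
final class UuidWriteMismatch {

  // Models a writer whose contract is "pass me a java.util.UUID":
  // it casts the incoming value and serializes it to 16 bytes itself.
  static byte[] writeUuid(Object value) {
    UUID uuid = (UUID) value; // throws ClassCastException when value is a byte[]
    return toBytes(uuid);
  }

  // 16-byte encoding: most significant long first, then least significant.
  static byte[] toBytes(UUID uuid) {
    ByteBuffer buffer = ByteBuffer.allocate(16);
    buffer.putLong(uuid.getMostSignificantBits());
    buffer.putLong(uuid.getLeastSignificantBits());
    return buffer.array();
  }

  // Models a converter that pre-serializes the UUID for one format only.
  static Object convert(UUID uuid, boolean preSerialize) {
    return preSerialize ? toBytes(uuid) : uuid;
  }

  public static void main(String[] args) {
    UUID id = UUID.randomUUID();

    // Passing the UUID through unchanged satisfies the writer contract.
    System.out.println("bytes written: " + writeUuid(convert(id, false)).length);

    // Pre-serializing to byte[] breaks the cast inside the writer.
    try {
      writeUuid(convert(id, true));
    } catch (ClassCastException e) {
      System.out.println("write failed: " + e.getMessage());
    }
  }
}
```

   In this model the failure goes away as soon as the converter returns the
`UUID` unchanged for every format, which is what the ORC and Avro paths are
described as doing.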
   
   **Additional context**
   The `byte[]` conversion matched the Parquet writer contract before PR #11904 
("Parquet: Add readers and writers for the internal object model", merged 
2025-01-24) changed `ParquetValueWriters`' UUID writer to accept `UUID` 
directly. `kafka-connect` was never updated to follow, making this a regression 
from that writer-contract change.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

