yihua commented on code in PR #18483:
URL: https://github.com/apache/hudi/pull/18483#discussion_r3072663029
##########
hudi-sync/hudi-sync-common/src/main/java/org/apache/hudi/sync/common/util/SparkSchemaUtils.java:
##########
@@ -88,6 +88,7 @@ private static String convertFieldType(HoodieSchema originalFieldSchema) {
+ ",\"valueContainsNull\":" + valueOptional + "}";
case RECORD:
case BLOB:
+ case VARIANT:
Review Comment:
🤖 BLOB fields tag themselves with `{"hudi_type":"BLOB"}` in the Spark field
metadata so that `HoodieSparkSchemaConverters.toHoodieType()` can reconstruct
the logical type on the way back. VARIANT fields don't get that annotation
here, so after a sync round-trip the VARIANT column would be reconstructed as a
plain `STRUCT` rather than a `VARIANT`. Could you add a similar
`isVariantField()`-style check in `convertToSparkSchemaJson` (the calling
method) and a corresponding reconstruction case in
`HoodieSparkSchemaConverters`?
<sub><i>- Generated by an AI agent and may contain mistakes. Please verify
any suggestions before applying.</i></sub>
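To make the round-trip concern concrete, here is a minimal, self-contained sketch that models Spark field metadata as a plain `Map<String, String>`. The `LogicalType` enum and the `tagFor`/`reconstruct` helpers are hypothetical stand-ins, not the real `SparkSchemaUtils` or `HoodieSparkSchemaConverters` API. Only the `hudi_type` key and the `"BLOB"` value come from the comment above; reusing the same key with a `"VARIANT"` value is an assumption about what the fix might look like.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical model of the metadata round trip; not Hudi's actual API.
final class VariantTagSketch {

  // Stand-in for logical types that all end up as a Spark STRUCT.
  enum LogicalType { RECORD, BLOB, VARIANT }

  static final String HUDI_TYPE_KEY = "hudi_type";

  // Sync side: build field metadata. BLOB is tagged (per the review comment);
  // tagging VARIANT the same way is the assumed fix, toggled by tagVariant.
  static Map<String, String> tagFor(LogicalType type, boolean tagVariant) {
    switch (type) {
      case BLOB:
        return Collections.singletonMap(HUDI_TYPE_KEY, "BLOB");
      case VARIANT:
        return tagVariant
            ? Collections.singletonMap(HUDI_TYPE_KEY, "VARIANT")
            : Collections.<String, String>emptyMap();
      default:
        return Collections.<String, String>emptyMap();
    }
  }

  // Read side: a STRUCT without a recognized tag falls back to RECORD.
  static LogicalType reconstruct(Map<String, String> metadata) {
    String tag = metadata.get(HUDI_TYPE_KEY);
    if ("BLOB".equals(tag)) {
      return LogicalType.BLOB;
    }
    if ("VARIANT".equals(tag)) {
      return LogicalType.VARIANT;
    }
    return LogicalType.RECORD;
  }

  public static void main(String[] args) {
    // Without the tag, VARIANT degrades to a plain RECORD (STRUCT).
    System.out.println(reconstruct(tagFor(LogicalType.VARIANT, false))); // RECORD
    // With the tag, the logical type survives the round trip.
    System.out.println(reconstruct(tagFor(LogicalType.VARIANT, true)));  // VARIANT
  }
}
```

The write side and the read side have to agree on the same key/value pair, which is why the suggestion touches both `convertToSparkSchemaJson` and `HoodieSparkSchemaConverters`.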
##########
hudi-sync/hudi-hive-sync/src/test/java/org/apache/hudi/hive/TestSparkSchemaUtils.java:
##########
@@ -194,4 +194,25 @@ public void testConvertSchemaWithBlobField() {
StructType blobStruct = (StructType) sparkSchema.fields()[1].dataType();
assertEquals(3, blobStruct.fields().length);
}
+
+ @Test
+ public void testConvertSchemaWithVariantField() {
+ HoodieSchema schema = HoodieSchema.createRecord("root", null, null, false, Arrays.asList(
Review Comment:
🤖 Looking at the diff, nested variants are already covered in
`TestHiveSchemaUtil` (the `nested_variant_field` case with a record containing
a `variant_data` sub-field) and in `TestBigQuerySchemaResolver`
(`convertSchema_nestedVariantField`). The gap is specifically in
`TestSparkSchemaUtils`: the new `testConvertSchemaWithVariantField` only
exercises a top-level variant. A similar test with a struct containing a
variant would bring it to parity with the other two suites.
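One reason the nested case deserves its own test: if the conversion logic only inspects top-level fields, a variant inside a struct could silently lose its type information. The sketch below is hypothetical (a toy `Field` tree and a `collectTags` helper, not `HoodieSchema` or Spark's `StructType`). It only shows the shape of a recursive walk that such a test would exercise; the field names echo the `nested_variant_field`/`variant_data` case mentioned above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy schema tree; not HoodieSchema or Spark's StructType.
final class NestedVariantSketch {

  enum Kind { PRIMITIVE, RECORD, VARIANT }

  static final class Field {
    final String name;
    final Kind kind;
    final List<Field> children;

    Field(String name, Kind kind, Field... children) {
      this.name = name;
      this.kind = kind;
      this.children = Arrays.asList(children);
    }
  }

  // Collects "dotted.path" -> type tag, recursing into records so that a
  // VARIANT nested inside a struct is tagged just like a top-level one.
  static Map<String, String> collectTags(Field field, String prefix) {
    Map<String, String> tags = new HashMap<>();
    String path = prefix.isEmpty() ? field.name : prefix + "." + field.name;
    if (field.kind == Kind.VARIANT) {
      tags.put(path, "VARIANT");
    }
    for (Field child : field.children) {
      tags.putAll(collectTags(child, path));
    }
    return tags;
  }

  public static void main(String[] args) {
    Field root = new Field("root", Kind.RECORD,
        new Field("id", Kind.PRIMITIVE),
        new Field("top_variant", Kind.VARIANT),
        new Field("nested_variant_field", Kind.RECORD,
            new Field("variant_data", Kind.VARIANT)));
    Map<String, String> tags = collectTags(root, "");
    System.out.println(tags.get("root.nested_variant_field.variant_data")); // VARIANT
  }
}
```

A real test would assert the equivalent on the nested `StructField` in the converted Spark schema rather than on a toy map.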
--
This is an automated message from the Apache Git Service.