laskoviymishka commented on code in PR #16826:
URL: https://github.com/apache/iceberg/pull/16826#discussion_r3442536156
##########
kafka-connect/kafka-connect/src/main/java/org/apache/iceberg/connect/data/RecordConverter.java:
##########
@@ -268,6 +276,74 @@ private GenericRecord convertToStruct(
return result;
}
+ /**
+ * This method evolves schemas when the value is null. Note: logic should be
kept consistent with
Review Comment:
This "kept consistent" claim isn't actually true, and that's the part that
worries me most.
The non-null `convertToStruct` stops at the first level where it emits an
update: once `hasSchemaUpdates` is true it skips `convertValue` entirely, so it
never walks the nested children in that pass. Here, after emitting an
`updateType` or `makeOptional`, we keep recursing into the subtree in the same
pass — so the null path can emit events at multiple nesting levels at once
where the non-null path emits one and defers.
`testSchemaEvolutionForFieldAndNestedFields` exercises exactly this.
It's not incorrect — `UpdateSchema` is atomic, so the end state is the same
— but the comment asserts a parity that doesn't hold, and that's the kind of
false invariant a future change will lean on. I'd either make the recursion
mirror the non-null path (stop descending once an update fires at this level),
or rewrite the Javadoc to describe what this actually does, including the
precondition that `schemaUpdateConsumer != null` and that it emits add-column /
type-promotion / make-optional. wdyt?
##########
kafka-connect/kafka-connect/src/test/java/org/apache/iceberg/connect/data/TestRecordConverter.java:
##########
@@ -859,6 +859,378 @@ private void assertTypesAddedFromStruct(Function<String,
Type> fn) {
assertThat(fn.apply("ma")).isInstanceOf(MapType.class);
}
+ @Test
+ public void testNestedSchemaEvolutionStructWithNullValue() {
+ org.apache.iceberg.Schema nestedStructSchema =
+ new org.apache.iceberg.Schema(
Review Comment:
There's no test for the most common null-field case: value is null and the
Connect schema matches the table exactly, so nothing should be emitted. That's
the path that runs on the majority of sparse records, and the one a
false-positive regression (spurious `addColumn`/`updateType`) would hit first.
I'd add one with a table like `{id:int, nested:struct<a:int>}`, a matching
Connect schema, the nested value null, and assert `addColumns()`,
`updateTypes()`, and `makeOptionals()` are all empty and
`result.getField("nested")` is null. Without it, an over-eager emission on null
values passes silently.
##########
kafka-connect/kafka-connect/src/main/java/org/apache/iceberg/connect/data/RecordConverter.java:
##########
@@ -268,6 +276,74 @@ private GenericRecord convertToStruct(
return result;
}
+ /**
+ * This method evolves schemas when the value is null. Note: logic should be
kept consistent with
+ * equivalent method used when value is not null: {@link
#convertToStruct(Struct, StructType, int,
+ * SchemaUpdate.Consumer)}
+ */
+ private void evolveSchemaFromConnectSchema(
+ org.apache.kafka.connect.data.Schema recordSchema,
+ Type tableType,
+ int tableFieldId,
+ SchemaUpdate.Consumer schemaUpdateConsumer) {
+ if (recordSchema == null) {
+ return;
+ }
+ switch (recordSchema.type()) {
+ case STRUCT:
Review Comment:
Each container branch guards its body with `isStructType()` / `isListType()`
/ `isMapType()`, so if the Connect schema says STRUCT but the table column is a
STRING, the whole body is skipped — no evolution, no exception, nothing logged.
The non-null path would eventually throw on that mismatch; here it silently
succeeds and the column is written null, so an operator gets schema drift with
no signal.
I'd add a `LOG.warn` in the else of each type guard naming the field and the
two types. Silent is the wrong default here.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]