fqaiser94 commented on code in PR #10456:
URL: https://github.com/apache/iceberg/pull/10456#discussion_r1629982369
##########
kafka-connect/kafka-connect/src/test/java/org/apache/iceberg/connect/data/RecordConverterTest.java:
##########

@@ -859,6 +859,107 @@ public void testEvolveTypeDetectionStructNested() {
     assertThat(updateMap.get("st.ff").type()).isInstanceOf(DoubleType.class);
   }
 
+  @Test
+  public void testDeletedColumnDetectionStruct() {
+    org.apache.iceberg.Schema tableSchema =
+        new org.apache.iceberg.Schema(
+            NestedField.required(1, "i", IntegerType.get()),
+            NestedField.required(2, "j", IntegerType.get()),
+            NestedField.optional(3, "k", IntegerType.get()));
+
+    Table table = mock(Table.class);
+    when(table.schema()).thenReturn(tableSchema);
+    RecordConverter converter = new RecordConverter(table, config);
+
+    Schema connectSchema = SchemaBuilder.struct().field("i", Schema.INT32_SCHEMA);
+    Struct data = new Struct(connectSchema).put("i", 1);
+    SchemaUpdate.Consumer consumer = new SchemaUpdate.Consumer();
+    converter.convert(data, consumer);
+    Collection<SchemaUpdate.MakeOptional> makeOptionals = consumer.makeOptionals();
+
+    assertThat(makeOptionals).hasSize(1);
+    assertThat(makeOptionals.iterator().next().name()).isEqualTo("j");
+  }

Review Comment:
   @bryanck while investigating a related [issue](https://github.com/tabular-io/iceberg-kafka-connect/issues/261) reported by a user of the original Tabular connector, it occurred to me that we should perhaps mark columns that are missing from the record as optional in the table.

   Currently, we only evolve a field to optional if the field is explicitly marked as optional in the Kafka Connect schema. However, with some serialization/deserialization setups, optional fields may be omitted from the Kafka Connect schema entirely for records that don't carry them. For such records, we currently (i.e. without this change) throw an exception at the `IcebergWriter.write` stage (when it tries to write a null value into a required field) and stay stuck until someone manually fixes the table schema outside of Kafka Connect.

   I'm not sure whether this was an explicit design decision on your part, but I wanted to raise it for discussion regardless.
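   To make the scenario concrete, here is a minimal sketch (hypothetical code, not part of this PR) contrasting the Connect schema shape we already handle with the one that currently breaks. The class name `MissingColumnSketch` is made up for illustration; the field names match the test above.

   ```java
   import org.apache.kafka.connect.data.Schema;
   import org.apache.kafka.connect.data.SchemaBuilder;
   import org.apache.kafka.connect.data.Struct;

   public class MissingColumnSketch {
     public static void main(String[] args) {
       // Case 1: column "j" is present in the Connect schema but explicitly optional.
       // The converter already detects this and evolves the table column to optional.
       Schema explicitOptional =
           SchemaBuilder.struct()
               .field("i", Schema.INT32_SCHEMA)
               .field("j", Schema.OPTIONAL_INT32_SCHEMA)
               .build();
       Struct withNullField = new Struct(explicitOptional).put("i", 1); // "j" left null

       // Case 2: the serializer drops "j" from the Connect schema altogether.
       // As described above, without this change the converter doesn't mark the
       // missing column optional, so the write later fails when a null lands in
       // the required table column "j".
       Schema omittedField =
           SchemaBuilder.struct()
               .field("i", Schema.INT32_SCHEMA)
               .build();
       Struct withoutField = new Struct(omittedField).put("i", 1);

       System.out.println(withNullField.schema().fields());
       System.out.println(withoutField.schema().fields());
     }
   }
   ```

   In the second case the table still declares `j` as required, so the converter would need to emit a `MakeOptional` update for `j` (which is what the test above asserts) for the write to succeed.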