fqaiser94 commented on code in PR #10456:
URL: https://github.com/apache/iceberg/pull/10456#discussion_r1629982369
##########
kafka-connect/kafka-connect/src/test/java/org/apache/iceberg/connect/data/RecordConverterTest.java:
##########

@@ -859,6 +859,107 @@ public void testEvolveTypeDetectionStructNested() {
     assertThat(updateMap.get("st.ff").type()).isInstanceOf(DoubleType.class);
   }
 
+  @Test
+  public void testDeletedColumnDetectionStruct() {
+    org.apache.iceberg.Schema tableSchema =
+        new org.apache.iceberg.Schema(
+            NestedField.required(1, "i", IntegerType.get()),
+            NestedField.required(2, "j", IntegerType.get()),
+            NestedField.optional(3, "k", IntegerType.get()));
+
+    Table table = mock(Table.class);
+    when(table.schema()).thenReturn(tableSchema);
+    RecordConverter converter = new RecordConverter(table, config);
+
+    Schema connectSchema = SchemaBuilder.struct().field("i", Schema.INT32_SCHEMA);
+    Struct data = new Struct(connectSchema).put("i", 1);
+    SchemaUpdate.Consumer consumer = new SchemaUpdate.Consumer();
+    converter.convert(data, consumer);
+    Collection<SchemaUpdate.MakeOptional> makeOptionals = consumer.makeOptionals();
+
+    assertThat(makeOptionals).hasSize(1);
+    assertThat(makeOptionals.iterator().next().name()).isEqualTo("j");
+  }

Review Comment:
   @bryanck while investigating a related [issue](https://github.com/tabular-io/iceberg-kafka-connect/issues/261) reported by a user of the original Tabular connector, it occurred to me that we should perhaps mark columns that are missing from the record as optional in the table.

   Currently, we only evolve a field to optional if the field is explicitly marked as optional in the Kafka Connect schema. However, with some serialization/deserialization setups, optional fields may be omitted from the Kafka Connect schema entirely for records that don't carry them. For such records, we currently (i.e. without this change) throw an exception at the `IcebergWriter.write` stage (when it tries to write a null value into a required field) and stay stuck until someone manually fixes the table schema outside of Kafka Connect.

   I'm not sure whether this was an explicit design decision on your part, but I wanted to raise it for discussion regardless.
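   To make the scenario concrete, here is a minimal sketch (hypothetical code, not part of this PR) contrasting the Connect schema shape we already handle with the one that currently breaks. The class name `MissingColumnSketch` is made up for illustration; the field names match the test above.

   ```java
   import org.apache.kafka.connect.data.Schema;
   import org.apache.kafka.connect.data.SchemaBuilder;
   import org.apache.kafka.connect.data.Struct;

   public class MissingColumnSketch {
     public static void main(String[] args) {
       // Case 1: column "j" is present in the Connect schema but explicitly optional.
       // The converter already detects this and evolves the table column to optional.
       Schema explicitOptional =
           SchemaBuilder.struct()
               .field("i", Schema.INT32_SCHEMA)
               .field("j", Schema.OPTIONAL_INT32_SCHEMA)
               .build();
       Struct withNullField = new Struct(explicitOptional).put("i", 1); // "j" left null

       // Case 2: the serializer drops "j" from the Connect schema altogether.
       // As described above, without this change the converter doesn't mark the
       // missing column optional, so the write later fails when a null lands in
       // the required table column "j".
       Schema omittedField =
           SchemaBuilder.struct()
               .field("i", Schema.INT32_SCHEMA)
               .build();
       Struct withoutField = new Struct(omittedField).put("i", 1);

       System.out.println(withNullField.schema().fields());
       System.out.println(withoutField.schema().fields());
     }
   }
   ```

   In the second case the table still declares `j` as required, so the converter would need to emit a `MakeOptional` update for `j` (which is what the test above asserts) for the write to succeed.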