fqaiser94 commented on code in PR #10456:
URL: https://github.com/apache/iceberg/pull/10456#discussion_r1629982369


##########
kafka-connect/kafka-connect/src/test/java/org/apache/iceberg/connect/data/RecordConverterTest.java:
##########
@@ -859,6 +859,107 @@ public void testEvolveTypeDetectionStructNested() {
     assertThat(updateMap.get("st.ff").type()).isInstanceOf(DoubleType.class);
   }
 
+  @Test
+  public void testDeletedColumnDetectionStruct() {
+    org.apache.iceberg.Schema tableSchema =
+        new org.apache.iceberg.Schema(
+            NestedField.required(1, "i", IntegerType.get()),
+            NestedField.required(2, "j", IntegerType.get()),
+            NestedField.optional(3, "k", IntegerType.get()));
+
+    Table table = mock(Table.class);
+    when(table.schema()).thenReturn(tableSchema);
+    RecordConverter converter = new RecordConverter(table, config);
+
+    Schema connectSchema = SchemaBuilder.struct().field("i", Schema.INT32_SCHEMA);
+    Struct data = new Struct(connectSchema).put("i", 1);
+    SchemaUpdate.Consumer consumer = new SchemaUpdate.Consumer();
+    converter.convert(data, consumer);
+    Collection<SchemaUpdate.MakeOptional> makeOptionals = consumer.makeOptionals();
+
+    assertThat(makeOptionals).hasSize(1);
+    assertThat(makeOptionals.iterator().next().name()).isEqualTo("j");
+  }

Review Comment:
   @bryanck while investigating a related 
[issue](https://github.com/tabular-io/iceberg-kafka-connect/issues/261) 
reported by a user of the original tabular connector, it occurred to me that 
maybe we should mark columns missing from the record as optional in the table. 
   
   Currently, we only evolve a field to optional if the field is explicitly marked as optional in the kafka-connect schema. But for some serialization/deserialization processes, an optional field may not be included in the kafka-connect schema at all for records that are missing that field.
   
   Currently (i.e. without this change), the connector simply throws an exception at the `IcebergWriter.write` stage (when it tries to write a null value into a non-null field) and remains stuck until someone manually fixes the table schema outside of kafka-connect.
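   For context, that manual fix is roughly the following, run against the table directly (a sketch, assuming an `org.apache.iceberg.Table` handle named `table`); the `MakeOptional` update collected in the test above would effectively automate this step.

```java
// Sketch of the manual workaround today, outside of kafka-connect:
// relax the required column so records that omit it stop failing at write time.
table.updateSchema()
    .makeColumnOptional("j")
    .commit();
```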
   
   Not sure whether this was an explicit design decision on your part, but I wanted to raise it for discussion regardless. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

