aiborodin commented on code in PR #13340:
URL: https://github.com/apache/iceberg/pull/13340#discussion_r2156145524
##########
flink/v2.0/flink/src/main/java/org/apache/iceberg/flink/sink/dynamic/DynamicRecordProcessor.java:
##########
@@ -142,10 +151,18 @@ private void emit(
       Schema schema,
       CompareSchemasVisitor.Result result,
       PartitionSpec spec) {
-    RowData rowData =
-        result == CompareSchemasVisitor.Result.SAME
-            ? data.rowData()
-            : RowDataEvolver.convert(data.rowData(), data.schema(), schema);
+    RowData rowData;
+    if (result == CompareSchemasVisitor.Result.SAME) {
+      rowData = data.rowData();
+    } else {
+      RowDataConverter rowDataConverter =
+          converterCache.get(
+              data.schema(),
+              dataSchema ->
+                  new RowDataConverter(
+                      FlinkSchemaUtil.convert(dataSchema), FlinkSchemaUtil.convert(schema)));

Review Comment:
   Yes, we did; please see the attached profile. According to it, the Schema -> RowType conversion takes approximately 51% of our converter's CPU time, while the static conversion in RowDataEvolver, which recomputes field indices for every record, accounts for about 45%. The profile makes it clear that caching the converted schemas alone would not be sufficient; we also need quasi-code generation, i.e. computing the field mapping once per schema pair rather than once per record.

   <img width="1572" alt="Screenshot 2025-06-19 at 3 45 17 pm" src="https://github.com/user-attachments/assets/baf27122-9520-44dc-a0be-92659bc8dbff" />
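   For readers outside the diff context, here is a minimal, plain-Java sketch of that quasi-code-generation idea. All names are hypothetical, and it simplifies heavily: rows become Object[] and schemas become ordered field-name lists, whereas the PR's actual RowDataConverter operates on Flink RowData and Iceberg Schema and also handles type conversion. The point it illustrates is that the name-based field lookup happens once per new source schema, and the per-record hot path is a plain positional copy.

   ```java
   import java.util.List;
   import java.util.Map;
   import java.util.concurrent.ConcurrentHashMap;

   // Hypothetical simplification of the cached-converter idea: a "schema" is an
   // ordered list of field names and a row is an Object[]. Constructing the
   // converter resolves every target field's source position once; convert()
   // is then a positional copy with no per-record name lookups.
   final class CachedRowConverter {

     // targetPosition -> sourcePosition, or -1 for fields added by the target schema.
     private final int[] sourceIndexForTargetField;

     private CachedRowConverter(List<String> sourceFields, List<String> targetFields) {
       this.sourceIndexForTargetField = new int[targetFields.size()];
       for (int t = 0; t < targetFields.size(); t++) {
         sourceIndexForTargetField[t] = sourceFields.indexOf(targetFields.get(t));
       }
     }

     Object[] convert(Object[] sourceRow) {
       Object[] targetRow = new Object[sourceIndexForTargetField.length];
       for (int t = 0; t < targetRow.length; t++) {
         int s = sourceIndexForTargetField[t];
         targetRow[t] = s >= 0 ? sourceRow[s] : null; // fields new in the target stay null
       }
       return targetRow;
     }

     // Keyed by the source schema only, mirroring converterCache.get(data.schema(), ...)
     // in the diff; the target schema is assumed fixed for a given table.
     private static final Map<List<String>, CachedRowConverter> CACHE = new ConcurrentHashMap<>();

     static CachedRowConverter forSchemas(List<String> sourceFields, List<String> targetFields) {
       return CACHE.computeIfAbsent(sourceFields, s -> new CachedRowConverter(s, targetFields));
     }
   }
   ```

   With this shape, indexOf runs only on a cache miss, so per record the work reduces to an array copy. That is what should eliminate the roughly 45% of CPU time the profile attributes to RowDataEvolver recomputing field indices for every record.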