Re: [PR] Flink: If IcebergSink writeParallelism is not specified, defaults to the input source parallelism [iceberg]

via GitHub Thu, 12 Jun 2025 16:07:04 -0700


stevenzwu commented on code in PR #13260:
URL: https://github.com/apache/iceberg/pull/13260#discussion_r2143792082



##########
flink/v2.0/flink/src/test/java/org/apache/iceberg/flink/sink/TestIcebergSink.java:
##########
@@ -388,6 +388,43 @@ void testErrorOnNullForRequiredField() throws Exception {
     assertThatThrownBy(() -> env.execute()).hasRootCauseInstanceOf(NullPointerException.class);
   }
 
+  @TestTemplate
+  void testDefaultWriteParallelism() {
+    List<Row> rows = createRows("");
+    DataStream<Row> dataStream =
+        env.addSource(createBoundedSource(rows), ROW_TYPE_INFO).uid("mySourceId");
+
+    var sink =
+        IcebergSink.forRow(dataStream, SimpleDataUtil.FLINK_SCHEMA)
+            .table(table)
+            .tableLoader(tableLoader)
+            .tableSchema(SimpleDataUtil.FLINK_SCHEMA)
+            .distributionMode(DistributionMode.NONE)
+            .append();
+
+    // since the sink write parallelism was null, it asserts that the default parallelism used was
+    // the input source parallelism
+    assertThat(sink.getTransformation().getParallelism()).isEqualTo(dataStream.getParallelism());

Review Comment:
    The sink has a multi-stage DAG. Does `sink.getTransformation()` return the writer operator?



##########
flink/v2.0/flink/src/test/java/org/apache/iceberg/flink/sink/TestIcebergSink.java:
##########
@@ -388,6 +388,43 @@ void testErrorOnNullForRequiredField() throws Exception {
     assertThatThrownBy(() -> env.execute()).hasRootCauseInstanceOf(NullPointerException.class);
   }
 
+  @TestTemplate
+  void testDefaultWriteParallelism() {
+    List<Row> rows = createRows("");
+    DataStream<Row> dataStream =
+        env.addSource(createBoundedSource(rows), ROW_TYPE_INFO).uid("mySourceId");
+
+    var sink =
+        IcebergSink.forRow(dataStream, SimpleDataUtil.FLINK_SCHEMA)
+            .table(table)
+            .tableLoader(tableLoader)
+            .tableSchema(SimpleDataUtil.FLINK_SCHEMA)
+            .distributionMode(DistributionMode.NONE)
+            .append();
+
+    // since the sink write parallelism was null, it asserts that the default parallelism used was
+    // the input source parallelism
+    assertThat(sink.getTransformation().getParallelism()).isEqualTo(dataStream.getParallelism());
+  }
+
+  @TestTemplate
+  void testWriteParallelism() {
+    List<Row> rows = createRows("");
+    DataStream<Row> dataStream =
+        env.addSource(createBoundedSource(rows), ROW_TYPE_INFO).uid("mySourceId");
+
+    var sink =
+        IcebergSink.forRow(dataStream, SimpleDataUtil.FLINK_SCHEMA)
+            .table(table)
+            .tableLoader(tableLoader)
+            .tableSchema(SimpleDataUtil.FLINK_SCHEMA)
+            .distributionMode(DistributionMode.NONE)
+            .writeParallelism(parallelism)

Review Comment:
   This parallelism could be the same as the input stream parallelism. We need
to set the write parallelism to a value different from the input stream parallelism.
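
A minimal sketch of why the two values should differ, using plain Java rather than Flink or Iceberg classes. `resolveWriteParallelism` and `brokenResolve` are hypothetical stand-ins for the defaulting rule described in the PR title, not the actual sink code. When the configured value equals the input parallelism, a variant that ignores `writeParallelism` entirely still passes the check; a differing value exposes it.

```java
/**
 * Hypothetical stand-in for the defaulting rule discussed in this PR.
 * It is not Iceberg's implementation; names are assumptions for illustration.
 */
public class WriteParallelismExample {

  /** Returns the configured write parallelism, or the input parallelism when none is set. */
  public static int resolveWriteParallelism(Integer configured, int inputParallelism) {
    return configured == null ? inputParallelism : configured;
  }

  /** Deliberately broken variant: always falls back to the input parallelism. */
  public static int brokenResolve(Integer configured, int inputParallelism) {
    return inputParallelism;
  }

  public static void main(String[] args) {
    int inputParallelism = 4;

    // Weak check: the configured value equals the input parallelism,
    // so the broken variant is indistinguishable from the correct one.
    int same = inputParallelism;
    System.out.println("same value, correct passes: "
        + (resolveWriteParallelism(same, inputParallelism) == same));
    System.out.println("same value, broken passes:  "
        + (brokenResolve(same, inputParallelism) == same));

    // Stronger check: a value that differs from the input parallelism
    // passes only when the configured value is actually honored.
    int different = inputParallelism + 1;
    System.out.println("different value, correct passes: "
        + (resolveWriteParallelism(different, inputParallelism) == different));
    System.out.println("different value, broken passes:  "
        + (brokenResolve(different, inputParallelism) == different));
  }
}
```

In the test under review, the analogous change would be to pass something like `dataStream.getParallelism() + 1` to `.writeParallelism(...)` and assert against that value; this is a suggestion under the assumption that the test environment allows a parallelism above the input's.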



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

