stevenzwu commented on code in PR #13260: URL: https://github.com/apache/iceberg/pull/13260#discussion_r2143792082
##########
flink/v2.0/flink/src/test/java/org/apache/iceberg/flink/sink/TestIcebergSink.java:
##########
@@ -388,6 +388,43 @@ void testErrorOnNullForRequiredField() throws Exception {
     assertThatThrownBy(() -> env.execute()).hasRootCauseInstanceOf(NullPointerException.class);
   }
 
+  @TestTemplate
+  void testDefaultWriteParallelism() {
+    List<Row> rows = createRows("");
+    DataStream<Row> dataStream =
+        env.addSource(createBoundedSource(rows), ROW_TYPE_INFO).uid("mySourceId");
+
+    var sink =
+        IcebergSink.forRow(dataStream, SimpleDataUtil.FLINK_SCHEMA)
+            .table(table)
+            .tableLoader(tableLoader)
+            .tableSchema(SimpleDataUtil.FLINK_SCHEMA)
+            .distributionMode(DistributionMode.NONE)
+            .append();
+
+    // since the sink write parallelism was null, it asserts that the default parallelism used was
+    // the input source parallelism
+    assertThat(sink.getTransformation().getParallelism()).isEqualTo(dataStream.getParallelism());

Review Comment:
   The sink has a multi-stage DAG. Does `sink.getTransformation()` return the writer operator?
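   To make the concern concrete, here is a minimal sketch in plain Java with no Flink dependency. `Node` is a hypothetical stand-in for a stage in the job graph, and the stage names and parallelism values are invented for illustration; none of this is taken from `IcebergSink`. It shows that the terminal stage of a multi-stage pipeline can report a different parallelism than the writer, so an assertion on the terminal stage would not necessarily check the writer. Walking upstream by name is one way to assert on the intended stage.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

public class WriterLookupSketch {

  /** Hypothetical stand-in for one stage of a job graph. This is NOT a Flink class. */
  static final class Node {
    final String name;
    final int parallelism;
    final List<Node> inputs;

    Node(String name, int parallelism, List<Node> inputs) {
      this.name = name;
      this.parallelism = parallelism;
      this.inputs = inputs;
    }
  }

  /**
   * Depth-first search from {@code start} through its upstream inputs, returning the first
   * stage whose name contains {@code nameMarker}, or empty if no stage matches.
   */
  static Optional<Node> findUpstream(Node start, String nameMarker) {
    if (start.name.contains(nameMarker)) {
      return Optional.of(start);
    }
    for (Node input : start.inputs) {
      Optional<Node> found = findUpstream(input, nameMarker);
      if (found.isPresent()) {
        return found;
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    // Invented three-stage pipeline: source -> writer -> committer.
    Node source = new Node("source", 4, Collections.<Node>emptyList());
    Node writer = new Node("writer", 4, Arrays.asList(source));
    Node committer = new Node("committer", 1, Arrays.asList(writer));

    // Asserting on the terminal stage would compare the committer, not the writer.
    System.out.println("terminal stage parallelism: " + committer.parallelism);

    // Locating the writer explicitly checks the stage the test is actually about.
    Node found = findUpstream(committer, "writer")
        .orElseThrow(() -> new IllegalStateException("no writer stage found"));
    System.out.println("writer stage parallelism: " + found.parallelism);
  }
}
```

   If `sink.getTransformation()` does turn out to return a downstream stage, the test could locate the writer in a similar way before asserting on its parallelism; the exact lookup would depend on how the sink names its operators.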
##########
flink/v2.0/flink/src/test/java/org/apache/iceberg/flink/sink/TestIcebergSink.java:
##########
@@ -388,6 +388,43 @@ void testErrorOnNullForRequiredField() throws Exception {
     assertThatThrownBy(() -> env.execute()).hasRootCauseInstanceOf(NullPointerException.class);
   }
 
+  @TestTemplate
+  void testDefaultWriteParallelism() {
+    List<Row> rows = createRows("");
+    DataStream<Row> dataStream =
+        env.addSource(createBoundedSource(rows), ROW_TYPE_INFO).uid("mySourceId");
+
+    var sink =
+        IcebergSink.forRow(dataStream, SimpleDataUtil.FLINK_SCHEMA)
+            .table(table)
+            .tableLoader(tableLoader)
+            .tableSchema(SimpleDataUtil.FLINK_SCHEMA)
+            .distributionMode(DistributionMode.NONE)
+            .append();
+
+    // since the sink write parallelism was null, it asserts that the default parallelism used was
+    // the input source parallelism
+    assertThat(sink.getTransformation().getParallelism()).isEqualTo(dataStream.getParallelism());
+  }
+
+  @TestTemplate
+  void testWriteParallelism() {
+    List<Row> rows = createRows("");
+    DataStream<Row> dataStream =
+        env.addSource(createBoundedSource(rows), ROW_TYPE_INFO).uid("mySourceId");
+
+    var sink =
+        IcebergSink.forRow(dataStream, SimpleDataUtil.FLINK_SCHEMA)
+            .table(table)
+            .tableLoader(tableLoader)
+            .tableSchema(SimpleDataUtil.FLINK_SCHEMA)
+            .distributionMode(DistributionMode.NONE)
+            .writeParallelism(parallelism)

Review Comment:
   This `parallelism` could be the same as the input stream's parallelism, in which case the test would pass even if `writeParallelism` were ignored. We need to set the write parallelism to a value different from the input stream's parallelism.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org