Guosmilesmile opened a new pull request, #12703: URL: https://github.com/apache/iceberg/pull/12703
When we enable range mode and set range-distribution-statistics-type=Sketch, a NullPointerException occurs if the number of sort keys is less than the downstream parallelism. In the SketchUtil.rangeBounds method, the SortKey[] array is initialized with null values. If the number of sort keys during a checkpoint duration is less than the downstream parallelism, this method will pass a SortKey[] array containing null values, which leads to a NullPointerException during serialization. the error log like this: ``` org.apache.flink.util.FlinkException: Global failure triggered by OperatorCoordinator for 'Source: source -> flat -> deliver -> *anonymous_datastream_source$1*[1] -> Calc[2] -> range-shuffle' (operator 7016fb5b1ab54e21d55257a88ab9a39e). at org.apache.flink.runtime.operators.coordination.OperatorCoordinatorHolder$LazyInitializedCoordinatorContext.failJob(OperatorCoordinatorHolder.java:651) ~[flink-dist-1.19.1.jar:1.19.1] at org.apache.flink.runtime.operators.coordination.RecreateOnResetOperatorCoordinator$QuiesceableContext.failJob(RecreateOnResetOperatorCoordinator.java:248) ~[flink-dist-1.19.1.jar:1.19.1] at org.apache.iceberg.flink.sink.shuffle.DataStatisticsCoordinator.lambda$runInCoordinatorThread$2(DataStatisticsCoordinator.java:192) ~[flink-onejar-4.0.3.jar:?] at org.apache.flink.util.ThrowableCatchingRunnable.run(ThrowableCatchingRunnable.java:40) ~[flink-dist-1.19.1.jar:1.19.1] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?] at java.lang.Thread.run(Thread.java:829) ~[?:?] Caused by: java.lang.NullPointerException at org.apache.iceberg.flink.sink.shuffle.SortKeySerializer.serialize(SortKeySerializer.java:146) ~[flink-onejar-4.0.3.jar:?] at org.apache.iceberg.flink.sink.shuffle.SortKeySerializer.serialize(SortKeySerializer.java:52) ~[flink-onejar-4.0.3.jar:?] at org.apache.flink.api.common.typeutils.base.ListSerializer.serialize(ListSerializer.java:126) ~[flink-dist-1.19.1.jar:1.19.1] at org.apache.iceberg.flink.sink.shuffle.GlobalStatisticsSerializer.serialize(GlobalStatisticsSerializer.java:106) ~[flink-onejar-4.0.3.jar:?] at org.apache.iceberg.flink.sink.shuffle.GlobalStatisticsSerializer.serialize(GlobalStatisticsSerializer.java:40) ~[flink-onejar-4.0.3.jar:?] at org.apache.iceberg.flink.sink.shuffle.StatisticsUtil.serializeGlobalStatistics(StatisticsUtil.java:106) ~[flink-onejar-4.0.3.jar:?] at org.apache.iceberg.flink.sink.shuffle.StatisticsEvent.createGlobalStatisticsEvent(StatisticsEvent.java:61) ~[flink-onejar-4.0.3.jar:?] at org.apache.iceberg.flink.sink.shuffle.DataStatisticsCoordinator.lambda$sendGlobalStatisticsToSubtasks$3(DataStatisticsCoordinator.java:259) ~[flink-onejar-4.0.3.jar:?] at org.apache.iceberg.flink.sink.shuffle.DataStatisticsCoordinator.lambda$runInCoordinatorThread$2(DataStatisticsCoordinator.java:184) ~[flink-onejar-4.0.3.jar:?] ... 4 more ``` This PR primarily modifies the rangeBounds method to create a SortKey[] array without null values. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org