Guosmilesmile opened a new pull request, #12703:
URL: https://github.com/apache/iceberg/pull/12703

   When range mode is enabled with range-distribution-statistics-type=Sketch, 
a NullPointerException occurs if the number of sort keys collected during a 
checkpoint interval is less than the downstream parallelism. The 
SketchUtil.rangeBounds method initializes its SortKey[] array with null 
values, so when there are too few sort keys to fill it, the method returns a 
SortKey[] array that still contains null values. Those null elements cause a 
NullPointerException during serialization.
   
   
   
   The error log looks like this:
   
   ```
   org.apache.flink.util.FlinkException: Global failure triggered by OperatorCoordinator for 'Source: source -> flat -> deliver -> *anonymous_datastream_source$1*[1] -> Calc[2] -> range-shuffle' (operator 7016fb5b1ab54e21d55257a88ab9a39e).
     at org.apache.flink.runtime.operators.coordination.OperatorCoordinatorHolder$LazyInitializedCoordinatorContext.failJob(OperatorCoordinatorHolder.java:651) ~[flink-dist-1.19.1.jar:1.19.1]
     at org.apache.flink.runtime.operators.coordination.RecreateOnResetOperatorCoordinator$QuiesceableContext.failJob(RecreateOnResetOperatorCoordinator.java:248) ~[flink-dist-1.19.1.jar:1.19.1]
     at org.apache.iceberg.flink.sink.shuffle.DataStatisticsCoordinator.lambda$runInCoordinatorThread$2(DataStatisticsCoordinator.java:192) ~[flink-onejar-4.0.3.jar:?]
     at org.apache.flink.util.ThrowableCatchingRunnable.run(ThrowableCatchingRunnable.java:40) ~[flink-dist-1.19.1.jar:1.19.1]
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
     at java.lang.Thread.run(Thread.java:829) ~[?:?]
   Caused by: java.lang.NullPointerException
     at org.apache.iceberg.flink.sink.shuffle.SortKeySerializer.serialize(SortKeySerializer.java:146) ~[flink-onejar-4.0.3.jar:?]
     at org.apache.iceberg.flink.sink.shuffle.SortKeySerializer.serialize(SortKeySerializer.java:52) ~[flink-onejar-4.0.3.jar:?]
     at org.apache.flink.api.common.typeutils.base.ListSerializer.serialize(ListSerializer.java:126) ~[flink-dist-1.19.1.jar:1.19.1]
     at org.apache.iceberg.flink.sink.shuffle.GlobalStatisticsSerializer.serialize(GlobalStatisticsSerializer.java:106) ~[flink-onejar-4.0.3.jar:?]
     at org.apache.iceberg.flink.sink.shuffle.GlobalStatisticsSerializer.serialize(GlobalStatisticsSerializer.java:40) ~[flink-onejar-4.0.3.jar:?]
     at org.apache.iceberg.flink.sink.shuffle.StatisticsUtil.serializeGlobalStatistics(StatisticsUtil.java:106) ~[flink-onejar-4.0.3.jar:?]
     at org.apache.iceberg.flink.sink.shuffle.StatisticsEvent.createGlobalStatisticsEvent(StatisticsEvent.java:61) ~[flink-onejar-4.0.3.jar:?]
     at org.apache.iceberg.flink.sink.shuffle.DataStatisticsCoordinator.lambda$sendGlobalStatisticsToSubtasks$3(DataStatisticsCoordinator.java:259) ~[flink-onejar-4.0.3.jar:?]
     at org.apache.iceberg.flink.sink.shuffle.DataStatisticsCoordinator.lambda$runInCoordinatorThread$2(DataStatisticsCoordinator.java:184) ~[flink-onejar-4.0.3.jar:?]
     ... 4 more
   ```
   
   This PR primarily modifies the rangeBounds method to create a SortKey[] 
array without null values.
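
As a rough illustration of the idea (not the actual patch), the sketch below 
selects range bounds into a list and converts it to an exactly sized array, so 
no trailing null slots remain when there are fewer samples than partitions. 
Everything here is an assumption made for illustration: the class name 
RangeBoundsSketch is hypothetical, String stands in for SortKey, and the 
selection loop (at most numPartitions - 1 bounds, evenly stepped, duplicates 
skipped) is a simplified guess at the sampling logic rather than the code in 
SketchUtil.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical, simplified stand-in for the range-bound selection.
// String replaces SortKey so the example is self-contained.
class RangeBoundsSketch {

  static String[] rangeBounds(
      int numPartitions, Comparator<String> comparator, String[] samples) {
    // Sort a copy so the caller's array is left untouched.
    String[] sorted = samples.clone();
    Arrays.sort(sorted, comparator);

    // Assumption: N partitions need at most N - 1 bounds.
    int numCandidates = numPartitions - 1;
    if (sorted.length == 0 || numCandidates <= 0) {
      return new String[0];
    }

    // Collect into a list instead of a pre-sized array, so slots that
    // are never filled do not show up as null elements in the result.
    List<String> candidates = new ArrayList<>(numCandidates);
    int step = (int) Math.ceil((double) sorted.length / numPartitions);
    int position = step - 1;

    while (position < sorted.length && candidates.size() < numCandidates) {
      String candidate = sorted[position];
      boolean duplicate =
          !candidates.isEmpty()
              && candidate.equals(candidates.get(candidates.size() - 1));
      if (duplicate) {
        // Probe linearly for the next distinct value.
        position += 1;
      } else {
        candidates.add(candidate);
        position += step;
      }
    }

    // The array length equals the number of bounds actually chosen.
    return candidates.toArray(new String[0]);
  }
}
```

With two samples and a parallelism of four, this version returns a 
two-element array. A fixed array of length numPartitions - 1 would instead 
leave its last slot null, which is the shape of input that makes the 
serializer fail in the trace above.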


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

