fx19880617 opened a new pull request #6012: URL: https://github.com/apache/incubator-pinot/pull/6012
## Description

Add a `segmentCreationJobParallelism` field so that users can set the parallelism of the segment generation job. It defaults to the number of input files. Setting a lower value can avoid the Spark job submission timeout shown below, which occurred after listing 40000 input files.

Sample error logs/stacktraces:

```
20/09/12 20:13:02 {} INFO org.apache.pinot.plugin.filesystem.S3PinotFS: Listed 40000 files from URI: s3://my-s3-bucket/, is recursive: true
20/09/12 20:14:31 {} ERROR org.apache.spark.deploy.yarn.ApplicationMaster: Uncaught exception:
java.util.concurrent.TimeoutException: Futures timed out after [100000 milliseconds]
	at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:259) ~[scala-library-2.12.10.jar:?]
	at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:263) ~[scala-library-2.12.10.jar:?]
	at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:220) ~[org_apache_spark_spark_shaded_distro_2_12.jar:?]
	at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:469) ~[org_apache_spark_spark_shaded_distro_2_12.jar:?]
	at org.apache.spark.deploy.yarn.ApplicationMaster.runImpl(ApplicationMaster.scala:305) ~[org_apache_spark_spark_shaded_distro_2_12.jar:?]
	at org.apache.spark.deploy.yarn.ApplicationMaster.$anonfun$run$1(ApplicationMaster.scala:245) ~[org_apache_spark_spark_shaded_distro_2_12.jar:?]
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) [scala-library-2.12.10.jar:?]
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:779) [org_apache_spark_spark_shaded_distro_2_12.jar:?]
	at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_265]
	at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_265]
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) [hadoop-common-3.2.1.jar:?]
	at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:778) [org_apache_spark_spark_shaded_distro_2_12.jar:?]
	at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:245) [org_apache_spark_spark_shaded_distro_2_12.jar:?]
	at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:803) [org_apache_spark_spark_shaded_distro_2_12.jar:?]
	at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala) [org_apache_spark_spark_shaded_distro_2_12.jar:?]
20/09/12 20:14:31 {} INFO org.apache.spark.deploy.yarn.ApplicationMaster: Final app status: FAILED, exitCode: 13, (reason: Uncaught exception: java.util.concurrent.TimeoutException: Futures timed out after [100000 milliseconds]
	at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:259)
	at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:263)
	at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:220)
	at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:469)
	at org.apache.spark.deploy.yarn.ApplicationMaster.runImpl(ApplicationMaster.scala:305)
	at org.apache.spark.deploy.yarn.ApplicationMaster.$anonfun$run$1(ApplicationMaster.scala:245)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:779)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
	at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:778)
	at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:245)
	at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:803)
	at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
)
20/09/12 20:14:31 {} INFO org.apache.spark.deploy.yarn.ApplicationMaster: Deleting staging directory hdfs://hadoop/user/user1/.sparkStaging/application_1596183113611_11111
```
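
As an illustration of the defaulting rule in the description, here is a minimal sketch of how a configured parallelism might be resolved against the number of input files. The class `ParallelismSketch` and the method `resolveParallelism` are hypothetical names, not taken from the patch. Treating a non-positive value as "not set" and capping the result at the file count are both assumptions about how the default could behave.

```java
// Hypothetical helper, not the code from this pull request.
final class ParallelismSketch {

    private ParallelismSketch() {
    }

    /**
     * Resolves the effective job parallelism.
     *
     * Assumptions (not confirmed by the PR text):
     *  - a non-positive configured value means "not set";
     *  - the parallelism never exceeds the number of input files;
     *  - with no input files, a single partition is used.
     */
    static int resolveParallelism(int configuredParallelism, int numInputFiles) {
        if (numInputFiles <= 0) {
            return 1;
        }
        if (configuredParallelism <= 0 || configuredParallelism > numInputFiles) {
            // Default stated in the description: one task per input file.
            return numInputFiles;
        }
        return configuredParallelism;
    }

    public static void main(String[] args) {
        int numInputFiles = 40000; // file count taken from the sample log
        System.out.println(resolveParallelism(0, numInputFiles));
        System.out.println(resolveParallelism(100, numInputFiles));
    }
}
```

In a Spark-based job, a value resolved this way would typically be passed as the `numSlices` argument of `JavaSparkContext.parallelize(list, numSlices)`, so that 40000 input paths are spread over a bounded number of partitions instead of one partition per file.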