fx19880617 opened a new pull request #6012:
URL: https://github.com/apache/incubator-pinot/pull/6012


   ## Description
   Adds a field `segmentCreationJobParallelism` that lets users set the parallelism of the segment generation job. It defaults to the number of input files.
   
   This helps avoid Spark job submission timeouts.
   Sample error logs/stacktraces:
   ```
   20/09/12 20:13:02 {} INFO org.apache.pinot.plugin.filesystem.S3PinotFS: Listed 40000 files from URI: s3://my-s3-bucket/, is recursive: true
   20/09/12 20:14:31 {} ERROR org.apache.spark.deploy.yarn.ApplicationMaster: Uncaught exception:
   java.util.concurrent.TimeoutException: Futures timed out after [100000 milliseconds]
        at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:259) ~[scala-library-2.12.10.jar:?]
        at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:263) ~[scala-library-2.12.10.jar:?]
        at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:220) ~[org_apache_spark_spark_shaded_distro_2_12.jar:?]
        at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:469) ~[org_apache_spark_spark_shaded_distro_2_12.jar:?]
        at org.apache.spark.deploy.yarn.ApplicationMaster.runImpl(ApplicationMaster.scala:305) ~[org_apache_spark_spark_shaded_distro_2_12.jar:?]
        at org.apache.spark.deploy.yarn.ApplicationMaster.$anonfun$run$1(ApplicationMaster.scala:245) ~[org_apache_spark_spark_shaded_distro_2_12.jar:?]
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) [scala-library-2.12.10.jar:?]
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:779) [org_apache_spark_spark_shaded_distro_2_12.jar:?]
        at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_265]
        at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_265]
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) [hadoop-common-3.2.1.jar:?]
        at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:778) [org_apache_spark_spark_shaded_distro_2_12.jar:?]
        at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:245) [org_apache_spark_spark_shaded_distro_2_12.jar:?]
        at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:803) [org_apache_spark_spark_shaded_distro_2_12.jar:?]
        at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala) [org_apache_spark_spark_shaded_distro_2_12.jar:?]
   20/09/12 20:14:31 {} INFO org.apache.spark.deploy.yarn.ApplicationMaster: Final app status: FAILED, exitCode: 13, (reason: Uncaught exception: java.util.concurrent.TimeoutException: Futures timed out after [100000 milliseconds]
        at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:259)
        at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:263)
        at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:220)
        at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:469)
        at org.apache.spark.deploy.yarn.ApplicationMaster.runImpl(ApplicationMaster.scala:305)
        at org.apache.spark.deploy.yarn.ApplicationMaster.$anonfun$run$1(ApplicationMaster.scala:245)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:779)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
        at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:778)
        at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:245)
        at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:803)
        at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
   )
   20/09/12 20:14:31 {} INFO org.apache.spark.deploy.yarn.ApplicationMaster: Deleting staging directory hdfs://hadoop/user/user1/.sparkStaging/application_1596183113611_11111
   ```
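
   For illustration only, here is a minimal Java sketch (not the actual diff in this PR) of how a `segmentCreationJobParallelism` setting could cap the number of Spark partitions used for the per-file segment creation tasks; the method name `toTaskRdd` and its surrounding structure are assumptions made for the example:
   ```
   import java.util.List;
   import org.apache.spark.api.java.JavaRDD;
   import org.apache.spark.api.java.JavaSparkContext;

   public class SegmentCreationParallelismSketch {

     /**
      * Parallelizes the per-input-file segment creation tasks. Without the new field the
      * partition count is always inputFiles.size(); a positive segmentCreationJobParallelism
      * caps it instead.
      */
     static JavaRDD<String> toTaskRdd(JavaSparkContext sparkContext, List<String> inputFiles,
         int segmentCreationJobParallelism) {
       // Keep the old behavior (one partition per input file) when the field is unset or <= 0.
       int parallelism =
           segmentCreationJobParallelism > 0 ? segmentCreationJobParallelism : inputFiles.size();
       return sparkContext.parallelize(inputFiles, parallelism);
     }
   }
   ```
   With 40000 input files (as in the log above), setting the new field to, say, 24 would schedule 24 partitions instead of 40000, which can reduce the driver-side load that leads to the submission timeout shown above.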

