Russell Alexander Spitzer created TINKERPOP-1271:
----------------------------------------------------
Summary: SparkContext should be restarted if Killed and using
Persistent Context
Key: TINKERPOP-1271
URL: https://issues.apache.org/jira/browse/TINKERPOP-1271
Project: TinkerPop
Issue Type: Bug
Components: hadoop
Affects Versions: 3.1.2-incubating, 3.2.0-incubating
Reporter: Russell Alexander Spitzer
If the persisted Spark Context is killed by the user via the Spark UI, or is
terminated because of some other error, the Gremlin Console/Server is left with a
stopped Spark Context. This could be caught and the Spark Context recreated.
Oddly enough, if you simply wait, the context will "reset" itself, or possibly get
GC'd out of the system, and everything works again.
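As a rough illustration of the "catch and recreate" idea above, here is a minimal, self-contained sketch. It is not TinkerPop's or Spark's code: `StoppableContext`, `PersistentContextHolder`, and `FakeContext` are hypothetical names introduced only for this example, and the sketch assumes the real context offers some way to tell whether it has been stopped. The holder checks the cached context before handing it out and builds a fresh one if the cached one is stopped.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for a Spark context: only the calls this sketch needs.
interface StoppableContext {
    boolean isStopped();
    void stop();
}

// Hypothetical holder for a persistent context. It hands out the cached
// context and replaces it when it is missing or has been stopped.
final class PersistentContextHolder<C extends StoppableContext> {
    private final Supplier<C> factory;
    private C current;

    PersistentContextHolder(Supplier<C> factory) {
        this.factory = factory;
    }

    // Returns a usable context, recreating it if the cached one was stopped.
    synchronized C getOrRecreate() {
        if (current == null || current.isStopped()) {
            current = factory.get();
        }
        return current;
    }
}

// Fake context used only to exercise the holder without a Spark cluster.
final class FakeContext implements StoppableContext {
    private volatile boolean stopped = false;

    @Override
    public boolean isStopped() {
        return stopped;
    }

    @Override
    public void stop() {
        stopped = true;
    }
}

final class ContextRestartDemo {
    public static void main(String[] args) {
        PersistentContextHolder<FakeContext> holder =
                new PersistentContextHolder<>(FakeContext::new);

        FakeContext first = holder.getOrRecreate();
        first.stop(); // simulates the application being killed from the Master UI

        FakeContext second = holder.getOrRecreate();
        System.out.println("recreated: " + (second != first));
    }
}
```

In a real fix, the factory would presumably wrap whatever call creates the persisted context today, and the stopped check would use whatever the Spark version in use actually exposes; both are assumptions here.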
##Repro
{code}
gremlin> g.V().count()
WARN org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer -
HADOOP_GREMLIN_LIBS is not set -- proceeding regardless
==>6
gremlin> ERROR org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend
- Application has been killed. Reason: Master removed our application: KILLED
ERROR org.apache.spark.scheduler.TaskSchedulerImpl - Lost executor 0 on
10.150.0.180: Remote RPC client disassociated. Likely due to containers
exceeding thresholds, or network issues. Check driver logs for WARN messages.
// Driver has been killed here via the Master UI
gremlin> graph = GraphFactory.open('conf/hadoop/hadoop-gryo.properties')
==>hadoopgraph[gryoinputformat->gryooutputformat]
gremlin> g.V().count()
WARN org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer -
HADOOP_GREMLIN_LIBS is not set -- proceeding regardless
java.lang.IllegalStateException: Cannot call methods on a stopped SparkContext.
This stopped SparkContext was created at:
org.apache.spark.SparkContext.getOrCreate(SparkContext.scala)
org.apache.tinkerpop.gremlin.spark.structure.Spark.create(Spark.java:53)
org.apache.tinkerpop.gremlin.spark.structure.io.SparkContextStorage.open(SparkContextStorage.java:60)
org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer.lambda$submitWithExecutor$1(SparkGraphComputer.java:122)
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)
The currently active SparkContext was created at:
org.apache.spark.SparkContext.getOrCreate(SparkContext.scala)
org.apache.tinkerpop.gremlin.spark.structure.Spark.create(Spark.java:53)
org.apache.tinkerpop.gremlin.spark.structure.io.SparkContextStorage.open(SparkContextStorage.java:60)
org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer.lambda$submitWithExecutor$1(SparkGraphComputer.java:122)
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)
{code}
Full trace from TinkerPop:
{code}
at org.apache.spark.SparkContext.org$apache$spark$SparkContext$$assertNotStopped(SparkContext.scala:106)
at org.apache.spark.SparkContext$$anonfun$newAPIHadoopRDD$1.apply(SparkContext.scala:1130)
at org.apache.spark.SparkContext$$anonfun$newAPIHadoopRDD$1.apply(SparkContext.scala:1129)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.SparkContext.withScope(SparkContext.scala:714)
at org.apache.spark.SparkContext.newAPIHadoopRDD(SparkContext.scala:1129)
at org.apache.spark.api.java.JavaSparkContext.newAPIHadoopRDD(JavaSparkContext.scala:507)
at org.apache.tinkerpop.gremlin.spark.structure.io.InputFormatRDD.readGraphRDD(InputFormatRDD.java:42)
at org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer.lambda$submitWithExecutor$1(SparkGraphComputer.java:195)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
{code}
If we wait a certain amount of time, for some reason everything starts working
again:
{code}
ERROR org.apache.spark.rpc.netty.Inbox - Ignoring error
org.apache.spark.SparkException: Exiting due to error from cluster scheduler: Master removed our application: KILLED
at org.apache.spark.scheduler.TaskSchedulerImpl.error(TaskSchedulerImpl.scala:438)
at org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend.dead(SparkDeploySchedulerBackend.scala:124)
at org.apache.spark.deploy.client.AppClient$ClientEndpoint.markDead(AppClient.scala:264)
at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anonfun$receive$1.applyOrElse(AppClient.scala:172)
at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116)
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
WARN org.apache.spark.rpc.netty.NettyRpcEnv - Ignored message: true
WARN org.apache.spark.deploy.client.AppClient$ClientEndpoint - Connection to
rspitzer-rmbp15.local:7077 failed; waiting for master to reconnect...
WARN org.apache.spark.deploy.client.AppClient$ClientEndpoint - Connection to
rspitzer-rmbp15.local:7077 failed; waiting for master to reconnect...
gremlin> g.V().count()
WARN org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer -
HADOOP_GREMLIN_LIBS is not set -- proceeding regardless
==>6
{code}