Russell Alexander Spitzer created TINKERPOP-1218:
----------------------------------------------------
Summary: Usage of toLocalIterator Produces large amount of Spark Jobs
Key: TINKERPOP-1218
URL: https://issues.apache.org/jira/browse/TINKERPOP-1218
Project: TinkerPop
Issue Type: Improvement
Components: hadoop
Affects Versions: 3.1.1-incubating
Reporter: Russell Alexander Spitzer
The toLocalIterator call at
https://github.com/apache/incubator-tinkerpop/blob/master/spark-gremlin/src/main/java/org/apache/tinkerpop/gremlin/spark/structure/io/PersistedOutputRDD.java#L72
ends up creating a separate Spark job for every partition in the RDD. This
overwhelms the Spark UI with unimportant information that isn't relevant to
users attempting diagnostics. Since this RDD is relatively small, we should be
fine switching this line to a `.collect` call, which pulls the entire RDD
down to the driver in a single job.
As long as the total size of this RDD is on the scale of megabytes, we can
keep the Spark UI readable with:
{code}
return IteratorUtils.map(memoryRDD.collect().iterator(),
        tuple -> new KeyValue<>(tuple._1(), tuple._2()));
{code}
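To illustrate the difference in job counts, here is a rough toy model in plain Java. It does not use Spark at all: `ToyRdd`, `jobsLaunched`, `samplePartitions` and `drain` are hypothetical names invented for this sketch, and the counter only mimics the behaviour described above (one job per partition for toLocalIterator, one job total for collect).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Toy model only: ToyRdd is a hypothetical stand-in, NOT a Spark class.
public class JobCountSketch {

    static final class ToyRdd<T> {
        private final List<List<T>> partitions;
        int jobsLaunched = 0; // number of simulated jobs started so far

        ToyRdd(List<List<T>> partitions) {
            this.partitions = partitions;
        }

        // Models toLocalIterator: each partition is fetched lazily by its own job.
        Iterator<T> toLocalIterator() {
            return new Iterator<T>() {
                private int nextPartition = 0;
                private Iterator<T> current = Collections.emptyIterator();

                @Override
                public boolean hasNext() {
                    // Start a new simulated job whenever the current partition runs dry.
                    while (!current.hasNext() && nextPartition < partitions.size()) {
                        jobsLaunched++;
                        current = partitions.get(nextPartition++).iterator();
                    }
                    return current.hasNext();
                }

                @Override
                public T next() {
                    if (!hasNext()) {
                        throw new NoSuchElementException();
                    }
                    return current.next();
                }
            };
        }

        // Models collect: a single job gathers every partition at once.
        List<T> collect() {
            jobsLaunched++;
            List<T> all = new ArrayList<>();
            for (List<T> partition : partitions) {
                all.addAll(partition);
            }
            return all;
        }
    }

    // Four small partitions holding six elements in total.
    static List<List<Integer>> samplePartitions() {
        return Arrays.asList(
                Arrays.asList(1, 2),
                Arrays.asList(3),
                Arrays.asList(4, 5),
                Arrays.asList(6));
    }

    // Consumes an iterator fully and returns how many elements it produced.
    static <T> int drain(Iterator<T> it) {
        int count = 0;
        while (it.hasNext()) {
            it.next();
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        ToyRdd<Integer> lazy = new ToyRdd<>(samplePartitions());
        drain(lazy.toLocalIterator());
        System.out.println("toLocalIterator jobs: " + lazy.jobsLaunched);

        ToyRdd<Integer> eager = new ToyRdd<>(samplePartitions());
        drain(eager.collect().iterator());
        System.out.println("collect jobs: " + eager.jobsLaunched);
    }
}
```

With the four sample partitions, the lazy path counts four simulated jobs and the collect path counts one. The trade-off is that collect holds the whole result on the driver at once, which is why the proposal only makes sense while the RDD stays small.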