Russell Alexander Spitzer created TINKERPOP-1218:
----------------------------------------------------

             Summary: Usage of toLocalIterator Produces large amount of Spark Jobs
                 Key: TINKERPOP-1218
                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1218
             Project: TinkerPop
          Issue Type: Improvement
          Components: hadoop
    Affects Versions: 3.1.1-incubating
            Reporter: Russell Alexander Spitzer


https://github.com/apache/incubator-tinkerpop/blob/master/spark-gremlin/src/main/java/org/apache/tinkerpop/gremlin/spark/structure/io/PersistedOutputRDD.java#L72

The `toLocalIterator` call on this line ends up creating a separate Spark job 
for every partition of the RDD. This floods the Spark UI with unimportant 
entries that are not relevant to users attempting diagnostics. Since this RDD 
is relatively small, we should be fine switching this line to a `.collect` 
call, which will pull the entire RDD down to the driver in one job.

As long as the total size of this RDD is on the scale of megabytes, we can 
keep the Spark UI readable with:

{code}
        return IteratorUtils.map(memoryRDD.collect().iterator(),
                tuple -> new KeyValue<>(tuple._1(), tuple._2()));
{code}
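
To make the job-count argument concrete, here is a small toy model in plain 
Java. It does not use Spark at all: `ToyRdd`, `jobsSubmitted`, and the helper 
methods are hypothetical names invented for illustration. It only assumes the 
behaviour described above, namely that `toLocalIterator` submits one job per 
partition as the iterator advances, while `collect` submits a single job for 
the whole RDD.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Toy model only (not Spark): counts the "jobs" each retrieval strategy submits. */
class JobCountModel {

    /** Hypothetical stand-in for an RDD: a list of partitions plus a job counter. */
    static final class ToyRdd<T> {
        private final List<List<T>> partitions;
        int jobsSubmitted = 0;

        ToyRdd(List<List<T>> partitions) {
            this.partitions = partitions;
        }

        /** Models collect(): a single job computes every partition at once. */
        List<T> collect() {
            jobsSubmitted++;
            List<T> all = new ArrayList<>();
            for (List<T> partition : partitions) {
                all.addAll(partition);
            }
            return all;
        }

        /** Models toLocalIterator(): one job per partition, submitted lazily. */
        Iterator<T> toLocalIterator() {
            return new Iterator<T>() {
                private int nextPartition = 0;
                private Iterator<T> current = Collections.emptyIterator();

                @Override
                public boolean hasNext() {
                    // Fetch the next partition (one "job") only when the current one is drained.
                    while (!current.hasNext() && nextPartition < partitions.size()) {
                        jobsSubmitted++;
                        current = partitions.get(nextPartition++).iterator();
                    }
                    return current.hasNext();
                }

                @Override
                public T next() {
                    if (!hasNext()) {
                        throw new NoSuchElementException();
                    }
                    return current.next();
                }
            };
        }
    }

    /** Builds a toy RDD with one element per partition. */
    static ToyRdd<Integer> toyRdd(int numPartitions) {
        List<List<Integer>> parts = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            parts.add(Collections.singletonList(i));
        }
        return new ToyRdd<>(parts);
    }

    static int jobsWithCollect(int numPartitions) {
        ToyRdd<Integer> rdd = toyRdd(numPartitions);
        rdd.collect();
        return rdd.jobsSubmitted;
    }

    static int jobsWithToLocalIterator(int numPartitions) {
        ToyRdd<Integer> rdd = toyRdd(numPartitions);
        Iterator<Integer> it = rdd.toLocalIterator();
        while (it.hasNext()) {
            it.next();
        }
        return rdd.jobsSubmitted;
    }

    public static void main(String[] args) {
        int numPartitions = 200;
        System.out.println("collect jobs:         " + jobsWithCollect(numPartitions));
        System.out.println("toLocalIterator jobs: " + jobsWithToLocalIterator(numPartitions));
    }
}
```

The trade-off is memory: the per-partition approach only ever holds one 
partition on the driver, whereas `collect` holds the whole result, which is 
why the change is only reasonable while this RDD stays small.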



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
