[jira] [Closed] (TINKERPOP-1218) Usage of toLocalIterator Produces large amount of Spark Jobs

Marko A. Rodriguez (JIRA) Mon, 14 Mar 2016 08:33:02 -0700

     [ https://issues.apache.org/jira/browse/TINKERPOP-1218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Marko A. Rodriguez closed TINKERPOP-1218.
-----------------------------------------
       Resolution: Fixed
         Assignee: Marko A. Rodriguez
    Fix Version/s: 3.1.2-incubating
                   3.2.0-incubating

This was a simple fix. The change is in both tp31/ and master/.

> Usage of toLocalIterator Produces large amount of Spark Jobs
> ------------------------------------------------------------
>
>                 Key: TINKERPOP-1218
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1218
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: hadoop
>    Affects Versions: 3.1.1-incubating
>            Reporter: Russell Alexander Spitzer
>            Assignee: Marko A. Rodriguez
>             Fix For: 3.2.0-incubating, 3.1.2-incubating
>
>
> https://github.com/apache/incubator-tinkerpop/blob/master/spark-gremlin/src/main/java/org/apache/tinkerpop/gremlin/spark/structure/io/PersistedOutputRDD.java#L72
> The toLocalIterator call on this line ends up creating a separate Spark job 
> for every partition in the RDD. This overwhelms the Spark UI with unimportant 
> entries that are not relevant to users attempting diagnostics. Since this RDD 
> is relatively small, we should be fine switching this line to a `.collect` 
> call, which pulls the entire RDD down to the driver in one job.
> As long as the total size of this RDD is on the scale of megabytes, we can 
> keep the user interface readable with
> {code}
>         return IteratorUtils.map(memoryRDD.collect().iterator(), tuple -> new KeyValue<>(tuple._1(), tuple._2()));
> {code}
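
To make the job-count difference described in the issue concrete, here is a
minimal sketch in plain Java with no Spark dependency. ToyRdd, its
jobsLaunched counter, and the helper methods are hypothetical names invented
for this illustration; they are not part of Spark or TinkerPop. The sketch
only assumes the behavior the issue relies on: collect() fetches everything
in one job, while toLocalIterator() fetches one partition per job as the
iterator is consumed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Toy model only: ToyRdd is a hypothetical stand-in for a partitioned RDD.
class JobCountSketch {

    static final class ToyRdd<T> {
        private final List<List<T>> partitions;
        int jobsLaunched = 0; // counts simulated "Spark jobs"

        ToyRdd(List<List<T>> partitions) {
            this.partitions = partitions;
        }

        // Models collect(): a single job gathers every partition.
        List<T> collect() {
            jobsLaunched++;
            List<T> all = new ArrayList<>();
            for (List<T> partition : partitions) {
                all.addAll(partition);
            }
            return all;
        }

        // Models toLocalIterator(): one job per partition, launched lazily.
        Iterator<T> toLocalIterator() {
            return new Iterator<T>() {
                private int nextPartition = 0;
                private Iterator<T> current = Collections.emptyIterator();

                @Override
                public boolean hasNext() {
                    // Fetch further partitions until one yields an element.
                    while (!current.hasNext() && nextPartition < partitions.size()) {
                        jobsLaunched++;
                        current = partitions.get(nextPartition++).iterator();
                    }
                    return current.hasNext();
                }

                @Override
                public T next() {
                    if (!hasNext()) {
                        throw new NoSuchElementException();
                    }
                    return current.next();
                }
            };
        }
    }

    // Builds a toy RDD with two integers in each partition.
    static ToyRdd<Integer> buildRdd(int numPartitions) {
        List<List<Integer>> parts = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            parts.add(Arrays.asList(2 * p, 2 * p + 1));
        }
        return new ToyRdd<>(parts);
    }

    static int jobsForCollect(int numPartitions) {
        ToyRdd<Integer> rdd = buildRdd(numPartitions);
        rdd.collect();
        return rdd.jobsLaunched;
    }

    static int jobsForLocalIterator(int numPartitions) {
        ToyRdd<Integer> rdd = buildRdd(numPartitions);
        Iterator<Integer> it = rdd.toLocalIterator();
        while (it.hasNext()) {
            it.next(); // drain the iterator, as a consumer of the memory RDD would
        }
        return rdd.jobsLaunched;
    }

    public static void main(String[] args) {
        System.out.println("collect jobs: " + jobsForCollect(4));
        System.out.println("toLocalIterator jobs: " + jobsForLocalIterator(4));
    }
}
```

In this toy model, four partitions cost one job with collect() and four jobs
with toLocalIterator(). The trade-off is that collect() holds the whole
result on the driver at once, which is why the proposal is limited to RDDs
on the scale of megabytes.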



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)