[
https://issues.apache.org/jira/browse/TINKERPOP-1163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Marko A. Rodriguez reopened TINKERPOP-1163:
-------------------------------------------
> GraphComputer's can have TraversalStrategies.
> ---------------------------------------------
>
> Key: TINKERPOP-1163
> URL: https://issues.apache.org/jira/browse/TINKERPOP-1163
> Project: TinkerPop
> Issue Type: Improvement
> Components: hadoop, process
> Affects Versions: 3.1.0-incubating
> Reporter: Marko A. Rodriguez
> Assignee: Marko A. Rodriguez
> Fix For: 3.2.0-incubating
>
>
> @dkuppitz makes the joke that he can count the number of vertices in the
> Friendster adjacency list with "awk to the sed to the bash to the.." in < 1
> minute. SparkGraphComputer with four blades takes ~5 minutes.
> What's the dealio?
> Imagine a world where {{SparkGraphComputerStrategy}} exists. It analyzes
> traversals and executes them fast by breaking away from the VertexProgram API
> and going straight to the native API of Spark. Check it:
> {code}
> g.V().count() -> inputRDD.count()
> {code}
> ...add an {{EmptyVertex.instance()}} manipulation to the respective
> InputFormats and you are then just skipping through bytes, not manifesting
> objects at all. BAM. That would take 30 seconds on Friendster.
> {code}
> g.V().outE('knows').count() -->
> inputRDD.flatMapToPair{edgeComponents}.filter{knows}.count()
> {code}
> Blazing fast.
> ...for all those standard patterns, we just do a "native" execution for the
> respective GraphComputer engine. We sidestep object creation, iteration
> phases, views, MapReduce jobs... However, we have to be smart and update the
> {{Memory}} so it looks as if the real VertexProgram executed:
> {{iteration}}, {{runtime}}, {{~reducing}}, etc.
> Genius.
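
To make the idea above concrete, here is a rough sketch of the two "native" count patterns plus the bookkeeping that fakes the {{Memory}}. It is plain Java with no Spark or TinkerPop dependency, so everything in it is an assumption for illustration: the class name NativeCountSketch, the one-line-per-vertex record format ("id<TAB>label:targetId,..."), the List standing in for inputRDD, a plain Map standing in for {{Memory}}, and the iteration value of 0. Only the key names {{iteration}}, {{runtime}} and {{~reducing}} come from the description.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

public class NativeCountSketch {

    // Stand-in for g.V().count() -> inputRDD.count():
    // one record per vertex, so we count records without parsing them.
    public static long countVertices(List<String> records) {
        return records.stream().count();
    }

    // Stand-in for g.V().outE(label).count():
    // flatMap each record to its edge labels, filter on the label, count.
    public static long countOutEdges(List<String> records, String label) {
        return records.stream()
                .flatMap(NativeCountSketch::edgeLabels)
                .filter(label::equals)
                .count();
    }

    // Pulls edge labels out of the hypothetical "id<TAB>label:target,..." record.
    static Stream<String> edgeLabels(String record) {
        int tab = record.indexOf('\t');
        if (tab < 0 || tab == record.length() - 1) {
            return Stream.empty(); // vertex with no outgoing edges
        }
        return Arrays.stream(record.substring(tab + 1).split(","))
                .map(edge -> {
                    int colon = edge.indexOf(':');
                    return colon < 0 ? edge : edge.substring(0, colon);
                });
    }

    // Runs the native count and fills a map with the keys named in the issue,
    // so a caller sees something shaped like the result of a real run.
    public static Map<String, Object> nativeCount(List<String> records, String label) {
        long start = System.nanoTime();
        long count = (label == null)
                ? countVertices(records)
                : countOutEdges(records, label);
        Map<String, Object> memory = new LinkedHashMap<>();
        memory.put("iteration", 0); // assumed placeholder, not what a real run reports
        memory.put("runtime", (System.nanoTime() - start) / 1_000_000L); // milliseconds
        memory.put("~reducing", count);
        return memory;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList(
                "1\tknows:2,created:3",
                "2\tknows:3",
                "3\t");
        System.out.println(countVertices(records));                        // 3
        System.out.println(countOutEdges(records, "knows"));               // 2
        System.out.println(nativeCount(records, "knows").get("~reducing")); // 2
    }
}
```

The real strategy would have to recognize the traversal pattern first and fall back to the normal VertexProgram path for anything it does not match; that detection is left out here.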