[
https://issues.apache.org/jira/browse/TINKERPOP-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15145454#comment-15145454
]
ASF GitHub Bot commented on TINKERPOP-1140:
-------------------------------------------
GitHub user okram opened a pull request:
https://github.com/apache/incubator-tinkerpop/pull/227
TINKERPOP-1140: TraversalVertexProgramStep in support of OLAP/OLTP
conversions.
https://issues.apache.org/jira/browse/TINKERPOP-1140
`Traversals` can now be composed of multiple OLAP jobs. This ticket in
particular introduced `TraversalVertexProgramStep` which is responsible for
executing a OLAP job and then sending its results to the next OLAP step
(implementing a `VertexComputing`) or, it just drains the results serially to
the next OLTP step. To demonstrate this working, I added
`PageRankVertexProgramStep` (experimental for now -- a future commit will make
all of this more explicit and include `PeerPressureVertexProgramStep`. This
ticket is particular to `TraversalVertexProgramStep`). Here is the glory.
```
gremlin> g = TinkerFactory.createModern().traversal().withComputer()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6],
tinkergraphcomputer]
gremlin>
g.V().pageRank().order().by(PageRankVertexProgram.PAGE_RANK).valueMap()
==>[gremlin.pageRankVertexProgram.pageRank:[0.15000000000000002],
name:[peter], gremlin.pageRankVertexProgram.edgeCount:[1.0], age:[35]]
==>[gremlin.pageRankVertexProgram.pageRank:[0.15000000000000002],
name:[marko], gremlin.pageRankVertexProgram.edgeCount:[3.0], age:[29]]
==>[gremlin.pageRankVertexProgram.pageRank:[0.19250000000000003],
name:[josh], gremlin.pageRankVertexProgram.edgeCount:[2.0], age:[32]]
==>[gremlin.pageRankVertexProgram.pageRank:[0.19250000000000003],
name:[vadas], gremlin.pageRankVertexProgram.edgeCount:[0.0], age:[27]]
==>[gremlin.pageRankVertexProgram.pageRank:[0.23181250000000003],
name:[ripple], gremlin.pageRankVertexProgram.edgeCount:[0.0], lang:[java]]
==>[gremlin.pageRankVertexProgram.pageRank:[0.4018125], name:[lop],
gremlin.pageRankVertexProgram.edgeCount:[0.0], lang:[java]]
gremlin>
g.V().pageRank().order().by(PageRankVertexProgram.PAGE_RANK).valueMap().explain()
==>Traversal Explanation
========================================================================================================================================================================================================================================
Original Traversal [GraphStep([],vertex),
PageRankVertexProgramStep([VertexStep(OUT,edge)]),
OrderGlobalStep([incr(value(gremlin.pageRankVertexProgram.pageRank))]),
PropertyMapStep(value)]
ConnectiveStrategy [D] [GraphStep([],vertex),
PageRankVertexProgramStep([VertexStep(OUT,edge)]),
OrderGlobalStep([incr(value(gremlin.pageRankVertexProgram.pageRank))]),
PropertyMapStep(value)]
VertexProgramStrategy [D]
[PageRankVertexProgramStep([VertexStep(OUT,edge)]),
TraversalVertexProgramStep([OrderGlobalStep([incr(value(gremlin.pageRankVertexProgram.pageRank))])]),
ComputerResultStep, PropertyMapStep(value)]
IdentityRemovalStrategy [O]
[PageRankVertexProgramStep([VertexStep(OUT,edge)]),
TraversalVertexProgramStep([OrderGlobalStep([incr(value(gremlin.pageRankVertexProgram.pageRank))])]),
ComputerResultStep, PropertyMapStep(value)]
FilterRankingStrategy [O]
[PageRankVertexProgramStep([VertexStep(OUT,edge)]),
TraversalVertexProgramStep([OrderGlobalStep([incr(value(gremlin.pageRankVertexProgram.pageRank))])]),
ComputerResultStep, PropertyMapStep(value)]
IncidentToAdjacentStrategy [O]
[PageRankVertexProgramStep([VertexStep(OUT,edge)]),
TraversalVertexProgramStep([OrderGlobalStep([incr(value(gremlin.pageRankVertexProgram.pageRank))])]),
ComputerResultStep, PropertyMapStep(value)]
AdjacentToIncidentStrategy [O]
[PageRankVertexProgramStep([VertexStep(OUT,edge)]),
TraversalVertexProgramStep([OrderGlobalStep([incr(value(gremlin.pageRankVertexProgram.pageRank))])]),
ComputerResultStep, PropertyMapStep(value)]
MatchPredicateStrategy [O]
[PageRankVertexProgramStep([VertexStep(OUT,edge)]),
TraversalVertexProgramStep([OrderGlobalStep([incr(value(gremlin.pageRankVertexProgram.pageRank))])]),
ComputerResultStep, PropertyMapStep(value)]
RangeByIsCountStrategy [O]
[PageRankVertexProgramStep([VertexStep(OUT,edge)]),
TraversalVertexProgramStep([OrderGlobalStep([incr(value(gremlin.pageRankVertexProgram.pageRank))])]),
ComputerResultStep, PropertyMapStep(value)]
TinkerGraphStepStrategy [P]
[PageRankVertexProgramStep([VertexStep(OUT,edge)]),
TraversalVertexProgramStep([OrderGlobalStep([incr(value(gremlin.pageRankVertexProgram.pageRank))])]),
ComputerResultStep, PropertyMapStep(value)]
ProfileStrategy [F]
[PageRankVertexProgramStep([VertexStep(OUT,edge)]),
TraversalVertexProgramStep([OrderGlobalStep([incr(value(gremlin.pageRankVertexProgram.pageRank))])]),
ComputerResultStep, PropertyMapStep(value)]
StandardVerificationStrategy [V]
[PageRankVertexProgramStep([VertexStep(OUT,edge)]),
TraversalVertexProgramStep([OrderGlobalStep([incr(value(gremlin.pageRankVertexProgram.pageRank))])]),
ComputerResultStep, PropertyMapStep(value)]
ComputerVerificationStrategy [V]
[PageRankVertexProgramStep([VertexStep(OUT,edge)]),
TraversalVertexProgramStep([OrderGlobalStep([incr(value(gremlin.pageRankVertexProgram.pageRank))])]),
ComputerResultStep, PropertyMapStep(value)]
Final Traversal
[PageRankVertexProgramStep([VertexStep(OUT,edge)]),
TraversalVertexProgramStep([OrderGlobalStep([incr(value(gremlin.pageRankVertexProgram.pageRank))])]),
ComputerResultStep, PropertyMapStep(value)]
gremlin>
gremlin>
g.V().pageRank().order().by(PageRankVertexProgram.PAGE_RANK).valueMap().toString()
==>[GraphStep([],vertex),
PageRankVertexProgramStep([VertexStep(OUT,edge)]),
OrderGlobalStep([incr(value(gremlin.pageRankVertexProgram.pageRank))]),
PropertyMapStep(value)]
gremlin>
g.V().pageRank().order().by(PageRankVertexProgram.PAGE_RANK).valueMap().iterate().toString()
==>[PageRankVertexProgramStep([VertexStep(OUT,edge)]),
TraversalVertexProgramStep([OrderGlobalStep([incr(value(gremlin.pageRankVertexProgram.pageRank))])]),
ComputerResultStep, PropertyMapStep(value)]
```
CHANGELOG:
```
* Fixed a bug in both `SparkGraphComputer` and `GiraphGraphComputer`
regarding source data access in job chains.
* Significantly extended job chaining test coverage for `GraphComputer`
providers.
* Added `TraversalHelper.onGraphComputer(traversal)`.
* `MapReduce.map()` no longer has a default implementation. This method
must be implemented.
* `TraversalVertexProgram` can work without a `GraphStep` start.
* Fixed a severe bug in `TraverserMapReduce.combine()`.
* Added `PageRankVertexProgramStep` and `GraphTraversal.pageRank()`.
* Simplified the comparator model in `OrderGlobalStep` and `OrderLocalStep`.
```
The only note for the upgrade docs will be that `MapReduce.map()` no longer
have a default implementation. However, if you didn't implement it, your
`MapReduce` job would instantly fail so I don't think anyone has code like
that. Oh, and `TraversalHelper.onGraphComputer()` will replace the "stub
method" I had in the previous PR with `TraversalStrategies.onGraphComputer()`.
Thus, minor nothings to upgrade docs.
*** NOTE: I will add `pageRank()` to the reference doc in a future 3.2.0
commit when I do another ticket that is more specific to
`PageRankVertexProgramStep` and `PeerPressureVertexProgram`.
VOTE +1. `mvn clean install` and integration tests -- (though Giraph is
currently still running -- but I have isolated and verified the two tests I
added to the `ProcessComputerSuite`).
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/apache/incubator-tinkerpop TINKERPOP-1140
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-tinkerpop/pull/227.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #227
----
commit 64cd1b18df1170e96d61c0e68cc6495d32c7fa3e
Author: Marko A. Rodriguez <[email protected]>
Date: 2016-02-09T22:29:31Z
TraversalVertexProgramStep created. It works. Sorta --- need to get
ComputerVerificationStrategy more aware. 95 percent of tests pass -- the ones
that fail have to do with deep nested traverasls and
ComputerVerificationStrategy not catching them. Anywho, once this is solid, the
door is open for PageRankVertexProgramStep. Going to get crazy here soon. Also,
ComputerResultStep now simply traversifies ComputerResults. Really elegant.
commit f2dbb0f2094bb7044e3c1f8317fc2153c9010b2f
Author: Marko A. Rodriguez <[email protected]>
Date: 2016-02-10T18:11:35Z
Have TraversalVertexProgramStep working fully. I had to Ignore tests in
ExplainTest and ProfileTest as OLTP and OLAP now no longer compile to the same
representation. This is both good and bad. Its good because that means that a
Traversal is not globally OLAP or globally OLTP. It can have OLAP and OLTP
sections. Its bad in that features like Profile and Explain need to be smart
about mixed-engine traversals. I am still far from finished ... just want to
push to save work.
commit e5c3d784e2db06febd4da1d079712336818fd0ca
Author: Marko A. Rodriguez <[email protected]>
Date: 2016-02-10T18:59:04Z
fixed a bug around Serialization of GraphComputer. Integration tests now
pass. Cool.
commit 6cbd8a5da0b8dad4f675c586a809e661a902076b
Author: Marko A. Rodriguez <[email protected]>
Date: 2016-02-10T19:36:20Z
Merge branch 'TINKERPOP-971' into TINKERPOP-1140
commit 080863a5837d94417b9f9fc17d0476defab56e63
Author: Marko A. Rodriguez <[email protected]>
Date: 2016-02-10T22:23:02Z
okay. TraversalVertexProgramStep is just like any other step. Nothing
'special'. Its a TraversalParent with one global child -- the traversal to be
executed on the GraphComputer. I got rid of
TraversalSideEffects.onGraphComputer() as again, a traversal can now have OLTP
and OLAP components. In its place, we have
TraversalHelper.onGraphComputer(traversal) which will tell you if a parent of
the traversal is a TraversalVertexProgramStep. this model is really really nice
as it all falls within the natural construts of our Traversal API -- no new
interfaces needed, no weird two sets of strategies, no weird two sets of
sideEffects... Its really clean.
commit 3afc1894a4805e82ad076535697add3baac91c6c
Author: Marko A. Rodriguez <[email protected]>
Date: 2016-02-11T01:41:07Z
Stubbed PageRankVertexProgramStep. Cleaned up and optimized a few tid bits
here and there. Its going to get nasty tomorrow.
commit a4cf1514d95dfd873675f08e6fd5d08aea810992
Author: Marko A. Rodriguez <[email protected]>
Date: 2016-02-11T19:39:30Z
Okay. Here is is. g.V().pageRank().order.by(pageRank).name.limit(10). We
know support chained OLAP jobs within a single Traversal. This is an amazing
piece of code. There is still lots more to do, but the basic framework is
working and is sound. Note that we were able to loosen up lots of
ComputerVerificaitonStrategy restrictions. Moreover, over the next couple of
pushes, we will loosen up even more. The OrderStep comparator stuff was a rats
nest that I have since cleaned up ... sometimes it wants a traverser, sometimes
an object...random. I can't believe we haven't seen more bugs cause of what we
had.
commit e2e38e691ee20ba697c2e737e678376a10f87da2
Author: Marko A. Rodriguez <[email protected]>
Date: 2016-02-11T19:46:28Z
Merge branch 'master' into TINKERPOP-1140
commit 17bfdb82ce89e24295940ad1926ae95e17eea821
Author: Marko A. Rodriguez <[email protected]>
Date: 2016-02-11T21:32:30Z
Was really dumb about SparkMessenger and GiraphMessenger. No need to insert
a start step. Weird. ComputerVerificationStrategy can now recognize when edge
properties are being manipulated in MapReduce. Added
TraveralHelper.getLastElementClass(traversal). Neato. Fixed up FileSystem
access in SparkGraphComputer. I think we have a bug in 3.1.2 that makes
chaining files bad. Weird we don't have a test for that. However, most people
don't chain vertex programs so I suspect no one will notice. 3.2.0 have it down
pat as you need solid chaining the new multi-OLAP compiler to work.
commit 66d86cdfbb00bf1eb74f067815181f90b40d6feb
Author: Marko A. Rodriguez <[email protected]>
Date: 2016-02-11T21:53:00Z
there is a bug in GiraphGraphComputer when chaining jobs together. damn. I
have things a bit cleaner. committing what I have for now.
commit cd1a37946848d28ad33d464fedac1655cadd5320
Author: Marko A. Rodriguez <[email protected]>
Date: 2016-02-12T18:57:05Z
Found a couple of bugs around job chaining in all GraphComputer
implementations. Wrote a solid GraphComputerTest to ensure job chaining behaves
as expected. Still a few more things to handle -- the Traversal.pageRank()
tests in Giraph don't pass and I'm not sure why.
commit 0a2243037792854c6ef9f834f41785e7a728ee7e
Author: Marko A. Rodriguez <[email protected]>
Date: 2016-02-12T21:02:46Z
Found a bug in TraverserMapReduce where the combiner should not just call
reduce(). Only Giraph was able to expose it because it actually calls
combine(). The test dataset is so small that Spark doesn't even kick it off.
TinkerGraph, because its in memory, doesn't use combine. Wow. That was a two
day bug. However, I have written so many more test cases. Things are really
pretty. And that ends my week with g.V.pageRank().order().by().limit(10)
working. Phew.
commit 491bef66112d75d42cde87ae000e63587e9b3bfe
Author: Marko A. Rodriguez <[email protected]>
Date: 2016-02-12T22:15:18Z
added another PageRankTest test case and made MapReduce.map() a non-default
implementation so users know they have to implement that method.
commit d627f9c06cd24b9c12d0938e3ebff20a39ae33b9
Author: Marko A. Rodriguez <[email protected]>
Date: 2016-02-12T22:31:12Z
minor nothing.
----
> TraversalVertexProgramStep in support of OLAP/OLTP conversions.
> ---------------------------------------------------------------
>
> Key: TINKERPOP-1140
> URL: https://issues.apache.org/jira/browse/TINKERPOP-1140
> Project: TinkerPop
> Issue Type: Improvement
> Components: process
> Affects Versions: 3.1.0-incubating, 3.1.1-incubating, 3.1.2-incubating
> Reporter: Marko A. Rodriguez
> Fix For: 3.2.0-incubating
>
>
> This was moved from TINKERPOP-971. TINKERPOP-971 was about TraversalSource
> fluency. This is a feature we can now do because of that work. Isolated this
> feature in a new ticket.
> --------------
> Check this idea out:
> {code}
> g = graph.traversal().withComputer(graph ->
> graph.compute(SparkGraphComputer.class).workers(10));
> g.V().out().values('name')
> {code}
> This would compile to:
> {code}
> [TraversalVertexProgramStep([GraphStep,VerticesStep,PropertiesStep]),ComputerResultStep]
> {code}
> where,
> {code}
> TraversalVertexProgramStep<Graph,ComputerResult>
> ComputerResultStep<ComputerResult,E>
> {code}
> Next, check this:
> {code}
> g = graph.traversal().withComputer(graph ->
> graph.compute(SparkGraphComputer.class).workers(10));
> g.V().hasLabel('person').pageRank(out('knows')).order().by('page.rank',decr)
> {code}
> This will compile to:
> {code}
> [TraversalVertexProgramStep([GraphStep,HasStep]),PageRankVertexProgramStep([VerticesStep]),TraversalVertexProgramStep([OrderStep]),ComputerResultStep]
> {code}
> How does this work?!
> * TraversalVertexProgramStep will pass a {{ComputerResult}} to
> PageRankVertexProgram.
> ** The ComputerResult.graph() will have HALTED_TRAVERSERS properties on all
> the person vertices (as that is what was computed by the vertex program).
> * PageRankVertexProgram will then pass a {{ComputerResult}} to the next
> TraversalVertexProgram.
> ** That ComputerResult.graph() will have HALTED_TRAVERSER on all the person
> vertices and PAGE_RANK on all the vertices.
> * TraversalVertexProgram will then execute OrderStep which sorts the
> person-vertices with HALTED_TRAVERSERS on them by their page.rank properties
> computed previously.
> * ComputerResultStep will then take get the ComputerResult.memory.reducing
> and iterate it out.
> {{ComputerResultStep}} (like now) is smart about either pulling from the
> graph or from the sideEffect memory. What makes things different from now is
> that {{TraversalVertexProgramStep}} will come into existence and pull out the
> respective logic from {{ComputerResultStep}}. In this way, It will be
> possible to chain {{{XXXVertexProgramStep}}-steps and thus, have a traversal
> that is multiple OLAP jobs in sequence and thus, this ticket will fall
> naturally from this https://issues.apache.org/jira/browse/TINKERPOP-570. The
> whole trick to all of this is that (as we currently do) we save the state of
> the computation in the graph (and memory) and thus, feeding program into the
> next just takes over the computation. PageRankVertexProgram doesn't use
> HALTED_TRAVERSERS in its computation, so it just ignores it. However,
> TraversalVertexProgram can later access PAGE_RANK and thus, use the results
> of the previous OLAP computation.
> Finally, you want "lambda vertex programs?"
> {code}
> g = graph.traversal().withComputer(graph ->
> graph.compute(SparkGraphComputer.class).workers(10));
> myProgram = MyVertexProgram.build().create();
> g.V().program(myProgram).values('myProgramCounters')
> {code}
> which compiles to:
> {code}
> [TraversalVertexProgramStep([GraphStep]),LambdaVertexProgramStep(MyVertexProgram),TraversalVertexProgramStep([PropertiesStep]),ComputerResultStep]
> {code}
> Thus, just like {{filter()}},{{flatMap()}}, etc....
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)