Hello,
This morning, I just merged into TinkerPop 3.2.0-SNAPSHOT a large body of work
that allows us to now compile a single traversal into an arbitrary number of
OLAP and OLTP jobs. Along with this, I have taken the two algorithmic
VertexPrograms we have (PeerPressureVertexProgram and PageRankVertexProgram)
and made them available through GraphTraversal. That is, for most users, no
longer will you have to do
graph.compute().program(PageRankVertexProgram.build()…).submit().get().
Instead, you can simply do:
g.V().pageRank()
It gets better… Let me take you though a Gremlin Console session and show you
what you can do now:
gremlin> graph = TinkerFactory.createModern()
==>tinkergraph[vertices:6 edges:6]
gremlin> g = graph.traversal().withComputer()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], tinkergraphcomputer]
First, "withComputer()" is new that allows us to know configure the
GraphComputer used for the TraversalSource to a finer degree. Thats another
discussion, but just know thats better now too!
gremlin> g.V().pageRank()
==>v[6]
==>v[1]
==>v[3]
==>v[2]
==>v[4]
==>v[5]
pageRank() (and all other VertexProgramSteps) are sideEffect steps. The results
of the computation are on the vertices themselves.
gremlin> g.V().pageRank().valueMap()
==>[gremlin.pageRankVertexProgram.pageRank:[0.19250000000000003], name:[vadas],
gremlin.pageRankVertexProgram.edgeCount:[0.0], age:[27]]
==>[gremlin.pageRankVertexProgram.pageRank:[0.15000000000000002], name:[marko],
gremlin.pageRankVertexProgram.edgeCount:[3.0], age:[29]]
==>[gremlin.pageRankVertexProgram.pageRank:[0.4018125], name:[lop],
gremlin.pageRankVertexProgram.edgeCount:[0.0], lang:[java]]
==>[gremlin.pageRankVertexProgram.pageRank:[0.19250000000000003], name:[josh],
gremlin.pageRankVertexProgram.edgeCount:[2.0], age:[32]]
==>[gremlin.pageRankVertexProgram.pageRank:[0.15000000000000002], name:[peter],
gremlin.pageRankVertexProgram.edgeCount:[1.0], age:[35]]
==>[gremlin.pageRankVertexProgram.pageRank:[0.23181250000000003],
name:[ripple], gremlin.pageRankVertexProgram.edgeCount:[0.0], lang:[java]]
Another ticket for 3.2.0 will be to have the concept of "transient compute
keys" so the edgeCount scratch data will be dropped automagically. But still,
perhaps you want to use your own custom property.
gremlin> g.V().pageRank().by('rank').valueMap()
==>[name:[vadas], rank:[0.19250000000000003],
gremlin.pageRankVertexProgram.edgeCount:[0.0], age:[27]]
==>[name:[marko], rank:[0.15000000000000002],
gremlin.pageRankVertexProgram.edgeCount:[3.0], age:[29]]
==>[name:[lop], rank:[0.4018125],
gremlin.pageRankVertexProgram.edgeCount:[0.0], lang:[java]]
==>[name:[josh], rank:[0.19250000000000003],
gremlin.pageRankVertexProgram.edgeCount:[2.0], age:[32]]
==>[name:[peter], rank:[0.15000000000000002],
gremlin.pageRankVertexProgram.edgeCount:[1.0], age:[35]]
==>[name:[ripple], rank:[0.23181250000000003],
gremlin.pageRankVertexProgram.edgeCount:[0.0], lang:[java]]
How about your own custom edge traversal for propagating the energy.
gremlin> g.V().pageRank().
by(outE('knows')).
by('friendRank').
valueMap('name','friendRank')
==>[friendRank:[0.15000000000000002], name:[marko]]
==>[friendRank:[0.21375000000000002], name:[vadas]]
==>[friendRank:[0.15000000000000002], name:[lop]]
==>[friendRank:[0.21375000000000002], name:[josh]]
==>[friendRank:[0.15000000000000002], name:[peter]]
==>[friendRank:[0.15000000000000002], name:[ripple]]
What about a more complex job?
gremlin> g.V().pageRank().
by(outE('knows')).
by('friendRank').
order().by('friendRank',decr).
valueMap('name','friendRank')
==>[friendRank:[0.21375000000000002], name:[vadas]]
==>[friendRank:[0.21375000000000002], name:[josh]]
==>[friendRank:[0.15000000000000002], name:[ripple]]
==>[friendRank:[0.15000000000000002], name:[marko]]
==>[friendRank:[0.15000000000000002], name:[lop]]
==>[friendRank:[0.15000000000000002], name:[peter]]
What about clustering?
gremlin> g.V().peerPressure().by('cluster').
valueMap('name','cluster')
==>[cluster:[1], name:[vadas]]
==>[cluster:[1], name:[marko]]
==>[cluster:[1], name:[lop]]
==>[cluster:[1], name:[ripple]]
==>[cluster:[6], name:[peter]]
==>[cluster:[1], name:[josh]]
What about clustering and ranking?
gremlin> g.V().peerPressure().by('cluster').
pageRank().by('rank').
group().by('cluster').by(values('rank').sum())
==>[1:1.1686250000000002, 6:0.15000000000000002]
What about OLAP/OLTP?
gremlin> g.V().peerPressure().by('cluster').
pageRank().by('rank').
group().by('cluster').by().
order(local).by(values,{a,b -> a.size() <=> b.size()}).
limit(local,1).select(values).unfold().unfold().
valueMap('name','rank','cluster')
==>[cluster:[6], name:[peter], rank:[0.15000000000000002]]
That compilation goes OLAP->OLAP->OLAP->OLTP.
gremlin>
g.V().peerPressure().by('cluster').pageRank().by('rank').group().by('cluster').by().order(local).by(values,{a,b
-> a.size() <=>
b.size()}).limit(local,1).select(values).unfold().unfold().valueMap('name','rank','cluster').explain()
==>Traversal Explanation
====================================================================================================================================================================================================================================================================================================================================================================================================================================================
Original Traversal [GraphStep([],vertex),
PeerPressureVertexProgramStep([VertexStep(OUT,edge)],cluster,30),
PageRankVertexProgramStep([VertexStep(OUT,edge)],rank,30),
GroupStep(value(cluster),[TraversalMapStep(identity), FoldStep]),
OrderLocalStep(lambda), RangeLocalStep(0,1), LambdaMapStep(values), UnfoldStep,
UnfoldStep, PropertyMapStep([name, rank, cluster],value)]
ConnectiveStrategy [D] [GraphStep([],vertex),
PeerPressureVertexProgramStep([VertexStep(OUT,edge)],cluster,30),
PageRankVertexProgramStep([VertexStep(OUT,edge)],rank,30),
GroupStep(value(cluster),[TraversalMapStep(identity), FoldStep]),
OrderLocalStep(lambda), RangeLocalStep(0,1), LambdaMapStep(values), UnfoldStep,
UnfoldStep, PropertyMapStep([name, rank, cluster],value)]
VertexProgramStrategy [D]
[PeerPressureVertexProgramStep([VertexStep(OUT,edge)],cluster,30),
PageRankVertexProgramStep([VertexStep(OUT,edge)],rank,30),
TraversalVertexProgramStep([GraphStep([],vertex),
GroupStep(value(cluster),[TraversalMapStep(identity), FoldStep])]),
ComputerResultStep, OrderLocalStep(lambda), RangeLocalStep(0,1),
LambdaMapStep(values), UnfoldStep, UnfoldStep, PropertyMapStep([name, rank,
cluster],value)]
IdentityRemovalStrategy [O]
[PeerPressureVertexProgramStep([VertexStep(OUT,edge)],cluster,30),
PageRankVertexProgramStep([VertexStep(OUT,edge)],rank,30),
TraversalVertexProgramStep([GraphStep([],vertex),
GroupStep(value(cluster),[TraversalMapStep(identity), FoldStep])]),
ComputerResultStep, OrderLocalStep(lambda), RangeLocalStep(0,1),
LambdaMapStep(values), UnfoldStep, UnfoldStep, PropertyMapStep([name, rank,
cluster],value)]
IncidentToAdjacentStrategy [O]
[PeerPressureVertexProgramStep([VertexStep(OUT,edge)],cluster,30),
PageRankVertexProgramStep([VertexStep(OUT,edge)],rank,30),
TraversalVertexProgramStep([GraphStep([],vertex),
GroupStep(value(cluster),[TraversalMapStep(identity), FoldStep])]),
ComputerResultStep, OrderLocalStep(lambda), RangeLocalStep(0,1),
LambdaMapStep(values), UnfoldStep, UnfoldStep, PropertyMapStep([name, rank,
cluster],value)]
AdjacentToIncidentStrategy [O]
[PeerPressureVertexProgramStep([VertexStep(OUT,edge)],cluster,30),
PageRankVertexProgramStep([VertexStep(OUT,edge)],rank,30),
TraversalVertexProgramStep([GraphStep([],vertex),
GroupStep(value(cluster),[TraversalMapStep(identity), FoldStep])]),
ComputerResultStep, OrderLocalStep(lambda), RangeLocalStep(0,1),
LambdaMapStep(values), UnfoldStep, UnfoldStep, PropertyMapStep([name, rank,
cluster],value)]
FilterRankingStrategy [O]
[PeerPressureVertexProgramStep([VertexStep(OUT,edge)],cluster,30),
PageRankVertexProgramStep([VertexStep(OUT,edge)],rank,30),
TraversalVertexProgramStep([GraphStep([],vertex),
GroupStep(value(cluster),[TraversalMapStep(identity), FoldStep])]),
ComputerResultStep, OrderLocalStep(lambda), RangeLocalStep(0,1),
LambdaMapStep(values), UnfoldStep, UnfoldStep, PropertyMapStep([name, rank,
cluster],value)]
MatchPredicateStrategy [O]
[PeerPressureVertexProgramStep([VertexStep(OUT,edge)],cluster,30),
PageRankVertexProgramStep([VertexStep(OUT,edge)],rank,30),
TraversalVertexProgramStep([GraphStep([],vertex),
GroupStep(value(cluster),[TraversalMapStep(identity), FoldStep])]),
ComputerResultStep, OrderLocalStep(lambda), RangeLocalStep(0,1),
LambdaMapStep(values), UnfoldStep, UnfoldStep, PropertyMapStep([name, rank,
cluster],value)]
RangeByIsCountStrategy [O]
[PeerPressureVertexProgramStep([VertexStep(OUT,edge)],cluster,30),
PageRankVertexProgramStep([VertexStep(OUT,edge)],rank,30),
TraversalVertexProgramStep([GraphStep([],vertex),
GroupStep(value(cluster),[TraversalMapStep(identity), FoldStep])]),
ComputerResultStep, OrderLocalStep(lambda), RangeLocalStep(0,1),
LambdaMapStep(values), UnfoldStep, UnfoldStep, PropertyMapStep([name, rank,
cluster],value)]
TinkerGraphStepStrategy [P]
[PeerPressureVertexProgramStep([VertexStep(OUT,edge)],cluster,30),
PageRankVertexProgramStep([VertexStep(OUT,edge)],rank,30),
TraversalVertexProgramStep([GraphStep([],vertex),
GroupStep(value(cluster),[TraversalMapStep(identity), FoldStep])]),
ComputerResultStep, OrderLocalStep(lambda), RangeLocalStep(0,1),
LambdaMapStep(values), UnfoldStep, UnfoldStep, PropertyMapStep([name, rank,
cluster],value)]
ProfileStrategy [F]
[PeerPressureVertexProgramStep([VertexStep(OUT,edge)],cluster,30),
PageRankVertexProgramStep([VertexStep(OUT,edge)],rank,30),
TraversalVertexProgramStep([GraphStep([],vertex),
GroupStep(value(cluster),[TraversalMapStep(identity), FoldStep])]),
ComputerResultStep, OrderLocalStep(lambda), RangeLocalStep(0,1),
LambdaMapStep(values), UnfoldStep, UnfoldStep, PropertyMapStep([name, rank,
cluster],value)]
StandardVerificationStrategy [V]
[PeerPressureVertexProgramStep([VertexStep(OUT,edge)],cluster,30),
PageRankVertexProgramStep([VertexStep(OUT,edge)],rank,30),
TraversalVertexProgramStep([GraphStep([],vertex),
GroupStep(value(cluster),[TraversalMapStep(identity), FoldStep])]),
ComputerResultStep, OrderLocalStep(lambda), RangeLocalStep(0,1),
LambdaMapStep(values), UnfoldStep, UnfoldStep, PropertyMapStep([name, rank,
cluster],value)]
ComputerVerificationStrategy [V]
[PeerPressureVertexProgramStep([VertexStep(OUT,edge)],cluster,30),
PageRankVertexProgramStep([VertexStep(OUT,edge)],rank,30),
TraversalVertexProgramStep([GraphStep([],vertex),
GroupStep(value(cluster),[TraversalMapStep(identity), FoldStep])]),
ComputerResultStep, OrderLocalStep(lambda), RangeLocalStep(0,1),
LambdaMapStep(values), UnfoldStep, UnfoldStep, PropertyMapStep([name, rank,
cluster],value)]
Final Traversal
[PeerPressureVertexProgramStep([VertexStep(OUT,edge)],cluster,30),
PageRankVertexProgramStep([VertexStep(OUT,edge)],rank,30),
TraversalVertexProgramStep([GraphStep([],vertex),
GroupStep(value(cluster),[TraversalMapStep(identity), FoldStep])]),
ComputerResultStep, OrderLocalStep(lambda), RangeLocalStep(0,1),
LambdaMapStep(values), UnfoldStep, UnfoldStep, PropertyMapStep([name, rank,
cluster],value)]
There are a few more goodies to show like biasing the initial pageRank energy
for doing spreading activation-based algorithms, but we'll save that for later….
Finally, because this all compiles to the Gremlin traversal machine, this works
for any TinkerPop-enabled graph system.
Enjoy,
Marko.
http://markorodriguez.com