[GitHub] incubator-tinkerpop pull request: TINKERPOP-1082 & TINKERPOP-1222:...

okram Sat, 19 Mar 2016 05:03:21 -0700

GitHub user okram opened a pull request:

    https://github.com/apache/incubator-tinkerpop/pull/268


    TINKERPOP-1082 & TINKERPOP-1222: Hadoop Configuration Updates

    https://issues.apache.org/jira/browse/TINKERPOP-1082
    https://issues.apache.org/jira/browse/TINKERPOP-1222
    
    We had a very confusing situation with `gremlin.hadoop.graphInputFormat`
    and `gremlin.spark.graphInputRDD`. Not only did it cause a mess of `[WARN]`
    messages, it was also awkward because users had to know that one overrode
    the other. To make this cleaner, I created two new configurations,
    `gremlin.hadoop.graphReader` and `gremlin.hadoop.graphWriter`, each of which
    can take either an `XXXFormat` or an `XXXRDD`. Internally, Spark, Giraph,
    etc. know how to reason about which is which.
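
    As a rough illustration of that "which is which" reasoning, here is a
    minimal Java sketch. It assumes a reader class can be classified purely by
    its type hierarchy. `InputFormat` and `InputRDD` below are hypothetical
    empty stand-ins for Hadoop's `InputFormat` and TinkerPop's `InputRDD`, and
    `RDDBackedInputFormat`, `ReaderKinds`, `ExampleFormat`, and `ExampleRDD` are
    hypothetical names, not actual TinkerPop classes.

    ```java
    // Hypothetical stand-ins: only the type relationships matter for this sketch.
    abstract class InputFormat {}
    interface InputRDD {}

    // Hypothetical adapter that would expose an InputRDD through the InputFormat API,
    // for engines (such as Giraph) that only understand InputFormats.
    class RDDBackedInputFormat extends InputFormat {}

    // Hypothetical example readers a user might name in gremlin.hadoop.graphReader.
    class ExampleFormat extends InputFormat {}
    class ExampleRDD implements InputRDD {}

    final class ReaderKinds {
        private ReaderKinds() {}

        /** Returns "rdd" or "format" depending on what the configured reader class is. */
        static String kindOf(Class<?> reader) {
            if (InputRDD.class.isAssignableFrom(reader)) {
                return "rdd";
            }
            if (InputFormat.class.isAssignableFrom(reader)) {
                return "format";
            }
            throw new IllegalArgumentException(
                    reader.getName() + " is neither an InputRDD nor an InputFormat");
        }

        /** Resolves any configured reader to an InputFormat class, wrapping RDD readers. */
        static Class<? extends InputFormat> asInputFormat(Class<?> reader) {
            if (kindOf(reader).equals("rdd")) {
                return RDDBackedInputFormat.class;
            }
            return reader.asSubclass(InputFormat.class);
        }
    }
    ```

    In a real engine the check would run against the class named in
    `gremlin.hadoop.graphReader` after loading it, for example with
    `Class.forName`.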
    
    Additionally, I added `gremlin.hadoop.defaultGraphComputer`, with which users
    can specify a default `GraphComputer` in their properties file. If it is set,
    `graph.compute()` will no longer throw an exception saying to use
    `graph.compute(class)`.
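
    A minimal sketch of the resulting lookup order, assuming the configuration
    is available as a plain `Map<String, String>` and that an explicitly passed
    class always wins over the configured default. `ComputerChoice` and
    `chooseComputer` are hypothetical names; only the
    `gremlin.hadoop.defaultGraphComputer` key comes from this PR.

    ```java
    import java.util.Map;

    final class ComputerChoice {
        // Configuration key introduced by this PR.
        static final String DEFAULT_GRAPH_COMPUTER = "gremlin.hadoop.defaultGraphComputer";

        private ComputerChoice() {}

        /**
         * Hypothetical resolution order: an explicit graph.compute(class) argument wins,
         * otherwise graph.compute() falls back to the configured default, and only if
         * neither is present is an exception thrown.
         */
        static String chooseComputer(Map<String, String> conf, String explicitClassName) {
            if (explicitClassName != null) {
                return explicitClassName;
            }
            String configured = conf.get(DEFAULT_GRAPH_COMPUTER);
            if (configured != null && !configured.isEmpty()) {
                return configured;
            }
            throw new IllegalStateException(
                    "No " + DEFAULT_GRAPH_COMPUTER + " is set; use graph.compute(class) instead");
        }
    }
    ```

    A real implementation would presumably turn the chosen name into a class
    before instantiating it; that step is omitted here.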
    
    Both of these changes are backwards compatible. Backwards compatibility is
    tested via `SparkHadoopGraphProvider`, which, via a coin flip, sometimes uses
    the old model and sometimes uses the new model.
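
    The coin-flip idea can be sketched as follows. This is a hypothetical
    helper, not the actual `SparkHadoopGraphProvider` code; it assumes the
    provider builds its configuration as a `Map<String, String>` and, for
    brevity, only flips between the new keys and the deprecated `XXXFormat`
    keys.

    ```java
    import java.util.HashMap;
    import java.util.Map;
    import java.util.Random;

    final class CoinFlipConfig {
        private CoinFlipConfig() {}

        /** Builds reader/writer settings using either the new or the deprecated keys. */
        static Map<String, String> readerWriter(Random random, String reader, String writer) {
            Map<String, String> conf = new HashMap<>();
            if (random.nextBoolean()) {
                // New model: one pair of keys regardless of Format vs. RDD.
                conf.put("gremlin.hadoop.graphReader", reader);
                conf.put("gremlin.hadoop.graphWriter", writer);
            } else {
                // Old (deprecated) model: must keep working for backwards compatibility.
                conf.put("gremlin.hadoop.graphInputFormat", reader);
                conf.put("gremlin.hadoop.graphOutputFormat", writer);
            }
            return conf;
        }
    }
    ```

    Because the branch is picked at random, repeated test runs exercise both
    configuration styles without doubling the size of the suite.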
    
    Finally, I forgot to add docs on `GraphFilter`, so they have been added to
    this PR.
    
    CHANGELOG
    
    ```
    * Added `gremlin.hadoop.defaultGraphComputer` so users can use `graph.compute()` with `HadoopGraph`.
    * Added `gremlin.hadoop.graphReader` and `gremlin.hadoop.graphWriter` which can handle `XXXFormats` and `XXXRDDs`.
    * Deprecated `gremlin.hadoop.graphInputFormat`, `gremlin.hadoop.graphOutputFormat`, `gremlin.spark.graphInputRDD`, and `gremlin.spark.graphOutputRDD`.
    ```
    
    UPDATE
    
    ```
    Hadoop Configurations
    +++++++++++++++++++++
    
    Note that `gremlin.hadoop.graphInputFormat`, `gremlin.hadoop.graphOutputFormat`,
    `gremlin.spark.graphInputRDD`, and `gremlin.spark.graphOutputRDD` have all been
    deprecated. Using them still works, but moving forward, users only need to
    leverage `gremlin.hadoop.graphReader` and `gremlin.hadoop.graphWriter`. An
    example properties file snippet is provided below.
    
    gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
    gremlin.hadoop.graphReader=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoInputFormat
    gremlin.hadoop.graphWriter=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
    gremlin.hadoop.jarsInDistributedCache=true
    gremlin.hadoop.defaultGraphComputer=org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer
    ```
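
    Since the deprecated keys keep working, one plausible way to support them
    is to copy their values onto the new keys before anything reads the
    configuration. The commit log says the older keys map to the new keys
    inside `HadoopConfiguration`; the Java below is not that class but a
    hypothetical sketch over a flat `Map<String, String>`. `DeprecatedKeys` and
    `normalize` are hypothetical names, and the rule that the
    `gremlin.spark.*RDD` keys win over the `*Format` keys when both are set is
    an assumption. The `hasEdges` renames come from the last commit in this PR.

    ```java
    import java.util.HashMap;
    import java.util.LinkedHashMap;
    import java.util.Map;

    final class DeprecatedKeys {
        // Deprecated key -> replacement key. Insertion order matters: earlier entries
        // win when several deprecated keys map to the same new key (an assumption).
        private static final Map<String, String> OLD_TO_NEW = new LinkedHashMap<>();

        static {
            OLD_TO_NEW.put("gremlin.spark.graphInputRDD", "gremlin.hadoop.graphReader");
            OLD_TO_NEW.put("gremlin.spark.graphOutputRDD", "gremlin.hadoop.graphWriter");
            OLD_TO_NEW.put("gremlin.hadoop.graphInputFormat", "gremlin.hadoop.graphReader");
            OLD_TO_NEW.put("gremlin.hadoop.graphOutputFormat", "gremlin.hadoop.graphWriter");
            OLD_TO_NEW.put("gremlin.hadoop.graphInputFormat.hasEdges", "gremlin.hadoop.graphReader.hasEdges");
            OLD_TO_NEW.put("gremlin.hadoop.graphOutputFormat.hasEdges", "gremlin.hadoop.graphWriter.hasEdges");
        }

        private DeprecatedKeys() {}

        /** Returns a copy of conf in which every deprecated key is also visible under its new name. */
        static Map<String, String> normalize(Map<String, String> conf) {
            Map<String, String> result = new HashMap<>(conf);
            for (Map.Entry<String, String> entry : OLD_TO_NEW.entrySet()) {
                String value = conf.get(entry.getKey());
                if (value != null) {
                    // putIfAbsent: an explicitly set new key (or an earlier mapping) is never overwritten.
                    result.putIfAbsent(entry.getValue(), value);
                }
            }
            return result;
        }
    }
    ```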

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/incubator-tinkerpop TINKERPOP-1082

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-tinkerpop/pull/268.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #268
    
----
commit 6411d0d4142770f93fb1a188d7e991ed1b4355f3
Author: Marko A. Rodriguez <[email protected]>
Date:   2016-03-16T22:01:37Z

    gremlin.hadoop.graphReader and gremlin.hadoop.graphWriter are the new
    configurations replacing gremlin.hadoop.graphInputFormat and
    gremlin.spark.graphInputRDD. Now HadoopGraph can handle either XXXRDDs or
    XXXFormats. Cleaner configurations. Backwards compatible. The older keys
    just map to the new keys inside HadoopConfiguration.

commit b7f617b383700390128fca53de48f60cda3211fe
Author: Marko A. Rodriguez <[email protected]>
Date:   2016-03-16T22:26:22Z

    fixed up the conf/*.properties files to use graphReader/graphWriter. Found
    more areas where inputFormat/outputFormat was still being used. Tested Giraph
    and it's passing completely now. Need a helper utility that converts any
    Reader/Writer into an InputFormat or OutputFormat automagically.

commit 13561b81aa8287c696b8d79befce42f84792f793
Author: Marko A. Rodriguez <[email protected]>
Date:   2016-03-16T22:49:47Z

    ConfUtil does the dirty work of converting an InputRDD or InputFormat into
    an InputFormat.

commit 5f53589b487ab918719315db6047233fb13971ae
Author: Marko A. Rodriguez <[email protected]>
Date:   2016-03-17T14:42:57Z

    added gremlin.hadoop.defaultGraphComputer, which allows users to specify in
    their properties file which GraphComputer to use by default. This allows
    providers that only support one Hadoop-based OLAP engine to 'hard set' the
    implementation so the syntax is cleaner: graph.compute() vs.
    graph.compute(GiraphGraphComputer.class). This is backwards compatible. The
    SparkHadoopGraphProvider has been updated to sometimes use compute() and
    sometimes use compute(class).

commit 4a130d9092bc37dac252536280d60158fe75f74c
Author: Marko A. Rodriguez <[email protected]>
Date:   2016-03-17T15:09:16Z

    updated docs on GraphFilter and graphReader/graphWriter.

commit 5a9f56d53741c985982d2bb13d3d8f31ffb6dd85
Author: Marko A. Rodriguez <[email protected]>
Date:   2016-03-17T15:32:04Z

    gremlin.hadoop.graphInputFormat.hasEdges is now
    gremlin.hadoop.graphReader.hasEdges. Likewise for graphOutputFormat.
    Backwards compatible.

----

