[ https://issues.apache.org/jira/browse/TINKERPOP-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15218783#comment-15218783 ]
ASF GitHub Bot commented on TINKERPOP-1225:
-------------------------------------------
GitHub user okram opened a pull request:
https://github.com/apache/incubator-tinkerpop/pull/285
TINKERPOP-1225: Do a "rolling reduce" for GroupXXXStep in OLAP.
https://issues.apache.org/jira/browse/TINKERPOP-1225
We can now do mid-barrier reductions with `group()` in OLAP. This is
huge: if you have a reducer in your `by()` value traversal, the stream is
continuously reduced to limit memory consumption. It is slightly more
expensive in time for small data, but for large data there are no more
worries about an OME (`OutOfMemoryError`) with `group()` in either OLTP or OLAP.
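A minimal sketch of the "rolling reduce" idea in plain Java, not TinkerPop's actual `GroupStep` code. Instead of buffering every value per key and reducing at the end, each incoming value is folded into a per-key accumulator as soon as it arrives, so memory grows with the number of keys rather than the number of traversers. The class `RollingGroup` and its methods are hypothetical names chosen for illustration, and counting stands in for whatever reducer the `by()` value traversal ends with.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;

// Hypothetical illustration; not TinkerPop's GroupStep implementation.
public class RollingGroup {

    // Buffered approach: keep every value per key, reduce (count) only at the end.
    static <T, K> Map<K, Long> groupThenReduce(List<T> stream, Function<T, K> keyFn) {
        Map<K, List<T>> buffered = new HashMap<>();
        for (T t : stream) {
            buffered.computeIfAbsent(keyFn.apply(t), k -> new ArrayList<>()).add(t);
        }
        Map<K, Long> result = new HashMap<>();
        buffered.forEach((k, values) -> result.put(k, (long) values.size()));
        return result;
    }

    // Rolling approach: fold each value into its key's accumulator immediately,
    // so only one accumulator per key is ever held in memory.
    static <T, K, V> Map<K, V> rollingReduce(List<T> stream, Function<T, K> keyFn,
                                             Function<T, V> valueFn, BinaryOperator<V> reducer) {
        Map<K, V> acc = new HashMap<>();
        for (T t : stream) {
            acc.merge(keyFn.apply(t), valueFn.apply(t), reducer);
        }
        return acc;
    }

    public static void main(String[] args) {
        List<String> labels = List.of("person", "software", "person", "person", "software", "person");
        // Rolling count: each element contributes 1 to its key's running sum.
        Map<String, Long> counts = rollingReduce(labels, s -> s, s -> 1L, Long::sum);
        System.out.println(counts.get("person") + " " + counts.get("software")); // 4 2
        System.out.println(counts.equals(groupThenReduce(labels, s -> s)));     // true
    }
}
```

Both approaches produce the same map; the difference is only in how much intermediate state is retained, which is what matters for large inputs.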
CHANGELOG
```
* `GroupStep` and `GroupSideEffectStep` make use of mid-traversal reducers
to limit memory consumption in OLAP.
```
VOTE +1
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/apache/incubator-tinkerpop TINKERPOP-1225
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-tinkerpop/pull/285.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #285
----
commit 8eb53815acfcec05316dcf39bd91d9e3a43d971a
Author: Marko A. Rodriguez
Date: 2016-03-30T20:24:38Z
GroupStep and GroupSideEffectStep now make use of mid-traversal barriers to do
data reduction on the fly in order to limit the memory footprint and reduce
the chances of an OME. OLTP always did this, but now OLAP (which needs it more)
does it too. This is epic. Also fixed a minor bug in ReducingBarrierStep. Added
more GroupTest test cases, including one that does a groupCount() instead of a
group() just to make sure things are working as expected.
----
> Do a "rolling reduce" for GroupXXXStep in OLAP.
> -----------------------------------------------
>
> Key: TINKERPOP-1225
> URL: https://issues.apache.org/jira/browse/TINKERPOP-1225
> Project: TinkerPop
> Issue Type: Improvement
> Components: process
> Affects Versions: 3.1.1-incubating
> Reporter: Marko A. Rodriguez
> Assignee: Marko A. Rodriguez
>
> {{GroupXXXStep}} in OLTP is able to process traversers up to the first
> barrier in the reduction. 99% of the time, the first barrier is the last
> barrier, and thus you get a nice lazy computation which limits the memory
> footprint.
> Unfortunately, we don't have this luxury in OLAP, at least not yet. However,
> the work that [~spmallette] did to get {{GroupBiOperator}} to serialize with
> traversals might make it possible for us to merge barriers in the reduction
> and thus have OLAP and OLTP {{GroupXXXStep}} behave analogously.
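In OLAP, each worker ends up with its own partial group map, and those partial maps have to be combined. A minimal sketch of that merge, assuming the per-key reducer is associative and commutative (as counting or summing is): a `BinaryOperator` that merges two partial maps key by key using the same reducer, so partials can be combined in any order. `MergePartialGroups` and `mergeWith` are hypothetical names; this does not reflect how `GroupBiOperator` is actually written.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BinaryOperator;

// Hypothetical sketch; not TinkerPop's GroupBiOperator.
public class MergePartialGroups {

    // Returns an operator that merges two partial group maps, applying the
    // per-key reducer wherever both maps contain the same key.
    static <K, V> BinaryOperator<Map<K, V>> mergeWith(BinaryOperator<V> reducer) {
        return (left, right) -> {
            Map<K, V> merged = new HashMap<>(left); // copy so the inputs are not mutated
            right.forEach((k, v) -> merged.merge(k, v, reducer));
            return merged;
        };
    }

    public static void main(String[] args) {
        // Partial counts as they might be produced by two separate workers.
        Map<String, Long> workerA = Map.of("person", 3L, "software", 1L);
        Map<String, Long> workerB = Map.of("person", 1L, "software", 1L);
        BinaryOperator<Map<String, Long>> op = mergeWith(Long::sum);
        Map<String, Long> total = op.apply(workerA, workerB);
        System.out.println(total.get("person") + " " + total.get("software")); // 4 2
    }
}
```

Because the operator only needs the reducer, the same merge can be applied repeatedly as partial results arrive, which is the property that lets barriers in the reduction be merged rather than held in full.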
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)