I am getting an OOM error when I try to combine streaming operations, and I
think the sort is the issue. I ran this test on a single-replica cloud setup
of v6.1 with a 4GB heap. col1 has 1M docs and col2 has 10k docs. The search
for each collection was q=*:*. Using SolrJ:

// base search from user
CloudSolrStream searchStream = new CloudSolrStream(zkHosts, "col1", map);
// this groups docs by a field list
ReducerStream reducerStream = new ReducerStream(searchStream, comparator,
    reducer);
// this re-sorts the results so I can compare them against data from
// another collection
SortStream idSort = new SortStream(reducerStream,
    new FieldComparator("id", ComparatorOrder.ASCENDING));
CloudSolrStream ruleStream = new CloudSolrStream(zkHosts, "col2", map);
// this provides a filter
TupleStream stream = new IntersectStream(idSort, ruleStream,
    new FieldEqualitor("id"));

When I open the stream it works for about 30 seconds and then dies. So a
single user search on a very small collection (1M docs) can overwhelm a 4GB
heap. And this chain isn't even done! I still need to merge with a third
collection and then re-sort with the user's specified sort parameter.
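
To make the suspicion about the sort concrete, here is a minimal sketch in plain Java (no Solr classes) of the two kinds of stage in the chain. It assumes the sort stage has to read and hold its whole input before it can emit the first result, as any in-memory full sort must, while an intersect over two inputs that are already sorted on the key only needs the current element from each side. The class and method names are hypothetical and this is not Solr's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

class StreamSketch {

    // Hypothetical stand-in for a sort stage (assumption: in-memory sort).
    // Nothing can be emitted until every input element has been buffered,
    // so heap use grows with the size of the input.
    static <T> Iterator<T> bufferingSort(Iterator<T> input, Comparator<T> order) {
        List<T> buffer = new ArrayList<>();
        while (input.hasNext()) {
            buffer.add(input.next());
        }
        buffer.sort(order);
        return buffer.iterator();
    }

    // Hypothetical stand-in for an intersect stage. Both inputs must already
    // be sorted ascending on the key. Only one element per side is held at a
    // time; the output list exists only so the result can be printed.
    static List<Integer> intersectSorted(Iterator<Integer> a, Iterator<Integer> b) {
        List<Integer> out = new ArrayList<>();
        if (!a.hasNext() || !b.hasNext()) {
            return out;
        }
        int x = a.next();
        int y = b.next();
        while (true) {
            if (x == y) {
                out.add(x);               // key from A is present in B
                if (!a.hasNext()) break;
                x = a.next();
            } else if (x < y) {
                if (!a.hasNext()) break;  // A exhausted
                x = a.next();
            } else {
                if (!b.hasNext()) break;  // B exhausted
                y = b.next();
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Iterator<Integer> unsorted = Arrays.asList(5, 3, 9, 1, 7).iterator();
        Iterator<Integer> sorted = bufferingSort(unsorted, Comparator.<Integer>naturalOrder());
        Iterator<Integer> rules = Arrays.asList(1, 2, 3, 7).iterator();
        System.out.println(intersectSorted(sorted, rules)); // prints [1, 3, 7]
    }
}
```

If that assumption holds for the real sort stage, the 1M grouped docs from col1 all sit on the heap at once before the intersect sees a single tuple.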

Is there something fundamentally wrong with my approach?
thanks, Ted



Joel Bernstein wrote
> Hi,
> 
> The streaming API in Solr 6.x has been expanded to support many different
> parallel computing workloads. For example, the topic stream supports
> pub/sub messaging. The gatherNodes stream supports graph traversal. The
> facet stream supports aggregations inside the search engine, while the
> rollup stream supports shuffling map / reduce aggregations. Stored queries
> and large scale alerting are on the way...
> 
> The sort stream is designed to be used at scale in parallel mode. It can
> currently sort about 1,000,000 docs per second on a single worker. So if
> you have 20 workers it can sort 20,000,000 docs per second. The plan is to
> eventually switch to the fork/join merge sort so that you get parallelism
> within the same worker.
> 
> 
> 
> Joel Bernstein
> http://joelsolr.blogspot.com/
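
The idea of sorting "in parallel mode" described in the quoted reply can be illustrated with a small sketch in plain Java. It assumes the docs are split across workers by a hash of the sort key, each worker sorts only its own partition, and the sorted partitions are then merged. The class name, method names, and worker count are hypothetical, and this is an illustration of the general technique rather than Solr's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelSortSketch {

    // Hypothetical helper: partition by key hash, sort each partition on its
    // own thread, then merge the sorted partitions into one sorted list.
    static List<Integer> parallelSort(List<Integer> docs, int workers) {
        List<List<Integer>> partitions = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            partitions.add(new ArrayList<>());
        }
        for (Integer doc : docs) {
            partitions.get(Math.floorMod(doc.hashCode(), workers)).add(doc);
        }

        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<List<Integer>>> futures = new ArrayList<>();
            for (List<Integer> part : partitions) {
                futures.add(pool.submit(() -> {
                    part.sort(null);      // natural ascending order
                    return part;
                }));
            }
            List<List<Integer>> sortedParts = new ArrayList<>();
            for (Future<List<Integer>> f : futures) {
                sortedParts.add(f.get()); // wait for each worker to finish
            }
            return merge(sortedParts);
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("worker failed", e);
        } finally {
            pool.shutdown();
        }
    }

    // K-way merge of already sorted partitions using a min-heap.
    // Heap entries are {value, partitionIndex, positionInPartition}.
    static List<Integer> merge(List<List<Integer>> parts) {
        PriorityQueue<int[]> heads =
            new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int p = 0; p < parts.size(); p++) {
            if (!parts.get(p).isEmpty()) {
                heads.add(new int[] {parts.get(p).get(0), p, 0});
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!heads.isEmpty()) {
            int[] head = heads.poll();
            out.add(head[0]);
            List<Integer> part = parts.get(head[1]);
            int next = head[2] + 1;
            if (next < part.size()) {
                heads.add(new int[] {part.get(next), head[1], next});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> ids = Arrays.asList(9, 4, 7, 1, 8, 2);
        System.out.println(parallelSort(ids, 3)); // prints [1, 2, 4, 7, 8, 9]
    }
}
```

In this sketch each worker buffers only its own partition, so the per-worker memory and sort time shrink as workers are added; the merge step holds one head element per partition.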





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Specify-sorting-of-merged-streams-tp4285026p4288083.html
Sent from the Solr - User mailing list archive at Nabble.com.