I am getting an OOM error trying to combine streaming operations. I think the sort is the issue. This test was done on a single replica cloud setup of v6.1 with 4GB heap. col1 has 1M docs. col2 has 10k docs. The search for each collection was q=*:*. Using SolrJ:
CloudSolrStream searchStream = new CloudSolrStream(zkHosts, "col1", map); // base search from user
ReducerStream reducerStream = new ReducerStream(searchStream, comparator, reducer); // this groups docs by a field list
SortStream idSort = new SortStream(reducerStream, new FieldComparator("id", ComparatorOrder.ASCENDING)); // this re-sorts the results so I can compare them against data from another collection
CloudSolrStream ruleStream = new CloudSolrStream(zkHosts, "col2", map);
TupleStream stream = new IntersectStream(idSort, ruleStream, new FieldEqualitor("id")); // this provides a filter

When I open the stream it works for about 30 seconds, then dies. So a single user search on a very small collection (1M docs) can overwhelm a 4GB heap. And this chain isn't even done! I still need to merge with yet a third collection and then re-sort with the user's specified sort parameter.

Is there something fundamentally wrong with my approach?

thanks,
Ted

Joel Bernstein wrote
> Hi,
>
> The streaming API in Solr 6.x has been expanded to support many different
> parallel computing workloads. For example, the topic stream supports
> pub/sub messaging. The gatherNodes stream supports graph traversal. The
> facet stream supports aggregations inside the search engine, while the
> rollup stream supports shuffling map / reduce aggregations. Stored queries
> and large scale alerting are on the way...
>
> The sort stream is designed to be used at scale in parallel mode. It can
> currently sort about 1,000,000 docs per second on a single worker. So if
> you have 20 workers it can sort 20,000,000 docs per second. The plan is to
> eventually switch to the fork/join merge sort so that you get parallelism
> within the same worker.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/

--
View this message in context: http://lucene.472066.n3.nabble.com/Specify-sorting-of-merged-streams-tp4285026p4288083.html
Sent from the Solr - User mailing list archive at Nabble.com.
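
For anyone following the chain above: as far as I understand it, the intersect step is a merge-style comparison that expects both inputs already ordered on the equality field ("id" here). That would explain the re-sort in front of it, and a sort has to read its whole input before it can emit the first tuple. Below is a rough plain-Java sketch of that merge-style comparison. It does not use SolrJ, the class and method names are made up for illustration, and it is not the actual IntersectStream code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration only; not part of SolrJ.
final class SortedIntersectSketch {

    // Returns every id from `left` that also occurs in `right`.
    // Assumes both iterators yield ids in ascending order.
    static List<String> intersect(Iterator<String> left, Iterator<String> right) {
        // Results are collected in a list only to keep the demo short;
        // a real stream would emit each match as soon as it is found.
        List<String> matches = new ArrayList<>();
        String l = left.hasNext() ? left.next() : null;
        String r = right.hasNext() ? right.next() : null;
        while (l != null && r != null) {
            int cmp = l.compareTo(r);
            if (cmp == 0) {
                matches.add(l);
                // Advance only the left side so repeated left ids still match.
                l = left.hasNext() ? left.next() : null;
            } else if (cmp < 0) {
                l = left.hasNext() ? left.next() : null;   // left id has no partner
            } else {
                r = right.hasNext() ? right.next() : null; // right side is behind
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> col1Ids = Arrays.asList("a", "b", "d", "f");
        List<String> col2Ids = Arrays.asList("b", "c", "d", "e");
        System.out.println(intersect(col1Ids.iterator(), col2Ids.iterator())); // prints [b, d]
    }
}
```

The comparison itself only ever holds one id from each side, so if the numbers above are right, the memory pressure is more likely coming from the sort in front of it than from the intersect.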