Re: Merging documents from a distributed search

2015-09-08 Thread tedsolr
Joel, it needs to perform. Typically users will have 1-5 million rows in a query, returning 10-15 fields. Grouping normally reduces the return by 50% or more. Responses tend to be less than half a second. It sounds like the manipulation of docs at the collector level has been left to the single

Re: Merging documents from a distributed search

2015-09-04 Thread tedsolr
Upayavira, the docs are all unique. In my example the two docs are considered to be dupes because the requested fields all have the same values.

fields: A, B, C, D, E
Doc 1: apple, 10, 15, bye, yellow
Doc 2: apple, 12, 15, by, green

The two docs are certainly unique. Say they are on
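To make the notion of "dupes" above concrete, here is a minimal sketch in plain Python (not Solr code). It assumes, purely for illustration, that the query requests only fields A and C; the message does not say which fields were requested. The function name `collapse` and the `count` key are hypothetical.

```python
from collections import OrderedDict


def collapse(docs, requested_fields):
    """Collapse docs whose requested fields all carry the same values.

    Hypothetical helper for illustration; this is not a Solr API.
    """
    groups = OrderedDict()
    for doc in docs:
        # The grouping key is the tuple of values of the requested fields.
        key = tuple(doc[f] for f in requested_fields)
        if key not in groups:
            # The first doc seen for a key becomes the group's representative.
            groups[key] = {f: doc[f] for f in requested_fields}
            groups[key]["count"] = 0
        groups[key]["count"] += 1
    return list(groups.values())


# The two example docs from the message.
docs = [
    {"A": "apple", "B": 10, "C": 15, "D": "bye", "E": "yellow"},
    {"A": "apple", "B": 12, "C": 15, "D": "by", "E": "green"},
]

# Assumption: only A and C are requested, so both docs share one key.
collapsed = collapse(docs, ["A", "C"])
```

Under that assumption the two docs collapse into one group, while requesting a field on which they differ (for example B) keeps them apart.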

Re: Merging documents from a distributed search

2015-09-04 Thread Joel Bernstein
It's possible that the ReducerStream's buffer can grow too large if document groups are very large. But the ReducerStream only needs to hold one group at a time in memory. The RollupStream, in trunk, has a grouping implementation that doesn't hang on to all the Tuples from a group. You could also i
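The memory point above can be illustrated with a small plain-Python sketch. It is not the ReducerStream or RollupStream implementation; it only shows the general idea of reducing a stream that is already sorted by the grouping fields while keeping a running aggregate instead of the group's tuples. The name `rollup` and the `count` output field are assumptions made for this example.

```python
def rollup(sorted_tuples, key_fields):
    """Yield one summary dict per group from a stream sorted by key_fields.

    Only the current key and a running count are held in memory, never
    the tuples of the group themselves. Hypothetical helper, not Solr code.
    """
    current_key = None
    count = 0
    for t in sorted_tuples:
        key = tuple(t[f] for f in key_fields)
        if key != current_key:
            # Key changed: emit the finished group, then start a new one.
            if current_key is not None:
                yield dict(zip(key_fields, current_key), count=count)
            current_key, count = key, 0
        count += 1
    # Emit the final group, if the stream was not empty.
    if current_key is not None:
        yield dict(zip(key_fields, current_key), count=count)


# Input must already be sorted on the key fields for this to be correct.
stream = iter([
    {"A": "apple", "C": 15},
    {"A": "apple", "C": 15},
    {"A": "pear", "C": 3},
])
result = list(rollup(stream, ["A", "C"]))
```

Because each group is emitted as soon as the key changes, memory use does not grow with group size; the trade-off is that the input has to arrive sorted on the grouping fields.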

Re: Merging documents from a distributed search

2015-09-03 Thread Upayavira
On Wed, Sep 2, 2015, at 10:12 PM, tedsolr wrote: > I've read from http://heliosearch.org/solrs-mergestrategy/ > that the AnalyticsQuery > component only works for a single instance of Solr. I'm planning to > "migrate" to the SolrCloud soon and I ha

RE: Merging documents from a distributed search

2015-09-03 Thread Markus Jelsma
It seems so indeed. Please look up the thread titled "Custom merge logic in SolrCloud." -Original message- > From:tedsolr > Sent: Thursday 3rd September 2015 21:28 > To: solr-user@lucene.apache.org > Subject: RE: Merging documents from a distributed s

RE: Merging documents from a distributed search

2015-09-03 Thread tedsolr
Markus, did you mistakenly post a link to this same thread?

Re: Merging documents from a distributed search

2015-09-03 Thread tedsolr
Thanks Joel, that link looks promising. The CloudSolrStream bypasses my issue of multiple shards. Perhaps the ReducerStream would provide what I need. At first glance I worry that the buffer would grow too large - if it's really holding the values for all the fields in each document (Tuple.getMa

RE: Merging documents from a distributed search

2015-09-03 Thread Markus Jelsma
solr-user@lucene.apache.org > Subject: RE: Merging documents from a distributed search > > Hello - We're doing something similar and ended up overriding QueryComponent > (https://issues.apache.org/jira/browse/SOLR-7968) which needs protected > members instead of private members first

RE: Merging documents from a distributed search

2015-09-03 Thread Markus Jelsma
> Sent: Wednesday 2nd September 2015 23:46 > To: solr-user@lucene.apache.org > Subject: Re: Merging documents from a distributed search > > The merge strategy probably won't work for the type of distributed collapse > you're describing. > > You may want to begin explo

Re: Merging documents from a distributed search

2015-09-02 Thread Joel Bernstein
The merge strategy probably won't work for the type of distributed collapse you're describing. You may want to begin exploring the Streaming API which supports real-time map/reduce operations, http://joelsolr.blogspot.com/2015/03/parallel-computing-with-solrcloud.html Joel Bernstein http://joels