The merge strategy probably won't work for the type of distributed collapse you're describing.
You may want to begin exploring the Streaming API, which supports real-time map/reduce operations:

http://joelsolr.blogspot.com/2015/03/parallel-computing-with-solrcloud.html

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Sep 2, 2015 at 5:12 PM, tedsolr <tsm...@sciquest.com> wrote:

> I've read from http://heliosearch.org/solrs-mergestrategy/ that the
> AnalyticsQuery component only works for a single instance of Solr. I'm
> planning to "migrate" to the SolrCloud soon and I have a custom
> AnalyticsQuery module that collapses what I consider to be duplicate
> documents, keeping stats like a "count" of the dupes. For my purposes
> "dupes" are determined at run time and vary by the search request. Once a
> collection has multiple shards I will not be able to prevent "dupes" from
> appearing across those shards. A custom merge strategy should allow me to
> merge my stats, but I don't see how I can drop duplicate docs at that
> point.
>
> If shard1 returns docs A & B and shard2 returns docs B & C (letters
> denoting what I consider to be unique docs), can my implementation of a
> merge strategy return only docs A, B, & C, rather than A, B, B, & C?
>
> thanks!
> solr 5.2.1
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Merging-documents-from-a-distributed-search-tp4226802.html
> Sent from the Solr - User mailing list archive at Nabble.com.
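
For reference, the A/B + B/C case from the question can be written out as a small sketch of the desired end result. This is plain Python, not Solr's MergeStrategy or Streaming API; the function name, the dedupe_key parameter, and the "id"/"count" fields are all hypothetical and stand in for whatever the run-time dupe rule and stats happen to be. It also assumes every shard hands back its complete set of collapsed docs, which is a simplification.

```python
from collections import OrderedDict


def merge_shard_results(shard_results, dedupe_key):
    """Merge per-shard collapsed docs, keeping one doc per key and summing counts.

    shard_results: list of lists of dicts, one inner list per shard (hypothetical shape).
    dedupe_key: function mapping a doc to the value that defines a "dupe".
    """
    merged = OrderedDict()
    for docs in shard_results:
        for doc in docs:
            key = dedupe_key(doc)
            if key in merged:
                # Same logical doc already seen on another shard: fold the stats in.
                merged[key]["count"] += doc["count"]
            else:
                # First sighting: copy so the shard's own data is left untouched.
                merged[key] = dict(doc)
    return list(merged.values())


# shard1 returns A and B, shard2 returns B and C, each with a per-shard dupe count.
shard1 = [{"id": "A", "count": 2}, {"id": "B", "count": 1}]
shard2 = [{"id": "B", "count": 3}, {"id": "C", "count": 1}]

merged = merge_shard_results([shard1, shard2], dedupe_key=lambda d: d["id"])
print([(d["id"], d["count"]) for d in merged])
```

The sketch only shows what a correct merged result looks like (A, B, C with B's counts combined); it says nothing about where in a distributed Solr request such a step could run.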