Perhaps my assumptions about merge are wrong. When I run a search with the collapsing filter (q=*:*&fq={!collapse field=VENDOR_NAME}...) I get "dupes" if the same VENDOR_NAME is on shard1 and shard2. Here's the response:

    "response": {
      "numFound": 24158,
      "start": 0,
      "docs": [
        {
          "VENDOR_NAME": "01DB METRAVIB SAS",
          "[shard]": "http://localhost:8983/solr/ShardTest1_shard1_0_replica1/|http://localhost:8984/solr/ShardTest1_shard1_0_replica2/"
        },
        {
          "VENDOR_NAME": "01DB METRAVIB SAS",
          "[shard]": "http://localhost:8983/solr/ShardTest1_shard1_1_replica1/|http://localhost:8984/solr/ShardTest1_shard1_1_replica2/"
        },
        {
          "VENDOR_NAME": "1 BIG SELF STORE LTD",
          "[shard]": "http://localhost:8983/solr/ShardTest1_shard1_0_replica1/|http://localhost:8984/solr/ShardTest1_shard1_0_replica2/"
        }
      ]
    }

You can see the same vendor is returned from shard1_1 and shard1_0. So I'm expecting the same results from my plugin (once I get it to work).

I thought the merge strategy could be used to filter out the "duplicate" vendor. Would that require rebuilding the document list and then replacing the Solr response, with something like shardResponse.setSolrResponse()? And if that is the correct approach, I could return many more results than the user expected. If I'm thinking correctly, the worst case is no "dupes" between the shards, and the returned result count is rows x shards.

To make sure the correct results are returned based on the sort, I'll also have to re-sort the merged results. So for a search like q=*:*&fl=vendor&sort=vendor asc... results example:

    shard 1 docs: { A, B, D }
    shard 2 docs: { B, C, D }

So walking through the Solr responses for each shard, I end up with a return set of { A, B, C, D }.

Joel Bernstein wrote
> Can you describe more about what you're trying to do in the merge? Why does
> it seem it's too late to drop documents in the merge?
>
> If you can provide a very simple example with some sample records and a
> sample output, that would be helpful.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Thu, Aug 4, 2016 at 4:25 PM, tedsolr <tsmith@> wrote:
>
>> I've been struggling just to get my search plugin working for sharded
>> collections, but I haven't ascertained if my end goal is even achievable.
>> I have a plugin that groups documents that are considered duplicates
>> (based on multiple fields - like the CollapsingQParserPlugin). When
>> responses come back from different shards another culling will be
>> necessary to remove dupes between the shards. In the merge() method it
>> seems it will be too late to simply "drop" documents. Is this something
>> that the client will just have to deal with? Maybe in the process()
>> method of a search component? I was expecting to be able to preserve the
>> requested return count, but that seems really unlikely now.
>>
>> Thanks for any suggestions,
>> Ted
>> v5.2.1
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Can-a-MergeStrategy-filter-returned-docs-tp4290446.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
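
To make the { A, B, D } + { B, C, D } walk-through concrete, here is a rough sketch of the culling step in plain Java. It deliberately does not touch the Solr MergeStrategy API: the class ShardMergeSketch and the method mergeDistinct are made-up names, each shard's result is modeled as a plain list of vendor strings, and the sort is assumed to be "vendor asc". Whether the real merge() hook lets me swap the document list like this is still the open question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class ShardMergeSketch {

    // Hypothetical helper, not a Solr API: combines per-shard vendor lists,
    // drops vendors that appear on more than one shard, re-sorts ascending,
    // and trims the result back to the requested row count.
    public static List<String> mergeDistinct(List<List<String>> shardDocs, int rows) {
        // TreeSet removes duplicates and keeps natural (ascending) order,
        // which stands in for "sort=vendor asc" in this sketch.
        TreeSet<String> distinct = new TreeSet<>();
        for (List<String> docs : shardDocs) {
            distinct.addAll(docs);
        }
        List<String> merged = new ArrayList<>(distinct);
        // Never return more than the caller asked for.
        int limit = Math.min(rows, merged.size());
        return new ArrayList<>(merged.subList(0, limit));
    }

    public static void main(String[] args) {
        List<List<String>> shardDocs = Arrays.asList(
                Arrays.asList("A", "B", "D"),   // shard 1 docs
                Arrays.asList("B", "C", "D"));  // shard 2 docs

        System.out.println(mergeDistinct(shardDocs, 10)); // [A, B, C, D]
        System.out.println(mergeDistinct(shardDocs, 3));  // [A, B, C]
    }
}
```

In this toy version, trimming to rows after the cull means the merged list never exceeds the requested count, even though up to rows x shards documents come in. It says nothing about paging with start > 0 or about sorting on anything other than the collapse field.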