Joel,

Thanks for the pointer. I went through your blog post on document routing; it was very informative. I do need some clarification on the implementation, so I'll try to run through it based on my use case.
I'm indexing documents from multiple source systems, some of which contain duplicate content. I'm trying to remove the duplicates by applying Result Grouping / CollapsingQParserPlugin. For example, let's say I have sources ABC, MNO and XYZ. Sources ABC and MNO contain duplicate documents, which are identified by a field, say adskdedup. I have a couple of shards, and the id is the URL of the document. Now, to make field collapsing work, I need to change the id field to "adskdedup!url". Documents with identical adskdedup values should be routed to a dedicated shard, e.g. shard1. The ones that are not identical will be routed to either shard1 or shard2. After indexing is done, shard1 should hold all the documents that grouping needs to be applied to.

At query time, depending on the query, results can be returned from both shards. For example, the query q=solr&group=true&group.field=adskdedup&group.ngroups=true would ideally return data from both shards and apply the grouping on shard1 based on the adskdedup field. This would also ensure that group.ngroups=true returns the right count.

The other clarification I wanted concerns this statement: "When a tenant is too large to fit on a single shard it can be spread across multiple shards by specifying the number of bits to use from the shard key." If we split shards this way, will Result Grouping / CollapsingQParserPlugin and the number of results still work?

Last but not least, when are you planning to release 4.6.1?

Again, I appreciate your help on this.

Thanks

--
View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-Result-Grouping-vs-CollapsingQParserPlugin-tp4111331p4111375.html
Sent from the Solr - User mailing list archive at Nabble.com.
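The id scheme and grouping query described in the message can be sketched as plain string building. This is a minimal sketch, assuming Solr's compositeId router: the part before `!` is the shard key whose hash decides the shard, and the optional `/bits` suffix (the "number of bits to use from the shard key" from the quoted statement) limits how many bits of that hash are used. The class and method names (`DedupRouting`, `compositeId`, `groupingQuery`) and the sample values are hypothetical. The code does not talk to Solr and requires Java 10+ for the `URLEncoder.encode(String, Charset)` overload.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds compositeId-style ids and the grouping query string.
public class DedupRouting {

    // "<adskdedup value>!<url>": documents sharing the same prefix hash to the same shard.
    // Assumption: the URL itself contains no '!', since '!' is the compositeId separator.
    static String compositeId(String dedupKey, String url) {
        return dedupKey + "!" + url;
    }

    // "<adskdedup value>/<bits>!<url>": only `bits` bits of the shard key's hash are used;
    // the fewer bits, the wider the slice of the hash range (and the more shards) the key can span.
    static String compositeId(String dedupKey, int bits, String url) {
        return dedupKey + "/" + bits + "!" + url;
    }

    // Builds q=...&group=true&group.field=...&group.ngroups=true, URL-encoding each part.
    static String groupingQuery(String q, String groupField) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("q", q);
        params.put("group", "true");
        params.put("group.field", groupField);
        params.put("group.ngroups", "true");
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> p : params.entrySet()) {
            query.add(URLEncoder.encode(p.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
        }
        return query.toString();
    }

    public static void main(String[] args) {
        String url = "http://example.com/page1"; // hypothetical sample document URL
        System.out.println(compositeId("abc123", url));
        System.out.println(compositeId("abc123", 3, url));
        System.out.println(groupingQuery("solr", "adskdedup"));
    }
}
```

The sketch only shows the id format. Whether a `/bits` id still keeps every member of a group on one shard is exactly the open question in the message.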