Joel,

  Thanks for the pointer. I went through your blog post on document routing;
it was very informative. I do need some clarification on the implementation,
and I'll try to run it against my use case.

I'm indexing documents from multiple source systems, a number of which
contain duplicate content. I'm trying to remove the duplicates by applying
result grouping / CollapsingQParserPlugin. For example, let's say I have
sources ABC, MNO and XYZ. ABC and MNO contain duplicate documents, which are
identified by a field, say adskdedup. I have a couple of shards, with the id
being the URL of the document. To make field collapsing work, I need to
change the id field to the form "adskdedup!url". Documents with identical
adskdedup values should then route to a dedicated shard, e.g. shard1. The
ones that are not identical will be routed to either shard1 or shard2. Once
indexing is done, shard1 should hold all the documents that grouping needs
to be applied to.
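
To make the id scheme concrete, here is a minimal sketch of how I would
build the "adskdedup!url" ids before indexing. The helper names
(composite_id, route_key), the sample dedup keys and the example URLs are
made up for illustration; the actual hashing and shard assignment are done
by Solr's router and are not reproduced here.

```python
from collections import defaultdict


def composite_id(dedup_key: str, url: str) -> str:
    """Build an id of the form 'adskdedup!url' (hypothetical helper)."""
    if "!" in dedup_key:
        raise ValueError("dedup key must not contain '!'")
    return f"{dedup_key}!{url}"


def route_key(doc_id: str) -> str:
    """Return the part before the first '!', i.e. the routing prefix."""
    return doc_id.split("!", 1)[0]


# Made-up sample documents: ABC and MNO share a dedup key, XYZ does not.
docs = [
    {"source": "ABC", "adskdedup": "k1", "url": "http://abc.example/a"},
    {"source": "MNO", "adskdedup": "k1", "url": "http://mno.example/a"},
    {"source": "XYZ", "adskdedup": "k2", "url": "http://xyz.example/b"},
]

# Group the generated ids by routing prefix to see which ones share a key.
by_route = defaultdict(list)
for doc in docs:
    doc["id"] = composite_id(doc["adskdedup"], doc["url"])
    by_route[route_key(doc["id"])].append(doc["id"])

for key, ids in by_route.items():
    print(key, ids)
```

The two documents with the same adskdedup value end up with the same
routing prefix, which is the property the grouping relies on.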

At query time, depending on the query, results can be returned from both
shards. For example, a query such as
q=solr&group=true&group.field=adskdedup&group.ngroups=true would ideally
return data from both shards and apply the grouping on shard1 based on the
adskdedup field. This would also ensure that group.ngroups=true returns
the right count.
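
For reference, this is roughly how I would assemble the two query variants.
It is only a sketch: the base URL and collection name are placeholders I
made up, and the second variant assumes the {!collapse field=...} filter
syntax of CollapsingQParserPlugin applied to the same adskdedup field.

```python
from urllib.parse import urlencode

# Placeholder endpoint; host, port and collection name are assumptions.
BASE_URL = "http://localhost:8983/solr/collection1/select"

# Result grouping variant, using the parameters from the query above.
grouping_params = {
    "q": "solr",
    "group": "true",
    "group.field": "adskdedup",
    "group.ngroups": "true",
}

# Collapsing variant: the collapse is expressed as a filter query.
collapse_params = {
    "q": "solr",
    "fq": "{!collapse field=adskdedup}",
}


def build_url(params: dict) -> str:
    """URL-encode the parameters and append them to the base URL."""
    return f"{BASE_URL}?{urlencode(params)}"


grouping_url = build_url(grouping_params)
collapse_url = build_url(collapse_params)
print(grouping_url)
print(collapse_url)
```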

The other clarification I wanted is about this statement: "When a
tenant is too large to fit on a single shard it can be spread across
multiple shards by specifying the number of bits to use from the shard key."
If we split shards, will Result Grouping / CollapsingQParserPlugin and the
number of results still work?
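
To show why I'm asking, here is a simplified model of my understanding of
the "bits" idea: the top bits of the routing hash come from the shard key
and the remaining bits from the document id. This is a sketch under
assumptions, not Solr's implementation: zlib.crc32 stands in for the real
hash function, the shards are modelled as equal-sized contiguous hash
ranges, and the function names are my own.

```python
import zlib


def stand_in_hash(text: str) -> int:
    """32-bit stand-in hash (NOT the hash Solr actually uses)."""
    return zlib.crc32(text.encode("utf-8")) & 0xFFFFFFFF


def composite_hash(shard_key: str, doc_id: str, bits: int = 16) -> int:
    """Take the top `bits` bits from the key hash, the rest from the doc id hash."""
    if not 0 < bits < 32:
        raise ValueError("bits must be between 1 and 31")
    upper_mask = (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF
    lower_mask = 0xFFFFFFFF ^ upper_mask
    return (stand_in_hash(shard_key) & upper_mask) | (stand_in_hash(doc_id) & lower_mask)


def shard_for(hash_value: int, num_shards: int) -> int:
    """Map a 32-bit hash onto one of `num_shards` equal contiguous ranges."""
    return (hash_value * num_shards) >> 32


# Made-up URLs that all share the same dedup key "k1".
urls = [f"http://abc.example/doc{i}" for i in range(50)]

# With 16 bits from the key and 4 shards, every document lands on one shard.
shards_default = {shard_for(composite_hash("k1", u), 4) for u in urls}

# With only 1 bit from the key, the same key may span up to 2 of the 4 shards.
shards_one_bit = {shard_for(composite_hash("k1", u, bits=1), 4) for u in urls}

print(sorted(shards_default), sorted(shards_one_bit))
```

In this model, using fewer bits from the shard key means documents with the
same adskdedup value can land on more than one shard, which is the case I'm
worried about for grouping and for the group counts.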

Last but not least, when are you planning to release 4.6.1?

Again, I appreciate your help on this.

- Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-Result-Grouping-vs-CollapsingQParserPlugin-tp4111331p4111375.html
Sent from the Solr - User mailing list archive at Nabble.com.
