On 3/17/2017 9:07 AM, Erick Erickson wrote:
> I think the answer is that you have to co-locate the docs with the
> same value you're grouping by on the same shard whether in SolrCloud
> or not...
>
> Hmmm: from: 
> https://cwiki.apache.org/confluence/display/solr/Result+Grouping#ResultGrouping-DistributedResultGroupingCaveats
>
> "group.ngroups and group.facet require that all documents in each
> group must be co-located on the same shard in order for accurate
> counts to be returned."

That is not how my index is laid out right now.  The index has 170
million documents in it, split into six large cold shards and a very
small hot shard.  The routing I'm using for the cold shards is the CRC32
hash of the database primary key (a different field from Solr's
uniqueKey) run through a mod function to determine the shard number
(0-5).  The hash/mod calculation is done in the MySQL query.
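
For illustration only, the cold-shard routing described above can be
sketched in Python.  The real calculation runs inside the MySQL query;
this sketch assumes the primary key is hashed in its string form and
that MySQL's CRC32() yields the same value as zlib.crc32 on the same
bytes.  The function and constant names are made up for the example.

```python
import zlib

# Six cold shards, numbered 0-5 (from the description above).
NUM_COLD_SHARDS = 6


def cold_shard_for(primary_key: str) -> int:
    """Return the cold shard number for a database primary key.

    Assumption: the key is hashed as its UTF-8 string form, which is
    meant to mirror CRC32(pk) % 6 in the SQL query.
    """
    checksum = zlib.crc32(primary_key.encode("utf-8"))  # unsigned 32-bit
    return checksum % NUM_COLD_SHARDS


if __name__ == "__main__":
    for pk in ("1001", "1002", "1003"):
        print(pk, "->", cold_shard_for(pk))
```

Because the hash input is the primary key, two docs with the same group
value have no reason to end up on the same shard.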

Is pagination of a grouped query impossible with this index?

I suppose I could hash the set name instead of the DB primary key, which
would co-locate all docs from a set on the same shard.  Would that help?
My worry with that approach is that the cold shards would no longer have
relatively uniform sizes.
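
A rough way to check both points before reindexing is to compute, from
a sample of rows, where each doc would land under either routing
scheme.  This is only a sketch: the "pk" and "set_name" keys, the
helper names, and the sample data are hypothetical, and it again
assumes a CRC32 mod 6 calculation equivalent to the one in the SQL
query.

```python
import zlib
from collections import Counter

NUM_COLD_SHARDS = 6  # cold shards 0-5


def shard_for(value: str) -> int:
    """CRC32 of the routing value, reduced to a shard number."""
    return zlib.crc32(value.encode("utf-8")) % NUM_COLD_SHARDS


def shard_sizes(docs, route_field):
    """Count how many docs each shard would hold for a routing field."""
    counts = Counter({shard: 0 for shard in range(NUM_COLD_SHARDS)})
    for doc in docs:
        counts[shard_for(str(doc[route_field]))] += 1
    return counts


def shards_per_set(docs, route_field):
    """Map each set name to the shards that would hold its docs."""
    placement = {}
    for doc in docs:
        shard = shard_for(str(doc[route_field]))
        placement.setdefault(doc["set_name"], set()).add(shard)
    return placement


if __name__ == "__main__":
    # Hypothetical sample: 300 docs spread over three made-up sets.
    sample = [{"pk": i, "set_name": "set%d" % (i % 3)} for i in range(300)]
    for field in ("pk", "set_name"):
        print(field, dict(shard_sizes(sample, field)))
        print(field, shards_per_set(sample, field))
```

Routing on the set name always puts a set on exactly one shard, which
is the co-location the quoted documentation asks for.  The shard size
counts show how uneven the shards become when a few sets are much
larger than the rest.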

Thanks,
Shawn
