On 3/17/2017 9:07 AM, Erick Erickson wrote:
> I think the answer is that you have to co-locate the docs with the
> same value you're grouping by on the same shard whether in SolrCloud
> or not...
>
> Hmmm: from:
> https://cwiki.apache.org/confluence/display/solr/Result+Grouping#ResultGrouping-DistributedResultGroupingCaveats
>
> "group.ngroups and group.facet require that all documents in each
> group must be co-located on the same shard in order for accurate
> counts to be returned."

That is not how things work right now. The index has 170 million documents in it, split into six large cold shards and a very small hot shard. The routing I'm using for the cold shards is the CRC32 hash of the database primary key (different field than Solr's uniqueKey) run through a mod function to determine shard number (0-5). The hash/mod calculation is done in the MySQL query.

Is pagination of a grouped query impossible with this index?

I suppose it's theoretically possible that I could hash the set name instead of the DB primary key which would result in docs from a set being co-located. Would that help? My worry with that approach is that the cold shards would no longer have relatively uniform sizes.

Thanks,
Shawn
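
P.S. To make the routing concrete, here is a rough Python sketch of the CRC32-then-mod scheme described above. It is an illustration only, not the actual MySQL query (which is not shown here). The names shard_for and shard_sizes are made up for this example, and it assumes the key is hashed as its decimal string form, which is how MySQL's CRC32() treats a numeric argument.

```python
import zlib
from collections import Counter

NUM_COLD_SHARDS = 6  # cold shards are numbered 0-5


def shard_for(key) -> int:
    """Return the cold shard number for a routing key.

    Assumption: the key is hashed as its string form, mirroring
    MOD(CRC32(key), 6) in MySQL. The real query may differ.
    """
    return zlib.crc32(str(key).encode("utf-8")) % NUM_COLD_SHARDS


def shard_sizes(keys) -> Counter:
    """Count how many keys land on each shard, to check for skew."""
    return Counter(shard_for(k) for k in keys)


if __name__ == "__main__":
    # Hypothetical (primary key, set name) pairs, for illustration only.
    docs = [(1, "setA"), (2, "setA"), (3, "setB"), (4, "setB"), (5, "setC")]

    # Current scheme: route by DB primary key. Docs from one set can
    # end up on different shards.
    print("by primary key:", shard_sizes(pk for pk, _ in docs))

    # Alternative: route by set name. Every doc in a set goes to the
    # same shard, but a few very large sets could unbalance the shards.
    print("by set name:   ", shard_sizes(name for _, name in docs))
```

Running shard_sizes over the real set names before reindexing would show how uneven the cold shards might become if routing switched from the primary key to the set name.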