On Wed, Sep 26, 2012 at 6:21 PM, Chris Hostetter <hossman_luc...@fucit.org> wrote:
> 2) the coordinator node sums up the counts for any constraint returned by
> multiple nodes, and then picks the top (facet.limit) constraints based on
> the counts it knows about.

It's actually more sophisticated than that - we don't limit to the top
facet.limit constraints at the first phase. For *all* constraints we see
from the first phase, we calculate whether each one could possibly be in the
top facet.limit constraints (based on shards we haven't heard from). If so,
we request exact counts from those shards we haven't heard from.

> (but i believe this second query
> is optimized to only ask a shard about a constraint if it didn't already
> get the count in the first request)

Correct.

> So imagine you have 3 shards, and querying them individually with
> facet.field=cat&facet.limit=3 you get...
>
> shardA: cars(8), books(7), computers(6)
> shardB: toys(8), books(7), garden(5)
> shardC: garden(4), books(3), computers(3)
>
> If you made a solr cloud query (or an explicit distributed query of those
> three shards), the first request the coordinator would send to each shard
> would specify a higher facet.limit, and might get back something like...
>
> shardA: cars(8), books(7), computers(6), cleaning(4), ...
> shardB: toys(8), books(7), garden(5), cleaning(4), ...
> shardC: garden(4), books(3), computers(3), plants(3), ...
>
> ...in which case "cleaning" pops up as a contender for being in the top
> constraints. The coordinator sums up the counts for the constraints it
> knows about, and might decide that these are the top 3...
>
> books(17), computers(9), cleaning(8)

To extend your example, Solr notices that "plants" has a count of 3 on one
shard, and was missing from the other two shards. The maximum possible
count it *could* have is 11 (3+4+4), which could possibly put it in the top
3, hence it will also ask shardA and shardB about "plants".

-Yonik
http://lucidworks.com
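
For anyone who wants to play with the numbers, here is a rough Python sketch of the bound calculation described above. It is not Solr's actual code. It assumes that a shard which did not return a constraint can hold at most the lowest count it did return, and that a constraint is a contender when its maximum possible count reaches the facet.limit-th highest known total. The function and variable names (plan_refinement, shard_floor and so on) are made up for illustration, and the shard lists are cut off at the counts shown in the example.

```python
# Illustrative sketch only; not Solr's implementation.
FACET_LIMIT = 3

# First-phase responses from the example, truncated to the counts shown.
shard_counts = {
    "shardA": {"cars": 8, "books": 7, "computers": 6, "cleaning": 4},
    "shardB": {"toys": 8, "books": 7, "garden": 5, "cleaning": 4},
    "shardC": {"garden": 4, "books": 3, "computers": 3, "plants": 3},
}


def plan_refinement(shard_counts, limit):
    """Return (known, max_possible, refine) for every constraint seen."""
    # Assumption: a shard that omitted a constraint holds at most the
    # lowest count it did return.
    shard_floor = {s: min(c.values()) for s, c in shard_counts.items()}
    terms = {t for counts in shard_counts.values() for t in counts}

    known, max_possible, missing = {}, {}, {}
    for term in terms:
        # Sum of the counts actually reported in the first phase.
        known[term] = sum(c.get(term, 0) for c in shard_counts.values())
        # Shards we have not heard from for this constraint.
        missing[term] = sorted(s for s, c in shard_counts.items() if term not in c)
        # Upper bound: known count plus each missing shard's floor.
        max_possible[term] = known[term] + sum(shard_floor[s] for s in missing[term])

    # Assumption: the cutoff is the limit-th highest known total.
    ranked = sorted(known.values(), reverse=True)
    threshold = ranked[limit - 1] if len(ranked) >= limit else 0

    # Ask only the missing shards, and only for constraints that could
    # still reach the cutoff.
    refine = {
        term: missing[term]
        for term in terms
        if missing[term] and max_possible[term] >= threshold
    }
    return known, max_possible, refine


known, max_possible, refine = plan_refinement(shard_counts, FACET_LIMIT)
print(max_possible["plants"], refine["plants"])  # 11 ['shardA', 'shardB']
```

With these inputs "plants" gets a maximum possible count of 11 and is queued for shardA and shardB, while "books" needs no second request because all three shards already reported it.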