Thanks for this, guys. Really excellent explanation!

On Thu, Sep 27, 2012 at 12:15 AM, Yonik Seeley <yo...@lucidworks.com> wrote:

> On Wed, Sep 26, 2012 at 6:21 PM, Chris Hostetter
> <hossman_luc...@fucit.org> wrote:
>> 2) the coordinator node sums up the counts for any constraint returned by
>> multiple nodes, and then picks the top (facet.limit) constraints based on
>> the counts it knows about.
>
> It's actually more sophisticated than that - we don't limit to the top
> facet.limit constraints at the first phase.
> For *all* constraints we see from the first phase, we calculate if it
> could possibly be in the top facet.limit constraints (based on shards
> we haven't heard from). If so, we request exact counts from those
> shards we haven't heard from.
>
>> (but i believe this second query
>> is optimized to only ask a shard about a constraint if it didn't already
>> get the count in the first request)
>
> Correct.
>
>> So imagine you have 3 shards, and querying them individually with
>> facet.field=cat&facet.limit=3 you get...
>>
>> shardA: cars(8), books(7), computers(6)
>> shardB: toys(8), books(7), garden(5)
>> shardC: garden(4), books(3), computers(3)
>>
>> If you made a solr cloud query (or an explicit distributed query of those
>> three shards), the first request the coordinator would send to each shard
>> would specify a higher facet.limit, and might get back something like...
>>
>> shardA: cars(8), books(7), computers(6), cleaning(4), ...
>> shardB: toys(8), books(7), garden(5), cleaning(4), ...
>> shardC: garden(4), books(3), computers(3), plants(3), ...
>>
>> ...in which case "cleaning" pops up as a contender for being in the top
>> constraints. The coordinator sums up the counts for the constraints it
>> knows about, and might decide that these are the top 3...
>>
>> books(17), computers(9), cleaning(8)
>
> To extend your example, Solr notices that "plants" has a count of 3 on
> one shard, and was missing from the other two shards.
> The maximum possible count it *could* have is 11 (3+4+4), which could
> possibly put it in the top 3, hence it will also ask shardA and shardB
> about "plants".
>
> -Yonik
> http://lucidworks.com
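
For anyone who wants to play with the arithmetic, here is a rough Python sketch of the check Yonik describes. It is not Solr's actual code: the function names (max_possible_counts, plan_refinement) are made up for illustration, the "..." in each shard's list is dropped so the last count shown is assumed to be that shard's smallest returned count, and the cut-off (the facet.limit-th largest summed count) is my assumption about what "could possibly be in the top" means.

```python
from collections import defaultdict

# First-phase responses from the example in the thread. The "..." is dropped,
# so each shard's last shown count is ASSUMED to be its smallest returned count.
SHARD_RESULTS = {
    "shardA": {"cars": 8, "books": 7, "computers": 6, "cleaning": 4},
    "shardB": {"toys": 8, "books": 7, "garden": 5, "cleaning": 4},
    "shardC": {"garden": 4, "books": 3, "computers": 3, "plants": 3},
}


def max_possible_counts(shard_results):
    """Upper bound per term: reported counts plus each silent shard's smallest count."""
    # A term that a shard did not return cannot exceed the smallest count
    # that shard did return (hypothetical helper, not a Solr API).
    floors = {s: min(terms.values(), default=0) for s, terms in shard_results.items()}
    all_terms = {t for terms in shard_results.values() for t in terms}
    bounds = defaultdict(int)
    for term in all_terms:
        for shard, terms in shard_results.items():
            bounds[term] += terms.get(term, floors[shard])
    return dict(bounds)


def plan_refinement(shard_results, facet_limit):
    """Return {term: [shards to ask]} for terms that might still reach the top."""
    known = defaultdict(int)
    for terms in shard_results.values():
        for term, count in terms.items():
            known[term] += count

    # Assumed cut-off: the facet_limit-th largest summed count seen so far.
    ranked = sorted(known.values(), reverse=True)
    threshold = ranked[facet_limit - 1] if len(ranked) >= facet_limit else 0

    plan = {}
    for term, bound in max_possible_counts(shard_results).items():
        missing = sorted(s for s, terms in shard_results.items() if term not in terms)
        if missing and bound >= threshold:
            plan[term] = missing  # only ask shards that did not report the term
    return plan


bounds = max_possible_counts(SHARD_RESULTS)
plan = plan_refinement(SHARD_RESULTS, facet_limit=3)
print(bounds["plants"], plan["plants"])  # plants: 3 + 4 + 4 = 11
```

With lists this short, every constraint except "books" ends up needing a follow-up request in the sketch, because "books" is the only one all three shards already reported.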