So does mincount get considered in this as well?
On Tue, Oct 2, 2012 at 10:19 AM, Jamie Johnson <jej2...@gmail.com> wrote:
> Thanks for this, guys, really excellent explanation!
>
> On Thu, Sep 27, 2012 at 12:15 AM, Yonik Seeley <yo...@lucidworks.com> wrote:
>> On Wed, Sep 26, 2012 at 6:21 PM, Chris Hostetter
>> <hossman_luc...@fucit.org> wrote:
>>> 2) the coordinator node sums up the counts for any constraint returned by
>>> multiple nodes, and then picks the top (facet.limit) constraints based on
>>> the counts it knows about.
>>
>> It's actually more sophisticated than that - we don't limit to the top
>> facet.limit constraints at the first phase.
>> For *all* constraints we see from the first phase, we calculate if it
>> could possibly be in the top facet.limit constraints (based on shards
>> we haven't heard from). If so, we request exact counts from those
>> shards we haven't heard from.
>>
>>> (but i believe this second query
>>> is optimized to only ask a shard about a constraint if it didn't already
>>> get the count in the first request)
>>
>> Correct.
>>
>>> So imagine you have 3 shards, and querying them individually with
>>> facet.field=cat&facet.limit=3 you get...
>>>
>>> shardA: cars(8), books(7), computers(6)
>>> shardB: toys(8), books(7), garden(5)
>>> shardC: garden(4), books(3), computers(3)
>>>
>>> If you made a solr cloud query (or an explicit distributed query of those
>>> three shards), the first request the coordinator would send to each shard
>>> would specify a higher facet.limit, and might get back something like...
>>>
>>> shardA: cars(8), books(7), computers(6), cleaning(4), ...
>>> shardB: toys(8), books(7), garden(5), cleaning(4), ...
>>> shardC: garden(4), books(3), computers(3), plants(3), ...
>>>
>>> ...in which case "cleaning" pops up as a contender for being in the top
>>> constraints. The coordinator sums up the counts for the constraints it
>>> knows about, and might decide that these are the top 3...
>>>
>>> books(17), computers(9), cleaning(8)
>>
>> To extend your example, Solr notices that "plants" has a count of 3 on
>> one shard, and was missing from the other two shards.
>> The maximum possible count it *could* have is 11 (3+4+4), which could
>> possibly put it in the top 3, hence it will also ask shardA and shardB
>> about "plants".
>>
>> -Yonik
>> http://lucidworks.com
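The bound Yonik describes can be sketched in a few lines. This is a minimal illustration, not Solr's actual FacetComponent code: PHASE1, bounds() and refinement_requests() are hypothetical names, only the entries shown in the thread are used (the "..." tails are unknown), and every shard's list is assumed to be truncated, so a term a shard did not report can have at most that shard's smallest reported count. The sketch does not handle a shard that returned its complete list, where a missing term would count as 0. With these sample numbers, garden also sums to 9 (5+4), tying computers.

```python
from collections import defaultdict

# Phase-1 responses from the example thread (only the shown entries).
# Assumption: each list was truncated, so an unreported term's count on a
# shard is at most the smallest count that shard reported.
PHASE1 = {
    "shardA": {"cars": 8, "books": 7, "computers": 6, "cleaning": 4},
    "shardB": {"toys": 8, "books": 7, "garden": 5, "cleaning": 4},
    "shardC": {"garden": 4, "books": 3, "computers": 3, "plants": 3},
}


def bounds(responses):
    """Return {term: (known_sum, max_possible)} for every term seen."""
    terms = {t for counts in responses.values() for t in counts}
    result = {}
    for term in terms:
        known = upper = 0
        for counts in responses.values():
            if term in counts:
                known += counts[term]
                upper += counts[term]
            else:
                # Unreported: could be as large as this shard's smallest count.
                upper += min(counts.values(), default=0)
        result[term] = (known, upper)
    return result


def refinement_requests(responses, limit):
    """Map each shard to the terms whose exact counts are still needed."""
    b = bounds(responses)
    # Threshold: the limit-th highest count that is already known for sure.
    known_sorted = sorted((known for known, _ in b.values()), reverse=True)
    threshold = known_sorted[limit - 1] if len(known_sorted) >= limit else 0
    requests = defaultdict(set)
    for term, (_, upper) in b.items():
        if upper < threshold:
            continue  # cannot reach (or tie into) the top `limit`
        for shard, counts in responses.items():
            if term not in counts:
                requests[shard].add(term)
    return dict(requests)


if __name__ == "__main__":
    b = bounds(PHASE1)
    print(b["plants"])  # (3, 11), i.e. 3+4+4 as in the thread
    for shard, terms in sorted(refinement_requests(PHASE1, 3).items()):
        print(shard, sorted(terms))
    # shardA ['garden', 'plants', 'toys']
    # shardB ['cars', 'computers', 'plants']
    # shardC ['cars', 'cleaning', 'toys']
```

Here "books" is never re-requested because every shard already reported it, which matches the point that the second query only asks a shard about a constraint it did not already count.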