Thanks for this, guys. Really excellent explanation!

On Thu, Sep 27, 2012 at 12:15 AM, Yonik Seeley <yo...@lucidworks.com> wrote:
> On Wed, Sep 26, 2012 at 6:21 PM, Chris Hostetter
> <hossman_luc...@fucit.org> wrote:
>> 2) the coordinator node sums up the counts for any constraint returned by
>> multiple nodes, and then picks the top (facet.limit) constraints based on
>> the counts it knows about.
>
> It's actually more sophisticated than that - we don't limit to the top
> facet.limit constraints at the first phase.
> For *all* constraints we see from the first phase, we calculate if it
> could possibly be in the top facet.limit constraints (based on shards
> we haven't heard from).  If so, we request exact counts from those
> shards we haven't heard from.
>
>> (but I believe this second query
>> is optimized to only ask a shard about a constraint if it didn't already
>> get the count in the first request)
>
> Correct.
>
>> So imagine you have 3 shards, and querying them individually with
>> facet.field=cat&facet.limit=3 you get...
>>
>> shardA: cars(8), books(7), computers(6)
>> shardB: toys(8), books(7), garden(5)
>> shardC: garden(4), books(3), computers(3)
>>
>> If you made a solr cloud query (or an explicit distributed query of those
>> three shards), the first request the coordinator would send to each shard
>> would specify a higher facet.limit, and might get back something like...
>>
>> shardA: cars(8), books(7), computers(6), cleaning(4), ...
>> shardB: toys(8), books(7), garden(5), cleaning(4), ...
>> shardC: garden(4), books(3), computers(3), plants(3), ...
>>
>> ...in which case "cleaning" pops up as a contender for being in the top
>> constraints.  The coordinator sums up the counts for the constraints it
>> knows about, and might decide that these are the top 3...
>>
>>         books(17), computers(9), cleaning(8)
>
> To extend your example, Solr notices that "plants" has a count of 3 on
> one shard, and was missing from the other two shards.
> The maximum possible count it *could* have is 11 (3+4+4), which could
> possibly put it in the top 3, hence it will also ask shardA and shardB
> about "plants".
>
> -Yonik
> http://lucidworks.com
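
To check my understanding, here is a rough Python sketch of the refinement
decision described above, using the numbers from the example. It is not
Solr's actual code. The function name plan_refinement, the use of each
shard's smallest returned count as the bound for a term it did not return,
and the use of the facet.limit-th largest known count as the cutoff are all
my own assumptions for illustration.

```python
from collections import defaultdict

# First-phase responses from the example (the over-requested lists, cut at "...").
shard_results = {
    "shardA": {"cars": 8, "books": 7, "computers": 6, "cleaning": 4},
    "shardB": {"toys": 8, "books": 7, "garden": 5, "cleaning": 4},
    "shardC": {"garden": 4, "books": 3, "computers": 3, "plants": 3},
}


def plan_refinement(shard_results, limit):
    """Hypothetical helper: decide which shards to ask about which terms."""
    # Sum the counts the coordinator already knows about.
    known = defaultdict(int)
    for counts in shard_results.values():
        for term, n in counts.items():
            known[term] += n

    # Assumption: a term a shard did not return can have at most that
    # shard's smallest returned count on that shard.
    shard_bound = {s: min(c.values()) for s, c in shard_results.items()}

    # Known counts are lower bounds, so the limit-th largest known count
    # is a safe cutoff: a term whose best case is below it cannot make the top.
    cutoff = sorted(known.values(), reverse=True)[limit - 1]

    upper, refine = {}, {}
    for term, count in known.items():
        absent = [s for s, c in shard_results.items() if term not in c]
        upper[term] = count + sum(shard_bound[s] for s in absent)
        if absent and upper[term] >= cutoff:
            refine[term] = absent  # only ask shards we have no count from
    return dict(known), upper, refine


known, upper, refine = plan_refinement(shard_results, limit=3)
print(upper["plants"])   # 11, i.e. 3 + 4 + 4
print(refine["plants"])  # ['shardA', 'shardB']
```

With these numbers "books" needs no second request because every shard
already reported it, while "plants" gets refined on shardA and shardB,
which matches the explanation above.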
