So does mincount get considered in this as well?

On Tue, Oct 2, 2012 at 10:19 AM, Jamie Johnson <jej2...@gmail.com> wrote:
> Thanks for this guys, really excellent explanation!
>
> On Thu, Sep 27, 2012 at 12:15 AM, Yonik Seeley <yo...@lucidworks.com> wrote:
>> On Wed, Sep 26, 2012 at 6:21 PM, Chris Hostetter
>> <hossman_luc...@fucit.org> wrote:
>>> 2) the coordinator node sums up the counts for any constraint returned by
>>> multiple nodes, and then picks the top (facet.limit) constraints based on
>>> the counts it knows about.
>>
>> It's actually more sophisticated than that - we don't limit to the top
>> facet.limit constraints at the first phase.
>> For *all* constraints we see from the first phase, we calculate if it
>> could possibly be in the top facet.limit constraints (based on shards
>> we haven't heard from).  If so, we request exact counts from those
>> shards we haven't heard from.
>>
>>> (but I believe this second query
>>> is optimized to only ask a shard about a constraint if it didn't already
>>> get the count in the first request)
>>
>> Correct.
>>
>>> So imagine you have 3 shards, and querying them individually with
>>> facet.field=cat&facet.limit=3 you get...
>>>
>>> shardA: cars(8), books(7), computers(6)
>>> shardB: toys(8), books(7), garden(5)
>>> shardC: garden(4), books(3), computers(3)
>>>
>>> If you made a solr cloud query (or an explicit distributed query of those
>>> three shards), the first request the coordinator would send to each shard
>>> would specify a higher facet.limit, and might get back something like...
>>>
>>> shardA: cars(8), books(7), computers(6), cleaning(4), ...
>>> shardB: toys(8), books(7), garden(5), cleaning(4), ...
>>> shardC: garden(4), books(3), computers(3), plants(3), ...
>>>
>>> ...in which case "cleaning" pops up as a contender for being in the top
>>> constraints.  The coordinator sums up the counts for the constraints it
>>> knows about, and might decide that these are the top 3...
>>>
>>>         books(17), computers(9), cleaning(8)
>>
>> To extend your example, Solr notices that "plants" has a count of 3 on
>> one shard, and was missing from the other two shards.
>> The maximum possible count it *could* have is 11 (3+4+4), which could
>> possibly put it in the top 3, hence it will also ask shardA and shardB
>> about "plants".
>>
>> -Yonik
>> http://lucidworks.com
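
For anyone following along, here is a rough Python sketch of the refinement check Yonik describes, run on the three-shard example. It is not Solr's actual code. The function name plan_refinement and the data layout are made up for illustration, and it assumes that a constraint missing from a shard's truncated list can have at most the smallest count that shard returned (which is how the 3+4+4 bound for "plants" comes about).

```python
from collections import defaultdict

# First-phase responses from the example in this thread (each list is truncated).
shard_counts = {
    "shardA": {"cars": 8, "books": 7, "computers": 6, "cleaning": 4},
    "shardB": {"toys": 8, "books": 7, "garden": 5, "cleaning": 4},
    "shardC": {"garden": 4, "books": 3, "computers": 3, "plants": 3},
}
FACET_LIMIT = 3


def plan_refinement(shard_counts, limit):
    """Hypothetical helper: decide which shards to ask about which constraints."""
    # Sum the counts the coordinator already knows about.
    known = defaultdict(int)
    for counts in shard_counts.values():
        for term, n in counts.items():
            known[term] += n

    # Assumption: a term a shard did not return has at most that shard's
    # smallest returned count there.
    floor = {shard: min(counts.values()) for shard, counts in shard_counts.items()}

    # Known count of the constraint currently in last place of the top `limit`.
    ranked = sorted(known.values(), reverse=True)
    threshold = ranked[limit - 1] if len(ranked) >= limit else 0

    upper, to_ask = {}, {}
    for term, n in known.items():
        missing = [s for s, counts in shard_counts.items() if term not in counts]
        # Best case for this term: known count plus the bound on each missing shard.
        upper[term] = n + sum(floor[s] for s in missing)
        if missing and upper[term] >= threshold:
            to_ask[term] = missing  # request exact counts from these shards only
    return dict(known), upper, to_ask


known, upper, to_ask = plan_refinement(shard_counts, FACET_LIMIT)
print(upper["plants"], to_ask["plants"])
```

With this data "plants" gets an upper bound of 11 and would be refined against shardA and shardB, while "books" needs no second request because every shard already reported it.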
