On Wed, Sep 26, 2012 at 6:21 PM, Chris Hostetter
<hossman_luc...@fucit.org> wrote:
> 2) the coordinator node sums up the counts for any constraint returned by
> multiple nodes, and then picks the top (facet.limit) constraints based on
> the counts it knows about.

It's actually more sophisticated than that - we don't limit to the top
facet.limit constraints at the first phase.
For *every* constraint we see in the first phase, we calculate whether it
could possibly be in the top facet.limit constraints (based on the shards
that did not return a count for it).  If so, we request exact counts from
those shards.
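
A minimal sketch of that check, for illustration only. This is not Solr's
actual code: the function name, the data layout, and the ">=" comparison
are assumptions, and it assumes a constraint a shard did not return can
have at most that shard's lowest returned count there.

```python
from collections import defaultdict


def refinement_requests(shard_counts, limit):
    """Return {constraint: [shards to ask for an exact count]}.

    shard_counts maps shard name -> {constraint: count} from phase one.
    """
    # Sum the counts we already know for each constraint.
    known = defaultdict(int)
    for counts in shard_counts.values():
        for term, count in counts.items():
            known[term] += count

    # Assumption: an unreturned constraint has at most the shard's
    # lowest returned count on that shard.
    floor = {s: min(c.values()) if c else 0 for s, c in shard_counts.items()}

    # Known count currently sitting in the last top-`limit` slot.
    ranked = sorted(known.values(), reverse=True)
    threshold = ranked[limit - 1] if len(ranked) >= limit else 0

    requests = {}
    for term, count in known.items():
        missing = [s for s, c in shard_counts.items() if term not in c]
        upper = count + sum(floor[s] for s in missing)
        # Only ask when the best case could still reach the top `limit`.
        if missing and upper >= threshold:
            requests[term] = missing
    return requests


# Hypothetical two-shard response, not taken from the thread.
example = {"s1": {"a": 10, "b": 5}, "s2": {"a": 9, "c": 4}}
print(refinement_requests(example, limit=2))
```

With limit=2 both "b" and "c" get a follow-up request, since either could
still reach second place; with limit=1 nothing is requested, because "a"
is already known exactly and no other constraint can catch it.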

> (but I believe this second query
> is optimized to only ask a shard about a constraint if it didn't already
> get the count in the first request)

Correct.

> So imagine you have 3 shards, and querying them individually with
> facet.field=cat&facet.limit=3 you get...
>
> shardA: cars(8), books(7), computers(6)
> shardB: toys(8), books(7), garden(5)
> shardC: garden(4), books(3), computers(3)
>
> If you made a solr cloud query (or an explicit distributed query of those
> three shards), the first request the coordinator would send to each shard
> would specify a higher facet.limit, and might get back something like...
>
> shardA: cars(8), books(7), computers(6), cleaning(4), ...
> shardB: toys(8), books(7), garden(5), cleaning(4), ...
> shardC: garden(4), books(3), computers(3), plants(3), ...
>
> ...in which case "cleaning" pops up as a contender for being in the top
> constraints.  The coordinator sums up the counts for the constraints it
> knows about, and might decide that these are the top 3...
>
>         books(17), computers(9), cleaning(8)

To extend your example, Solr notices that "plants" has a count of 3 on
one shard, and was missing from the other two shards.
Taking 4 as the lowest count each of those two shards returned, the
maximum possible count it *could* have is 11 (3+4+4), which could
possibly put it in the top 3, hence it will also ask shardA and shardB
about "plants".
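
The arithmetic for "plants" can be spelled out as a small sketch. It treats
the four counts listed per shard as everything that shard returned (the
trailing "..." entries are unknown and left out), and upper_bound is a
hypothetical helper, not a Solr function.

```python
# First-phase responses from the example in this thread; entries hidden
# behind "..." are unknown and therefore omitted (an assumption).
first_phase = {
    "shardA": {"cars": 8, "books": 7, "computers": 6, "cleaning": 4},
    "shardB": {"toys": 8, "books": 7, "garden": 5, "cleaning": 4},
    "shardC": {"garden": 4, "books": 3, "computers": 3, "plants": 3},
}


def upper_bound(term, responses):
    """Best-case total for `term`, plus the shards that did not return it."""
    known = sum(counts.get(term, 0) for counts in responses.values())
    unseen = [s for s, counts in responses.items() if term not in counts]
    # A shard that did not return the term can hold at most its lowest
    # returned count for it.
    slack = sum(min(responses[s].values()) for s in unseen)
    return known + slack, unseen


bound, ask = upper_bound("plants", first_phase)
print(bound, ask)  # 11 ['shardA', 'shardB']
```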

-Yonik
http://lucidworks.com
