: The problem with the shards appears in the following scenario (note that
: the problem below also applies in a Solr standalone environment with
: distributed search):
: 
: Shard1: DATA_SOURCE1 (3 docs), DATA_SOURCE2 (2 docs), DATA_SOURCE3 (2 docs).
: Shard2: DATA_SOURCE3 (2 docs), DATA_SOURCE2 (1 doc).
: 
: If you make a distributed search across these two shards, faceting on
: dataSourceName with a limit of 1, it will ask for the top 1 in the first
: shard (DATA_SOURCE1 (3 docs)) and for the top 1 in the second shard
: (DATA_SOURCE3 (2 docs)). After that it will merge the results and return
: DATA_SOURCE1 (3 docs), when it should have returned DATA_SOURCE3 (4 docs).

That's completely false.

a) In the first pass, even if you ask for the "top 1" (i.e. facet.limit=1),
Solr will overrequest when communicating with each shard. The amount of
overrequest is a function of your facet.limit, so as facet.limit increases
so does the overrequest amount.

b) If *any* (but not *all*) shards return DATA_SOURCE3 from the
initial shard request, a second "refinement" step will request the count
for DATA_SOURCE3 from all of the other shards to get an accurate count,
and to accurately sort DATA_SOURCE3 to the top of the facet constraint
list.
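
To make the two steps concrete, here is a rough Python simulation of the
scenario quoted above. It is only a sketch, not Solr's actual code: the
function names are made up for illustration, the overrequest formula
(limit * ratio + count, with 1.5 and 10) is an assumption about the defaults,
and for simplicity it refines every candidate term rather than only the ones
that could still reach the top of the list.

```python
from collections import Counter

# Per-shard facet counts taken from the scenario quoted above.
SHARDS = [
    {"DATA_SOURCE1": 3, "DATA_SOURCE2": 2, "DATA_SOURCE3": 2},
    {"DATA_SOURCE3": 2, "DATA_SOURCE2": 1},
]


def by_count(item):
    """Sort key: highest count first, then term name for a stable order."""
    term, count = item
    return (-count, term)


def top_terms(shard, n):
    """Return the n highest-count terms of one shard as a dict."""
    return dict(sorted(shard.items(), key=by_count)[:n])


def distributed_facet(shards, limit, ratio=1.5, count=10, refine=True):
    """Hypothetical model of a two-pass distributed facet request."""
    # ASSUMPTION: the overrequest amount grows with the limit like this.
    shard_limit = int(limit * ratio) + count

    # Pass 1: each shard reports its own top `shard_limit` terms.
    responses = [top_terms(shard, shard_limit) for shard in shards]
    candidates = set().union(*responses)

    merged = Counter()
    for shard, response in zip(shards, responses):
        for term in candidates:
            if term in response:
                merged[term] += response[term]
            elif refine:
                # Pass 2 (refinement): ask the shard that did not report
                # this term for its exact count.
                merged[term] += shard.get(term, 0)

    return sorted(merged.items(), key=by_count)[:limit]


if __name__ == "__main__":
    # Overrequest plus refinement.
    print(distributed_facet(SHARDS, 1))
    # Refinement alone, with overrequest switched off.
    print(distributed_facet(SHARDS, 1, ratio=1.0, count=0))
    # Neither overrequest nor refinement: the behavior the quoted mail assumed.
    print(distributed_facet(SHARDS, 1, ratio=1.0, count=0, refine=False))
```

In this toy model, either mechanism on its own is enough to put DATA_SOURCE3
(4 docs) at the top; only with both switched off does the merge come back
with DATA_SOURCE1 (3 docs).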


-Hoss
http://www.lucidworks.com/
