: The problem with the shards appears in the following scenario (note that
: the problem below also applies in a Solr standalone environment with
: distributed search):
:
: Shard1: DATA_SOURCE1 (3 docs), DATA_SOURCE2 (2 docs), DATA_SOURCE3 (2 docs).
: Shard2: DATA_SOURCE3 (2 docs), DATA_SOURCE2 (1 doc).
:
: If you make a distributed search across these two shards, faceting
: dataSourceName with a limit of 1, it will ask for the top 1 in the first
: shard (DATA_SOURCE1 (3 docs)) and for the top 1 in the second shard
: (DATA_SOURCE3 (2 docs)). After that it will merge the results and return
: DATA_SOURCE1 (3 docs), when it should have returned DATA_SOURCE3 (4 docs).

That's completely false.

a) In the first pass, even if you ask for the "top 1" (i.e. facet.limit=1), Solr will overrequest when communicating with each shard. The amount of overrequest is a function of your facet.limit, so as facet.limit increases, so does the overrequest amount.

b) If *any* (but not *all*) shards return DATA_SOURCE3 from the initial shard request, a second "refinement" step will request the count for DATA_SOURCE3 from all of the other shards, to get an accurate count and to accurately sort DATA_SOURCE3 to the top of the facet constraint list.

-Hoss
http://www.lucidworks.com/
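
To make the two passes concrete, here is a rough simulation of the scenario from the quoted message. It is only a sketch, not Solr's actual code: the function names are made up for illustration, the default overrequest formula (limit * 1.5 + 10) is an assumption modeled on Solr's documented facet.overrequest.ratio and facet.overrequest.count defaults, and the refinement step is simplified to refine every candidate term rather than only those that could still reach the top of the list.

```python
from collections import Counter

# Per-shard facet counts for dataSourceName, taken from the quoted example.
SHARDS = [
    {"DATA_SOURCE1": 3, "DATA_SOURCE2": 2, "DATA_SOURCE3": 2},  # Shard1
    {"DATA_SOURCE3": 2, "DATA_SOURCE2": 1},                     # Shard2
]


def shard_top(counts, n):
    """Top n terms of one shard: count descending, then term ascending."""
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked[:n])


def distributed_facet(shards, limit,
                      overrequest=lambda n: int(n * 1.5) + 10):
    """Hypothetical two-pass merge; `overrequest` is an assumed formula."""
    # Pass 1: ask every shard for more than `limit` terms.
    per_shard = [shard_top(s, overrequest(limit)) for s in shards]
    candidates = set().union(*per_shard)

    # Pass 2 (refinement): if a shard did not return a candidate term,
    # ask that shard for its exact count of the term.
    totals = Counter()
    for counts, returned in zip(shards, per_shard):
        for term in candidates:
            if term in returned:
                totals[term] += returned[term]
            else:
                totals[term] += counts.get(term, 0)  # refinement request

    merged = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return merged[:limit]


if __name__ == "__main__":
    # With overrequest, both shards return all of their terms in pass 1.
    print(distributed_facet(SHARDS, 1))  # [('DATA_SOURCE3', 4)]
    # Even with overrequest switched off, refinement fixes the count.
    print(distributed_facet(SHARDS, 1, overrequest=lambda n: n))  # [('DATA_SOURCE3', 4)]
```

In the second call each shard returns only its own top term (DATA_SOURCE1 from Shard1, DATA_SOURCE3 from Shard2), and the refinement step then fetches the missing DATA_SOURCE3 count from Shard1, which is the behavior described in (b).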