Sorry for introducing bad information. Because it happens in the JSON Facet API, I assumed it would also happen in standard faceting. Sorry again for the misunderstanding.
2016-07-07 16:08 GMT-03:00 Chris Hostetter <hossman_luc...@fucit.org>:

>
> : The problem with the shards appears in the following scenario (note that
> : the problem below also applies in a solr standalone environment with
> : distributed search):
> :
> : Shard1: DATA_SOURCE1 (3 docs), DATA_SOURCE2 (2 docs), DATA_SOURCE3 (2 docs).
> : Shard2: DATA_SOURCE3 (2 docs), DATA_SOURCE2 (1 doc).
> :
> : If you make a distributed search across these two shards, faceting
> : dataSourceName with a limit of 1, it will ask for the top 1 in the first
> : shard (DATA_SOURCE1 (3 docs)) and for the top 1 in the second shard
> : (DATA_SOURCE3 (2 docs)). After that it will merge the results and return
> : DATA_SOURCE1 (3 docs), when it should have returned DATA_SOURCE3 (4 docs).
>
> That's completely false.
>
> a) in the first pass, even if you ask for "top 1" (ie: facet.limit=1) solr
> will overrequest when communicating with each shard (the amount of
> overrequest is a function of your facet.limit, so as facet.limit increases
> so does the overrequest amount)
>
> b) if *any* (but not *all*) shards return DATA_SOURCE3 from the
> initial shard request, a second "refinement" step will request the count
> for DATA_SOURCE3 from all of the other shards to get an accurate count,
> and to accurately sort DATA_SOURCE3 to the top of the facet constraint
> list.
>
>
> -Hoss
> http://www.lucidworks.com/
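
For anyone following along, here is a rough sketch of the two-pass behavior described in (a) and (b) of the quoted message, using the shard counts from my example. It is a toy simulation in Python, not Solr's code. The names `shard_top` and `distributed_facet` and the `overrequest` and `refine` parameters are made up for illustration. It also refines every term that some shard did not return, which is probably simpler than what Solr really does when it picks refinement candidates.

```python
from collections import Counter

# Per-shard facet counts, taken from the scenario in the quoted message.
SHARDS = [
    {"DATA_SOURCE1": 3, "DATA_SOURCE2": 2, "DATA_SOURCE3": 2},  # Shard1
    {"DATA_SOURCE3": 2, "DATA_SOURCE2": 1},                     # Shard2
]


def shard_top(shard, n):
    """Return the n highest-count terms of one shard (ties broken by name)."""
    ordered = sorted(shard.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ordered[:n])


def distributed_facet(shards, limit, overrequest=0, refine=True):
    """Toy two-pass distributed facet (hypothetical helper, not a Solr API).

    Pass 1: ask each shard for its top (limit + overrequest) terms.
    Pass 2 (only if refine=True): for every term that at least one shard
    did not return, ask those shards for their exact count of that term.
    """
    responses = [shard_top(s, limit + overrequest) for s in shards]

    # Merge the first-pass counts.
    totals = Counter()
    for resp in responses:
        totals.update(resp)

    if refine:
        for term in list(totals):
            for shard, resp in zip(shards, responses):
                if term not in resp:
                    # Refinement request: exact count from the missing shard.
                    totals[term] += shard.get(term, 0)

    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:limit]


# What I wrongly assumed happens: top 1 per shard, merge, no second pass.
print(distributed_facet(SHARDS, limit=1, overrequest=0, refine=False))
# -> [('DATA_SOURCE1', 3)]

# With the refinement step, DATA_SOURCE3 gets its full count.
print(distributed_facet(SHARDS, limit=1, overrequest=0, refine=True))
# -> [('DATA_SOURCE3', 4)]
```

In this toy data the refinement pass alone is enough to put DATA_SOURCE3 (4 docs) on top. Overrequesting makes it more likely that a term like DATA_SOURCE3 shows up in at least one shard's first response, so that it gets refined at all.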