Thanks for the clear explanation. A couple of follow-up questions:

- Can we tune overrequesting in the JSON Facet API?

- We do see conflicting counts, but only when the offset is not 0. We have now already tested this on Solr 6.6 with the JSON Facet API. We sometimes get the same value at different offsets: for example, the constraint ranges [0,500] and [500,1000] might contain the same constraint.


Kenny


On 20-10-17 17:12, Yonik Seeley wrote:
Facet refinement in Solr guarantees that counts for returned
constraints are correct, but does not guarantee that the top N
returned isn't missing a constraint.

Consider the following shard counts (3 shards) for the following
constraints (aka facet values):
constraintA: 2 0 0
constraintB: 0 2 0
constraintC: 0 0 2
constraintD: 1 1 1

Now for simplicity consider facet.limit=1:
Phase 1: retrieve the top 1 facet counts from all 3 shards (this gets
back A=2,B=2,C=2)
Phase 2: refinement: retrieve counts for A,B,C for any shard that did
not contribute to the count in Phase 1: (for example we ask shard2 and
shard3 for the count of A)
The counts are all correct, but we missed "D" because it never
appeared in Phase 1, even though its true total (3) is higher than
the totals of A, B and C (2 each).
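
The two phases above can be reproduced with a small simulation. This is only a sketch of the idea, not Solr's actual implementation: the function names and the per_shard_limit parameter are made up for illustration, and the refinement step simply sums exact counts over all shards instead of asking only the shards that did not contribute.

```python
# Per-shard counts from the example: constraint -> [shard1, shard2, shard3]
SHARD_COUNTS = {
    "constraintA": [2, 0, 0],
    "constraintB": [0, 2, 0],
    "constraintC": [0, 0, 2],
    "constraintD": [1, 1, 1],
}
NUM_SHARDS = 3


def shard_top(shard, n):
    """Top-n (constraint, count) pairs on one shard; ties are broken by name."""
    rows = [
        (name, counts[shard])
        for name, counts in SHARD_COUNTS.items()
        if counts[shard] > 0
    ]
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows[:n]


def distributed_facet(limit, per_shard_limit):
    """Hypothetical two-phase facet merge (simplified)."""
    # Phase 1: every shard reports only its own top per_shard_limit constraints.
    candidates = set()
    for shard in range(NUM_SHARDS):
        candidates.update(name for name, _ in shard_top(shard, per_shard_limit))

    # Phase 2 (refinement): exact totals, but only for the known candidates.
    totals = {name: sum(SHARD_COUNTS[name]) for name in candidates}
    ranked = sorted(totals.items(), key=lambda row: (-row[1], row[0]))
    return ranked[:limit]


print(distributed_facet(limit=1, per_shard_limit=1))  # D is never a candidate
print(distributed_facet(limit=1, per_shard_limit=2))  # D is picked up
```

With per_shard_limit=1 the result is constraintA with a correct count of 2, and constraintD (total 3) is missing. Asking each shard for one extra constraint is enough to surface D in this particular data set.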

Solr actually has overrequesting in the first phase to reduce the
chances of this happening (i.e. it won't actually happen with the
exact scenario above), but it can still happen.

You can increase the overrequest amount (see
https://lucene.apache.org/solr/guide/6_6/faceting.html),
or use streaming expressions, or the SQL interface that sits on top
of them, in the latest Solr releases.
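
As a rough sketch of what raising the overrequest amount could look like for the classic (non-JSON) facet parameters: the helper below only builds a query string. The field name "category" and the helper itself are hypothetical, and the defaults of 10 and 1.5 are assumed from the faceting page linked above, so check them against your Solr version.

```python
from urllib.parse import urlencode


def facet_params(field, limit, offset=0, overrequest_count=10, overrequest_ratio=1.5):
    """Build a query string for a field facet with explicit overrequest settings."""
    return urlencode({
        "q": "*:*",
        "rows": 0,
        "facet": "true",
        "facet.field": field,
        "facet.limit": limit,
        "facet.offset": offset,
        # Per the linked guide, these control how many extra constraints
        # each shard is asked for in the first phase.
        "facet.overrequest.count": overrequest_count,
        "facet.overrequest.ratio": overrequest_ratio,
    })


# Second page of 500 constraints, with a larger overrequest than the assumed default.
print(facet_params("category", limit=500, offset=500, overrequest_count=200))
```

Larger values make each shard return more candidates in the first phase, which lowers the chance of missing a constraint at the cost of more work per request.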

-Yonik


On Fri, Oct 20, 2017 at 10:19 AM, kenny <ke...@ontoforce.com> wrote:
Hi all,

When we run some 'deep' facet counts (e.g. facet values from 0 to 500 and then
from 500 to 1000), we see small but disturbing differences in counts between
the two batches (for example, the last count in the first batch is 165 and the
first count in the second batch is 167).
We run this on Solr 5.3.1 in cloud mode (3 shards), using the non-JSON facet module.
Has anyone seen this before? I could not find any bug reported like this.

Thanks

Kenny


--

ONTOFORCE <http://www.ontoforce.com/>     
Kenny Knecht, PhD
CTO and technical lead
+32 486 75 66 16 <tel:00324756616>
ke...@ontoforce.com <mailto:ke...@ontoforce.com>
www.ontoforce.com <http://www.ontoforce.com/>

Meetdistrict, Ottergemsesteenweg-Zuid 808, 9000 Gent, Belgium
CIC, One Broadway, MA 02142 Cambridge, United States
