On Fri, Oct 20, 2017 at 2:22 PM, kenny <ke...@ontoforce.com> wrote:
> Thanks for the clear explanation. A couple of follow-up questions:
>
> - Can we tune overrequesting in the JSON API?
>
Yes, I still need to document it, but you can specify the number of
extra buckets (facet values) to overrequest:

    {
      type : field,
      field : cat,
      overrequest : 500
    }

Also note that the JSON Facet API does not do refinement by default
(it's not always desired). Add refine:true to the field facet if you
do want it.

> - We do see conflicting counts, but that's when we have offsets different
> from 0. We have now already tested it in Solr 6.6 with the JSON API. We
> sometimes get the same value in different offsets: for example, the ranges of
> constraints [0,500] and [500,1000] might contain the same constraint.
>

That can happen with both regular faceting and with the JSON Facet API
(deeper paging "discovers" a new constraint which ranks higher).
Regular faceting does more overrequest by default, and does refinement
by default. So adding refine:true and a deeper overrequest for JSON
facets should give equivalent results.

-Yonik

> Kenny
>
> On 20-10-17 17:12, Yonik Seeley wrote:
>
> Facet refinement in Solr guarantees that counts for returned
> constraints are correct, but does not guarantee that the top N
> returned isn't missing a constraint.
>
> Consider the following shard counts (3 shards) for the following
> constraints (aka facet values):
> constraintA: 2 0 0
> constraintB: 0 2 0
> constraintC: 0 0 2
> constraintD: 1 1 1
>
> Now for simplicity consider facet.limit=1:
> Phase 1: retrieve the top 1 facet counts from all 3 shards (this gets
> back A=2,B=2,C=2)
> Phase 2: refinement: retrieve counts for A,B,C from any shard that did
> not contribute to the count in Phase 1 (for example, we ask shard2 and
> shard3 for the count of A)
> The counts are all correct, but we missed "D" because it never
> appeared in Phase 1.
>
> Solr actually has overrequesting in the first phase to reduce the
> chances of this happening (i.e. it won't actually happen with the
> exact scenario above), but it can still happen.
>
> You can increase the overrequest amount
> (see https://lucene.apache.org/solr/guide/6_6/faceting.html)
> Or use streaming expressions or the SQL that goes on top of that in
> the latest Solr releases.
>
> -Yonik
>
>
> On Fri, Oct 20, 2017 at 10:19 AM, kenny <ke...@ontoforce.com> wrote:
>
> Hi all,
>
> When we run some 'deep' facet counts (e.g. facet values from 0 to 500 and then
> from 500 to 1000), we see small but disturbing differences in counts between
> the two (for example, last count on first batch 165, first count on second
> batch 167).
> We run this on Solr 5.3.1 in cloud mode (3 shards) with the non-JSON facet module.
> Has anyone seen this before? I could not find any bug reported like this.
>
> Thanks
>
> Kenny
>
> --
> Kenny Knecht, PhD
> CTO and technical lead
> +32 486 75 66 16
> ke...@ontoforce.com
> www.ontoforce.com
> Meetdistrict, Ottergemsesteenweg-Zuid 808, 9000 Gent, Belgium
> CIC, One Broadway, MA 02142 Cambridge, United States
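
For anyone who wants to replay the three-shard example from the thread, here is a minimal Python sketch of the two-phase process (top-N per shard, then refinement). It is a toy model and not Solr's code: the function names are made up for illustration, `overrequest` is treated as a plain number of extra values each shard returns (Solr's actual overrequest formula differs), and ties are broken alphabetically just to keep the output deterministic.

```python
from collections import Counter

# Per-shard counts from the example in the thread (zero counts omitted).
SHARDS = [
    {"constraintA": 2, "constraintD": 1},  # shard1
    {"constraintB": 2, "constraintD": 1},  # shard2
    {"constraintC": 2, "constraintD": 1},  # shard3
]


def top_n(counts, n):
    """Return the n highest (value, count) pairs; ties broken by name."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


def distributed_facet(shards, limit, overrequest=0):
    """Toy two-phase facet merge (hypothetical helper, not a Solr API)."""
    # Phase 1: each shard reports only its local top (limit + overrequest).
    candidates = set()
    for shard in shards:
        candidates.update(name for name, _ in top_n(shard, limit + overrequest))

    # Phase 2 (refinement): fetch the count of every candidate from every
    # shard, so the counts of the returned values are exact.
    refined = {
        name: sum(shard.get(name, 0) for shard in shards) for name in candidates
    }
    return top_n(refined, limit)


def exact_facet(shards, limit):
    """Ground truth: sum all shard counts first, then take the top values."""
    total = Counter()
    for shard in shards:
        total.update(shard)
    return top_n(total, limit)


if __name__ == "__main__":
    print("no overrequest:", distributed_facet(SHARDS, limit=1))
    print("overrequest=1: ", distributed_facet(SHARDS, limit=1, overrequest=1))
    print("exact:         ", exact_facet(SHARDS, limit=1))
```

Without overrequest, the returned count is correct for the value that comes back, but constraintD (total 3) never becomes a candidate. With one extra value per shard, D is seen in Phase 1 and wins after refinement. The same mechanism explains why a deeper page can surface a value that should have ranked on an earlier page.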