On Fri, Oct 20, 2017 at 2:22 PM, kenny <ke...@ontoforce.com> wrote:

> Thanks for the clear explanation. A couple of follow up questions
>
> - can we tune overrequesting in json API?
>

Yes, I still need to document it, but you can specify a specific number of
documents to overrequest:
{
  type : field,
  field : cat,
  overrequest : 500
}

Also note that the JSON facet API does not do refinement by default (it's
not always desired).
Add refine:true to the field facet if you do want it.
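
As a rough illustration of the two options above, here is a minimal Python sketch that builds a JSON facet request body with both overrequest and refine set. The helper name build_facet_request, the facet label "cat_facet", the "*:*" query, and the limit/offset values are assumptions made for the example; only type, field, overrequest, and refine come from this thread.

```python
import json


def build_facet_request(field, limit, offset=0, overrequest=500, refine=True):
    """Build a JSON request body with a single field facet (hypothetical helper)."""
    return {
        "query": "*:*",  # assumption: match all documents
        "limit": 0,      # assumption: no documents wanted, only facet buckets
        "facet": {
            field + "_facet": {          # hypothetical label for the facet
                "type": "field",
                "field": field,
                "limit": limit,          # page size of facet buckets
                "offset": offset,        # paging offset into the buckets
                "overrequest": overrequest,
                "refine": refine,
            }
        },
    }


if __name__ == "__main__":
    # Second page of 500 buckets on the "cat" field.
    print(json.dumps(build_facet_request("cat", limit=500, offset=500), indent=2))
```

The printed body is what would be sent to the collection's query handler; how it is posted (client library, URL, collection name) is left out because it depends on the setup.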


> - we do see conflicting counts but that's when we have offsets different
> from 0. We have now already tested it in solr 6.6 with json api. We
> sometimes get the same value in different offsets: for example the range of
> constraints [0,500] and [500,1000] might contain the same constraint.
>

That can happen with both regular faceting and with the JSON Facet API
(deeper paging "discovers" a new constraint which ranks higher).
Regular faceting does more overrequest by default, and does refinement by
default.  So adding refine:true and a deeper overrequest for json facets
should perform equivalently.
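
To make the effect of overrequest concrete, the three-shard example from the earlier message quoted below can be simulated in a few lines of Python. This is a simplified sketch and not Solr's actual implementation: the function names are made up, each shard reports only constraints with a non-zero count, ties are broken by name, and refinement is modelled as summing the exact counts of every candidate over all shards.

```python
from collections import Counter

# Per-shard counts from the quoted example (constraints A to D, 3 shards).
SHARDS = [
    {"A": 2, "B": 0, "C": 0, "D": 1},
    {"A": 0, "B": 2, "C": 0, "D": 1},
    {"A": 0, "B": 0, "C": 2, "D": 1},
]


def top_n(counts, n):
    """Return the n highest (term, count) pairs, ties broken by term name."""
    ranked = sorted(
        ((t, c) for t, c in counts.items() if c > 0),
        key=lambda pair: (-pair[1], pair[0]),
    )
    return ranked[:n]


def distributed_facet(shards, limit, overrequest=0):
    """Two-phase faceting: collect candidates, then refine their counts."""
    # Phase 1: every shard reports its own top (limit + overrequest) terms.
    candidates = set()
    for shard in shards:
        for term, _ in top_n(shard, limit + overrequest):
            candidates.add(term)
    # Phase 2: refinement gives exact counts, but only for known candidates.
    refined = {t: sum(s.get(t, 0) for s in shards) for t in candidates}
    return top_n(refined, limit)


def exact_facet(shards, limit):
    """Ground truth: merge all counts before ranking."""
    totals = Counter()
    for shard in shards:
        totals.update(shard)
    return top_n(totals, limit)


if __name__ == "__main__":
    print(distributed_facet(SHARDS, limit=1))                 # no overrequest
    print(distributed_facet(SHARDS, limit=1, overrequest=1))  # one extra term
    print(exact_facet(SHARDS, limit=1))
```

Without overrequest the sketch returns A with a correct count of 2 and never sees D; with one extra term per shard, D enters the candidate set and is returned with its true count of 3.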

 -Yonik

Kenny
>
> On 20-10-17 17:12, Yonik Seeley wrote:
>
> Facet refinement in Solr guarantees that counts for returned
> constraints are correct, but does not guarantee that the top N
> returned isn't missing a constraint.
>
> Consider the following shard counts (3 shards) for the following
> constraints (aka facet values):
> constraintA: 2 0 0
> constraintB: 0 2 0
> constraintC: 0 0 2
> constraintD: 1 1 1
>
> Now for simplicity consider facet.limit=1:
> Phase 1: retrieve the top 1 facet counts from all 3 shards (this gets
> back A=2,B=2,C=2)
> Phase 2: refinement: retrieve counts for A,B,C for any shard that did
> not contribute to the count in Phase 1: (for example we ask shard2 and
> shard3 for the count of A)
> The counts are all correct, but we missed "D" because it never
> appeared in Phase #1
>
> Solr actually has overrequesting in the first phase to reduce the
> chances of this happening (i.e. it won't actually happen with the
> exact scenario above), but it can still happen.
>
> You can increase the overrequest amount
> (see https://lucene.apache.org/solr/guide/6_6/faceting.html)
> Or use streaming expressions or the SQL that goes on top of that in
> the latest Solr releases.
>
> -Yonik
>
>
> On Fri, Oct 20, 2017 at 10:19 AM, kenny <ke...@ontoforce.com> wrote:
>
> Hi all,
>
> When we run some 'deep' facet counts (e.g. facet values from 0 to 500 and then
> from 500 to 1000), we see small but disturbing differences in counts between
> the two (for example, last count in the first batch 165, first count in the
> second batch 167).
> We run this on Solr 5.3.1 in cloud mode (3 shards) with the non-JSON facet module.
> Has anyone seen this before? I could not find any bug reported like this.
>
> Thanks
>
> Kenny
>
>
>
> --
>
> Kenny Knecht, PhD
> CTO and technical lead
> +32 486 75 66 16 <00324756616>
> ke...@ontoforce.com
> www.ontoforce.com
> Meetdistrict, Ottergemsesteenweg-Zuid 808, 9000 Gent, Belgium
> CIC, One Broadway, MA 02142 Cambridge, United States
>
