Re: High facet.limit (with only 2-3 actual facets) -> Massive bandwidth consumption in DistributedSearch

Yonik Seeley Thu, 08 Sep 2011 13:34:37 -0700

On Thu, Sep 8, 2011 at 4:18 PM, Frederik Kraus <frederik.kr...@gmail.com> wrote:
>  Now that is quite interesting indeed and sounds like a bug to me. Including 
> facets with a count of 0 we have a few 100k which then apparently get 
> transferred. hmhmhm
>
> Can anyone with more knowledge of the facet component maybe chime in why the 
> mincount is removed?


It's a trade-off, for sure.  Here's what the code says:

          if (dff.sort.equals(FacetParams.FACET_SORT_COUNT)) {
            if (dff.limit > 0) {
              // set the initial limit higher to increase accuracy
              dff.initialLimit = (int)(dff.initialLimit * 1.5) + 10;
              // TODO: we could change this to 1, but would then need more
              // refinement for small facet result sets?
              dff.initialMincount = 0;
            } else {
              // if limit==-1, then no need to artificially lower mincount
              // to 0 if it's 1
              dff.initialMincount = Math.min(dff.minCount, 1);
            }

So this is bad if you have a high facet.limit but very few actual matches:
with an initial mincount of 0, each shard pads its response with
zero-count terms up to the over-requested limit.
It may be better for large base docsets that match a lot of facet
values (but in that case, one would expect to see few zeros anyway).
So perhaps using 0 as the mincount isn't the right trade-off?
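
To put rough numbers on that, here is a minimal sketch of the arithmetic. Only the initialLimit and initialMincount formulas come from the excerpt above. The class name, the termsReturned helper, and its simplified model of what a shard sends back are assumptions for illustration, not Solr code, and the term counts in main are made up.

```java
public class FacetOverrequestSketch {

    // Same arithmetic as the excerpt: over-request by 50% plus 10.
    static int initialLimit(int limit) {
        return (int) (limit * 1.5) + 10;
    }

    // Same choice as the excerpt for count-sorted facets.
    static int initialMincount(int limit, int minCount) {
        return limit > 0 ? 0 : Math.min(minCount, 1);
    }

    // Hypothetical, simplified model of a shard response: a shard sends every
    // term that passes the mincount, capped at the shard-level limit.
    static int termsReturned(int shardLimit, int shardMincount,
                             int termsInField, int termsWithHits) {
        int candidates = (shardMincount <= 0) ? termsInField : termsWithHits;
        if (shardLimit <= 0) {
            return candidates; // no cap when the limit is unlimited
        }
        return Math.min(shardLimit, candidates);
    }

    public static void main(String[] args) {
        int limit = 100_000;        // facet.limit in the request (illustrative)
        int minCount = 1;           // facet.mincount in the request (illustrative)
        int termsInField = 300_000; // made-up: distinct terms in the field
        int termsWithHits = 3;      // made-up: terms matching the base docset

        int shardLimit = initialLimit(limit);
        int current = termsReturned(shardLimit, initialMincount(limit, minCount),
                                    termsInField, termsWithHits);
        int withMincountOne = termsReturned(shardLimit, 1,
                                            termsInField, termsWithHits);

        System.out.println("shard-level limit:               " + shardLimit);
        System.out.println("terms per shard, mincount 0:     " + current);
        System.out.println("terms per shard, mincount 1:     " + withMincountOne);
    }
}
```

Under these assumptions the gap between the two cases is the padding of zero-count terms, which is what shows up as bandwidth between the shards and the coordinating node.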

-Yonik
http://www.lucene-eurocon.com - The Lucene/Solr User Conference
