: +1 for using a different cache, but that's being quite unfamiliar with the
: code.

In a common case, people tend to "drill down" and filter on facet 
constraints -- so using a special-purpose cache for the refinements would 
result in redundant caching of the same info in multiple places.

: > > What's the point to refine these counts? I've thought that it make sense
: > > only for facet.limit ed requests. Is it correct statement? can those who

Refinement only happens if facet.limit is used and there are eligible 
"top" constraints that were not returned by some shards.  

: > > suffer from the low performance, just unlimit  facet.limit to avoid that
: > > distributed hop?

As noted, setting facet.limit=-1 might help for low cardinality fields to 
ensure that every shard returns a count for every value so that no 
refinement is needed, but that doesn't really help you for fields with 
unknown/unbounded cardinality.
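
For example, a rough SolrJ sketch of that workaround (the core URL and the 
"category" field here are hypothetical -- adjust for your own setup):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class UnlimitedFacetExample {
  public static void main(String[] args) throws Exception {
    // hypothetical endpoint for one collection
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

    SolrQuery q = new SolrQuery("*:*");
    q.setFacet(true);
    q.addFacetField("category");   // assume a low cardinality field
    // facet.limit=-1 asks every shard for counts on every value,
    // so there are no "missing" top constraints to refine in a second hop
    q.setFacetLimit(-1);

    QueryResponse rsp = solr.query(q);
    System.out.println(rsp.getFacetField("category").getValues());

    solr.shutdown();
  }
}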

As part of the distributed pivot faceting work, the amount of 
"overrequest" done in phase 1 (for both facet.pivot & facet.field) was 
made configurable via 2 new parameters...

https://lucene.apache.org/solr/4_10_0/solr-solrj/org/apache/solr/common/params/FacetParams.html#FACET_OVERREQUEST_RATIO
https://lucene.apache.org/solr/4_10_0/solr-solrj/org/apache/solr/common/params/FacetParams.html#FACET_OVERREQUEST_COUNT

...so depending on the distribution of your data, you might find that by 
adjusting those values to increase the amount of overrequesting done, you 
can decrease the amount of refinement needed -- but there are obviously 
tradeoffs.
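
Here's a rough sketch of setting those params from SolrJ (the endpoint and 
field names are hypothetical, and if I remember the javadocs correctly each 
shard is asked for roughly (facet.limit * ratio) + count values in the 
first phase -- double check the links above for the exact semantics):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.FacetParams;

public class OverrequestTuningExample {
  public static void main(String[] args) throws Exception {
    // hypothetical endpoint and field name -- adjust for your collection
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

    SolrQuery q = new SolrQuery("*:*");
    q.setFacet(true);
    q.addFacetField("manufacturer");
    q.setFacetLimit(10);

    // raising either value makes each shard return more candidate
    // constraints in phase 1, which can reduce (but not eliminate)
    // the refinement requests needed in phase 2
    q.set(FacetParams.FACET_OVERREQUEST_RATIO, "2.0");
    q.set(FacetParams.FACET_OVERREQUEST_COUNT, "50");

    QueryResponse rsp = solr.query(q);
    System.out.println(rsp.getFacetField("manufacturer").getValues());

    solr.shutdown();
  }
}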



-Hoss
http://www.lucidworks.com/
