Hi,
I am using Solr v1.4 and I am not sure which facet.method I should use.

What should I use if I do not know in advance if the number of values
for a given field will be high or low?

What are the pros/cons of using facet.method=enum vs. facet.method=fc?

When should I use enum vs. fc?

I have found some comments and suggestions here:

 "enum enumerates all terms in a field, calculating the set intersection
  of documents that match the term with documents that match the query.
  This was the default (and only) method for faceting multi-valued fields
  prior to Solr 1.4.
 "fc (stands for field cache), the facet counts are calculated by
  iterating over documents that match the query and summing the terms
  that appear in each document. This was the default method for single
  valued fields prior to Solr 1.4.
  The default value is fc (except for BoolField) since it tends to use
  less memory and is faster when a field has many unique terms in the
  index."
  -- http://wiki.apache.org/solr/SimpleFacetParameters#facet.method
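
To make the difference between the two descriptions above concrete, here is a minimal sketch of the two counting strategies on toy in-memory data. It is an illustration under stated assumptions, not Solr's implementation: the class `FacetCountSketch`, its method names, and the `Map`/`List`/`BitSet` structures are all hypothetical stand-ins for the real index, filterCache, and FieldCache.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy illustration only: these are NOT Solr's classes or data structures.
class FacetCountSketch {

    // "enum" style: walk every term in the field and intersect the set of
    // documents containing that term with the set of documents matching the query.
    static Map<String, Integer> countByEnum(Map<String, BitSet> termToDocs, BitSet queryDocs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, BitSet> e : termToDocs.entrySet()) {
            BitSet hits = (BitSet) e.getValue().clone();
            hits.and(queryDocs); // set intersection
            if (!hits.isEmpty()) {
                counts.put(e.getKey(), hits.cardinality());
            }
        }
        return counts;
    }

    // "fc" style: walk only the documents that match the query and tally the
    // terms found in each one. docToTerms stands in for an un-inverted view of
    // the field; each document is assumed to list a term at most once.
    static Map<String, Integer> countByFieldCache(List<List<String>> docToTerms, BitSet queryDocs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int doc = queryDocs.nextSetBit(0); doc >= 0; doc = queryDocs.nextSetBit(doc + 1)) {
            for (String term : docToTerms.get(doc)) {
                counts.merge(term, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Four documents with one value each: doc0=red, doc1=blue, doc2=red, doc3=green.
        List<List<String>> docToTerms = Arrays.asList(
                Arrays.asList("red"), Arrays.asList("blue"),
                Arrays.asList("red"), Arrays.asList("green"));

        // The same data, inverted: term -> documents containing it.
        Map<String, BitSet> termToDocs = new HashMap<>();
        for (int doc = 0; doc < docToTerms.size(); doc++) {
            for (String term : docToTerms.get(doc)) {
                termToDocs.computeIfAbsent(term, t -> new BitSet()).set(doc);
            }
        }

        BitSet queryDocs = new BitSet();
        queryDocs.set(0, 3); // the query matches docs 0, 1 and 2

        System.out.println(countByEnum(termToDocs, queryDocs));       // {blue=1, red=2}
        System.out.println(countByFieldCache(docToTerms, queryDocs)); // {blue=1, red=2}
    }
}
```

In this sketch the work done by the enum style grows with the number of distinct terms in the field, while the work done by the fc style grows with the number of matching documents times the values per document. That matches the guidance quoted in this message, but the real costs in Solr also depend on caching.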

 "facet.method=enum [...] this is excellent for fields where there is
  a small set of distinct values. The average number of values per
  document does not matter.
  facet.method=fc [...] this is excellent for situations where the
  number of indexed values for the field is high, but the number of
  values per document is low. For multi-valued fields, a hybrid approach
  is used that uses term filters from the filterCache for terms that
  match many documents."
  -- http://wiki.apache.org/solr/SolrFacetingOverview

 "If you are faceting on a field that you know only has a small number
  of values (say less than 50), then it is advisable to explicitly set
  this to enum. When faceting on multiple fields, remember to set this
  for the specific fields desired and not universally for all facets.
  The request handler configuration is a good place to put this."
  -- Book: "Solr 1.4 Enterprise Search Server", p. 148
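
The per-field advice in the book quote can be sketched as follows, using Solr's per-field override syntax `f.<fieldname>.facet.method`. Everything else here is an assumption made for illustration: the class `FacetMethodParams`, the field names `country`, `author` and `tag`, and the distinct-value counts, which are presumed to have been measured by some other means. The threshold of 50 is only the book's rule of thumb, not a Solr constant. Fields whose cardinality is unknown get no override, so they stay on the default, which is fc according to the wiki text quoted earlier.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of Solr or SolrJ.
class FacetMethodParams {

    // Rule of thumb from the book quote ("say less than 50"); not a Solr constant.
    static final int ENUM_THRESHOLD = 50;

    static String chooseMethod(int distinctValues) {
        return distinctValues < ENUM_THRESHOLD ? "enum" : "fc";
    }

    // Builds the facet part of a query string. A null count means "unknown":
    // no per-field override is emitted, so the default method applies.
    // Field names are assumed to need no URL encoding.
    static String buildFacetParams(Map<String, Integer> distinctValuesByField) {
        StringJoiner params = new StringJoiner("&");
        params.add("facet=true");
        for (Map.Entry<String, Integer> e : distinctValuesByField.entrySet()) {
            String field = e.getKey();
            params.add("facet.field=" + field);
            if (e.getValue() != null) {
                params.add("f." + field + ".facet.method=" + chooseMethod(e.getValue()));
            }
        }
        return params.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> fields = new LinkedHashMap<>();
        fields.put("country", 30);     // few distinct values -> enum
        fields.put("author", 120000);  // many distinct values -> fc
        fields.put("tag", null);       // unknown -> leave the default
        System.out.println(buildFacetParams(fields));
    }
}
```

The same per-field parameters could instead be placed in the request handler defaults, as the book suggests, so that clients do not have to send them with every request.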

This is the part of the Solr code which deals with the facet.method
parameter:

  if (enumMethod) {
    counts = getFacetTermEnumCounts([...]);
  } else {
    if (multiToken) {
      UnInvertedField uif = [...]
      counts = uif.getCounts([...]);
    } else {
      [...]
      if (per_segment) {
        [...]
        counts = ps.getFacetCounts([...]);
      } else {
        counts = getFieldCacheCounts([...]);
      }
    }
  }
  -- https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/src/java/org/apache/solr/request/SimpleFacets.java

See also:

- http://stackoverflow.com/questions/2902680/how-well-does-solr-scale-over-large-number-of-facet-values

In the end, since I do not know in advance the number of distinct
values for my fields, I went with facet.method=fc. Does this seem
reasonable to you?

Thank you,
Paolo
