Hi,
I am using Solr v1.4 and I am not sure which facet.method I should use.
What should I use if I do not know in advance whether the number of
distinct values for a given field will be high or low?
What are the pros/cons of using facet.method=enum vs. facet.method=fc?
When should I use enum vs. fc?
I have found some comments and suggestions here:
"enum enumerates all terms in a field, calculating the set intersection
of documents that match the term with documents that match the query.
This was the default (and only) method for faceting multi-valued fields
prior to Solr 1.4.
"fc (stands for field cache), the facet counts are calculated by
iterating over documents that match the query and summing the terms
that appear in each document. This was the default method for single
valued fields prior to Solr 1.4.
The default value is fc (except for BoolField) since it tends to use
less memory and is faster when a field has many unique terms in the
index."
-- http://wiki.apache.org/solr/SimpleFacetParameters#facet.method
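To check my understanding of the two descriptions above, here is a toy
sketch of both counting strategies. It is not Solr code: the class and
method names (FacetCountSketch, countByEnum, countByFieldCache,
buildPostings) are made up for illustration, documents are plain ints,
and the field is assumed to be single-valued.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy model only (hypothetical names, not the Solr implementation).
class FacetCountSketch {

    // Builds term -> set of document ids from a doc -> term array.
    static Map<String, Set<Integer>> buildPostings(String[] termOfDoc) {
        Map<String, Set<Integer>> postings = new TreeMap<>();
        for (int doc = 0; doc < termOfDoc.length; doc++) {
            postings.computeIfAbsent(termOfDoc[doc], k -> new HashSet<>()).add(doc);
        }
        return postings;
    }

    // "enum" style: walk every term in the field and intersect its
    // document set with the set of documents matching the query.
    // Work grows with the number of distinct terms.
    static Map<String, Integer> countByEnum(Map<String, Set<Integer>> postings,
                                            Set<Integer> matches) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Set<Integer>> e : postings.entrySet()) {
            Set<Integer> intersection = new HashSet<>(e.getValue());
            intersection.retainAll(matches);
            if (!intersection.isEmpty()) {
                counts.put(e.getKey(), intersection.size());
            }
        }
        return counts;
    }

    // "fc" style: walk the matching documents and add one to the
    // counter of the term each document holds.
    // Work grows with the number of matching documents.
    static Map<String, Integer> countByFieldCache(String[] termOfDoc,
                                                  Set<Integer> matches) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int doc : matches) {
            counts.merge(termOfDoc[doc], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] termOfDoc = {"red", "blue", "red", "green", "blue"};
        Set<Integer> matches = new HashSet<>(Arrays.asList(0, 1, 2));
        System.out.println(countByEnum(buildPostings(termOfDoc), matches));
        System.out.println(countByFieldCache(termOfDoc, matches));
    }
}
```

Both methods return the same counts; they differ only in what they
iterate over, which is where the trade-off between the two methods
seems to come from.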
"facet.method=enum [...] this is excellent for fields where there is
a small set of distinct values. The average number of values per
document does not matter.
facet.method=fc [...] this is excellent for situations where the
number of indexed values for the field is high, but the number of
values per document is low. For multi-valued fields, a hybrid approach
is used that uses term filters from the filterCache for terms that
match many documents."
-- http://wiki.apache.org/solr/SolrFacetingOverview
"If you are faceting on a field that you know only has a small number
of values (say less than 50), then it is advisable to explicitly set
this to enum. When faceting on multiple fields, remember to set this
for the specific fields desired and not universally for all facets.
The request handler configuration is a good place to put this."
-- Book: "Solr 1.4 Enterprise Search Server", p. 148
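As I understand it, the per-field setting the book mentions uses the
f.<fieldname>.facet.method form of the parameter. A minimal sketch of
building such a request follows. The class FacetParamsSketch, the
field names "status" and "author" and their value counts are
hypothetical, and the threshold of 50 is only the rule of thumb from
the quote, not a Solr constant. Field names are assumed to need no URL
encoding.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: builds a query string with a per-field facet.method.
class FacetParamsSketch {

    // Rule of thumb from the book quote ("say less than 50").
    static final int ENUM_THRESHOLD = 50;

    // Picks enum for fields with few distinct values, fc otherwise.
    static String methodFor(int distinctValues) {
        return distinctValues < ENUM_THRESHOLD ? "enum" : "fc";
    }

    // Emits one facet.field and one f.<field>.facet.method per field,
    // so the method is set per field rather than for all facets.
    static String facetQuery(Map<String, Integer> distinctValuesByField) {
        StringBuilder sb = new StringBuilder("q=*:*&facet=true");
        for (Map.Entry<String, Integer> e : distinctValuesByField.entrySet()) {
            String field = e.getKey();
            sb.append("&facet.field=").append(field);
            sb.append("&f.").append(field).append(".facet.method=")
              .append(methodFor(e.getValue()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> fields = new LinkedHashMap<>();
        fields.put("status", 5);       // hypothetical low-cardinality field
        fields.put("author", 100000);  // hypothetical high-cardinality field
        System.out.println(facetQuery(fields));
    }
}
```

This only helps when a rough estimate of the number of distinct values
is available, which is exactly what I am missing for some fields.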
This is the part of the Solr code that handles the facet.method
parameter:
    if (enumMethod) {
      counts = getFacetTermEnumCounts([...]);
    } else {
      if (multiToken) {
        UnInvertedField uif = [...]
        counts = uif.getCounts([...]);
      } else {
        [...]
        if (per_segment) {
          [...]
          counts = ps.getFacetCounts([...]);
        } else {
          counts = getFieldCacheCounts([...]);
        }
      }
    }
-- https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/src/java/org/apache/solr/request/SimpleFacets.java
See also:
- http://stackoverflow.com/questions/2902680/how-well-does-solr-scale-over-large-number-of-facet-values
In the end, since I do not know in advance the number of distinct
values for my fields, I went for facet.method=fc. Does this seem
reasonable to you?
Thank you,
Paolo