Faceting is much happier if you use a single-valued field, but my apps
all require multivalued fields:
<doc>
<arr name="subject">
<str>aaa</str>
<str>bbb</str>
<str>ccc</str>
</arr>
</doc>
I'd like to use copyField to accumulate the values of a multivalued
field into a single field that can be efficiently faceted. (As written,
copyField adds a new field for each value and throws an error if the
target has multiValued="false".)
The simplest thing I can think of is to check whether the copyField
target is multivalued. If it is not, accumulate the values into one
string, separated by some token that the copyField target's analyzer
will split on.
Perhaps something like:
<fieldtype name="facetable" class="solr.TextField" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.RegexTokenizerFactory">
      <str name="pattern">;</str> <!-- tokens=input.split( ";" ) -->
    </tokenizer>
    <filter class="solr.TrimFilterFactory" />
  </analyzer>
</fieldtype>
<field name="subject" type="text" indexed="true" stored="true"
multiValued="true"/>
<field name="subject_facet" type="facetable" indexed="true"
stored="false" multiValued="false"/>
<copyField source="subject" dest="subject_facet" accumulate=";" />
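To make the intended behaviour concrete, here is a rough sketch in Python (not Solr code) of what the proposed accumulate=";" attribute and the splitting tokenizer would do together. The helper names accumulate and split_tokens are made up for illustration, and the trimming stands in for the TrimFilterFactory step.

```python
# Sketch of the proposed accumulate-then-split behaviour (not Solr code).
# accumulate() and split_tokens() are hypothetical helper names.

def accumulate(values, sep=";"):
    """Join the values of a multivalued source field into one string."""
    return sep.join(values)


def split_tokens(accumulated, sep=";"):
    """Split the accumulated string into trimmed tokens, roughly what a
    split-on-separator tokenizer followed by a trim filter would emit."""
    return [token.strip() for token in accumulated.split(sep)]


subject = ["aaa", "bbb", "ccc"]
single_value = accumulate(subject)    # "aaa;bbb;ccc", one value for the target
tokens = split_tokens(single_value)   # ["aaa", "bbb", "ccc"]

# A value that already contains the separator is split into two tokens.
broken = split_tokens(accumulate(["a;b", "c"]))   # ["a", "b", "c"]
```

The last line shows the weak spot: a source value containing the separator comes back as two facet tokens.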
If ';' is not in the input, this would work. Is there some character
guaranteed not to be in any input? Maybe I should call it
"facet_field" rather than "facetable" - I keep reading it as "face
table".
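If no character can be assumed absent from the input, one possible fallback is to escape the separator while accumulating and undo the escaping while splitting. The following is only a sketch in Python under that assumption. The backslash escape character and the helper names are my own choices for illustration, not anything Solr provides.

```python
# Sketch: escape the separator so any input survives accumulate + split.
# SEP, ESC, and all function names here are hypothetical.

SEP = ";"
ESC = "\\"


def escape(value):
    # Escape the escape character first, then the separator.
    return value.replace(ESC, ESC + ESC).replace(SEP, ESC + SEP)


def accumulate(values):
    """Join escaped values into a single string."""
    return SEP.join(escape(v) for v in values)


def split_tokens(accumulated):
    """Split on unescaped separators and drop the escape characters."""
    tokens, current, i = [], [], 0
    while i < len(accumulated):
        ch = accumulated[i]
        if ch == ESC and i + 1 < len(accumulated):
            current.append(accumulated[i + 1])  # keep the escaped character literally
            i += 2
        elif ch == SEP:
            tokens.append("".join(current))     # unescaped separator ends a token
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    tokens.append("".join(current))
    return tokens


values = ["a;b", "c\\d", "e"]
round_trip = split_tokens(accumulate(values))
```

The catch is that the tokenizer would then need to skip escaped separators, so it is no longer a plain split on ';'.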
Any thoughts on this design would be great.
thanks
ryan