On 6/26/2014 7:27 AM, Allison, Timothy B. wrote:
> So, I'm left with this as a candidate for the "text_all" field (I'll probably
> add a stop filter, too):
>
> <fieldType name="text_all" class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="solr.ICUTokenizerFactory"/>
>     <!-- normalize width before bigram, as e.g. half-width dakuten combine -->
>     <filter class="solr.CJKWidthFilterFactory"/>
>     <!-- for any non-CJK -->
>     <filter class="solr.ICUFoldingFilterFactory"/>
>     <filter class="solr.CJKBigramFilterFactory" outputUnigrams="true"/>
>   </analyzer>
> </fieldType>
>
> Any and all feedback welcome. Again, the goal is to create a field that is
> as robust as possible against all languages as a fallback to the language
> specific fields.
I believe that ICUFoldingFilter does everything that CJKWidthFilter does,
so you can probably remove that filter. Width folding is mentioned in
the javadocs:

http://lucene.apache.org/core/4_8_0/analyzers-icu/org/apache/lucene/analysis/icu/ICUFoldingFilter.html

If I'm wrong about that, someone please let me know.

Thanks,
Shawn
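
P.S. For reference, here is roughly what the field type would look like with the width filter dropped. This is only a sketch and I have not tested it. It assumes that ICUFoldingFilter really does cover the width folding that CJKWidthFilter was doing, as the javadocs suggest. Everything else is copied unchanged from the quoted definition, and the folding filter still runs before the bigram filter.

```xml
<fieldType name="text_all" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <!-- CJKWidthFilterFactory removed here: this ASSUMES ICUFoldingFilter
         also folds half-width and full-width forms (unverified) -->
    <!-- folding for CJK width and for any non-CJK text -->
    <filter class="solr.ICUFoldingFilterFactory"/>
    <filter class="solr.CJKBigramFilterFactory" outputUnigrams="true"/>
  </analyzer>
</fieldType>
```

One way to check the assumption would be to run some half-width katakana with dakuten through both versions of the field on the Analysis screen in the Solr admin UI and compare the tokens that come out of the bigram filter.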