Hi,
I like the best of both worlds:
<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-specials.txt" />
Masks special terms such as "C++" as "cplusplus" or "C#" as "csharp" ...
<tokenizer class="solr.ICUTokenizerFactory" />
Tokenizes on Unicode whitespace and identifies character sets
<filter class="solr.WordDelimiterFilterFactory" />
Well-known splitter for compound words
<filter class="solr.ICUFoldingFilterFactory" />
A perfect superset of <charFilter ... ISOLatin1Accent.txt"/>
and of ISOLatin1AccentFilterFactory, because it handles both composed and
decomposed accents and umlauts
<filter class="solr.CJKBigramFilterFactory" />
Nice workaround for the lack of whitespace as a word separator in CJK
languages.
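
Put together in schema.xml, the chain above might look roughly like the sketch below. The fieldType name and the positionIncrementGap value are placeholders I made up for illustration, and all factories are left at their defaults. Note that the ICU factories live in the analysis-extras contrib, so those jars need to be on the classpath.

```xml
<!-- Sketch only: the name "text_icu_mixed" and positionIncrementGap="100" are assumptions -->
<fieldType name="text_icu_mixed" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Runs before tokenization, so "C++" becomes "cplusplus" before the tokenizer sees it -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-specials.txt"/>
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
    <!-- Forms bigrams from the CJK tokens emitted by the tokenizer -->
    <filter class="solr.CJKBigramFilterFactory"/>
  </analyzer>
</fieldType>
```

mapping-specials.txt would then hold entries in the usual mapping-file form, for example "C++" => "cplusplus", one per line.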
On 01.01.2013 17:48, Jack Krupansky wrote:
Hmmm... quite some time ago I switched from ASCIIFoldingFilterFactory
to MappingCharFilterFactory, because I was told (by whom I can't recall)
that the latter was "better/preferred". Is there any particular reason
to favor one over the other?
-----Original Message----- From: Erick Erickson
ASCIIFoldingFilterFactory is preferred, does that suit your needs?