Hi,

I like the best of both worlds:
 <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-specials.txt" />
 Mask some specials like "C++" to "cplusplus" or "C#" to "csharp" ...
 <tokenizer class="solr.ICUTokenizerFactory" />
 Tokenizes on Unicode whitespace and word boundaries and identifies the script of each token
 <filter class="solr.WordDelimiterFilterFactory" />
 Well-known splitter for compound words
 <filter class="solr.ICUFoldingFilterFactory" />
 Perfect superset of <charFilter ... ISOLatin1Accent.txt"/>
 or the ISOLatin1AccentFilterFactory, because it handles both composed and decomposed accents and umlauts
 <filter class="solr.CJKBigramFilterFactory" />
 Nice workaround for the lack of whitespace as a word separator in CJK languages.
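
Put together, the chain above could look roughly like this in schema.xml. This is only a sketch: the field type name "text_mixed" is made up for illustration, the filters are left at their defaults as in the list above, and the ICU factories need the analysis-extras contrib jars on the classpath.

```xml
<!-- Sketch only: "text_mixed" is a hypothetical field type name -->
<fieldType name="text_mixed" class="solr.TextField">
  <analyzer>
    <!-- Char filters see the raw text before the tokenizer does -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-specials.txt"/>
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <!-- Token filters run in the order they are listed -->
    <filter class="solr.WordDelimiterFilterFactory"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
    <filter class="solr.CJKBigramFilterFactory"/>
  </analyzer>
</fieldType>
```

Entries in mapping-specials.txt use the "source" => "target" form, for example "C++" => "cplusplus". The order matters here: because the char filter runs first, "C++" is rewritten before the tokenizer gets a chance to drop the plus signs.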


On 01.01.2013 17:48, Jack Krupansky wrote:
Hmmm... quite some time ago I switched from ASCIIFoldingFilterFactory
to MappingCharFilterFactory, because I was told (by who I can't recall)
that the latter was "better/preferred". Is there any particular reason
to favor one over the other?
-----Original Message----- From: Erick Erickson
ASCIIFoldingFilterFactory is preferred, does that suit your needs?
