On Mon, Feb 21, 2011 at 12:16 PM, Avi Rosenschein <arosensch...@gmail.com> wrote:
> Is there any analyzer that can do full Unicode case folding (for example, as
> described at
> http://www.w3.org/International/wiki/Case_folding#Recommendations_for_Case_Folding
> )?

Hi, in branch_3x you can use the ICUNormalizer2FilterFactory to do this
(normalization mode NFKC_CF):

http://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/solr/contrib/analysis-extras/src/java/org/apache/solr/analysis/ICUNormalizer2FilterFactory.java

You can simply use this instead of LowerCaseFilter (just set up your solr/lib
with the solr-analysis-extras.jar, the ICU jar, and Lucene's contrib-icu jar).

> If there isn't an analyzer for this - any suggestions on how to roll my own?
> Should I simply apply String.toUpperCase() followed by .toLowerCase()?

No, I would recommend using the actual full case folding (with normalization)
instead. This is not the same as uppercase + lowercase. For example, it will
correctly handle the three forms of Greek sigma (Σ, σ, ς).
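
As a rough sketch of the setup described above, a schema.xml field type along
these lines should work. The field type name "text_folded" and the choice of
WhitespaceTokenizerFactory are only placeholders for illustration, and the
name/mode attributes reflect my understanding of the factory's options, so
please check them against the source file linked above:

```xml
<!-- Hypothetical field type; the name and the tokenizer are illustrative only. -->
<fieldType name="text_folded" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Takes the place of solr.LowerCaseFilterFactory:
         NFKC normalization combined with full Unicode case folding. -->
    <filter class="solr.ICUNormalizer2FilterFactory" name="nfkc_cf" mode="compose"/>
  </analyzer>
</fieldType>
```

Since the same analyzer runs at index and query time here, a query containing
any one of the sigma forms should match documents containing the others.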