Excellent. Thanks, Robert! -- Avi
On Mon, Feb 21, 2011 at 19:24, Robert Muir <rcm...@gmail.com> wrote:

> On Mon, Feb 21, 2011 at 12:16 PM, Avi Rosenschein
> <arosensch...@gmail.com> wrote:
> > Is there any analyzer that can do full Unicode case folding (for example,
> > as described at
> > http://www.w3.org/International/wiki/Case_folding#Recommendations_for_Case_Folding
> > )?
>
> Hi, in branch_3x you can use the ICUNormalizer2FilterFactory to do
> this (normalization mode NFKC_CF)
>
> http://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/solr/contrib/analysis-extras/src/java/org/apache/solr/analysis/ICUNormalizer2FilterFactory.java
>
> You can simply use this instead of LowerCaseFilter (just set up your
> solr/lib with the solr-analysis-extras.jar, icu jar, and lucene's
> contrib-icu jar).
>
> > If there isn't an analyzer for this - any suggestions on how to roll my
> > own? Should I simply apply String.toUpperCase() followed by .toLowerCase()?
>
> No, I would recommend using the actual full case folding (with
> normalization) instead. This is not the same as uppercase + lowercase.
> For example, it will correctly handle the 3 forms of Greek sigma.
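
For anyone reading this later, here is a small sketch of the sigma point, written in Python rather than against Lucene so it runs without any jars. It is not the ICU filter: `fold` below is a hypothetical helper that only approximates NFKC_CF with the standard library (`str.casefold` plus NFKC normalization), and `simple_lower` is an assumed stand-in for a per-character lowercase filter.

```python
import unicodedata

# The same Greek word written with each of the three sigma forms.
UPPER = "\u039f\u0394\u039f\u03a3"   # capital sigma at the end
FINAL = "\u03bf\u03b4\u03bf\u03c2"   # lowercase final sigma
MEDIAL = "\u03bf\u03b4\u03bf\u03c3"  # lowercase non-final sigma


def simple_lower(token: str) -> str:
    """Lowercase one character at a time (assumed stand-in for a plain lowercase filter)."""
    return "".join(ch.lower() for ch in token)


def fold(token: str) -> str:
    """Rough approximation of NFKC_CF: full case folding with NFKC before and after."""
    normalized = unicodedata.normalize("NFKC", token)
    return unicodedata.normalize("NFKC", normalized.casefold())


forms = [UPPER, FINAL, MEDIAL]

# Per-character lowercasing leaves the final sigma as a distinct character,
# so the three spellings do not all collapse to one index term.
print({simple_lower(w) for w in forms})

# Case folding maps all three sigma forms to the same character.
print({fold(w) for w in forms})
```

The first set should contain two distinct strings and the second only one, which is the behaviour the reply describes. Real NFKC_CF also handles details this approximation ignores (for example, removing default-ignorable code points), so use the ICU filter for actual indexing.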