Re: Unicode case folding

Robert Muir Mon, 21 Feb 2011 09:25:42 -0800

On Mon, Feb 21, 2011 at 12:16 PM, Avi Rosenschein
<arosensch...@gmail.com> wrote:
> Is there any analyzer that can do full Unicode case folding (for example, as
> described at
> http://www.w3.org/International/wiki/Case_folding#Recommendations_for_Case_Folding
> )?


Hi, in branch_3x you can use the ICUNormalizer2FilterFactory to do
this (normalization mode NFKC_CF):

http://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/solr/contrib/analysis-extras/src/java/org/apache/solr/analysis/ICUNormalizer2FilterFactory.java

You can simply use this instead of LowerCaseFilter (just set up your
solr/lib with the solr-analysis-extras.jar, the ICU jar, and Lucene's
contrib-icu jar).

> If there isn't an analyzer for this - any suggestions on how to roll my own?
> Should I simply apply String.toUpperCase() followed by .toLowerCase()?

No, I would recommend using the actual full case folding (with
normalization) instead. This is not the same as uppercase + lowercase.
For example, it will correctly handle the three forms of Greek sigma
(capital Σ, small σ, and final small ς).
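
As a rough illustration of the difference, here is a minimal JDK-only
sketch of the uppercase + lowercase round trip. It does not use the ICU
filter, and the class name CaseRoundTripDemo and the helper upperLower
are made up for this example. The comments note what full case folding
with NFKC_CF would give instead, per the Unicode data files.

```java
import java.util.Locale;

public class CaseRoundTripDemo {
    // Hypothetical helper: the round trip suggested in the question.
    static String upperLower(String s) {
        return s.toUpperCase(Locale.ROOT).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String sharpS = "\u00DF";        // LATIN SMALL LETTER SHARP S
        String capitalSharpS = "\u1E9E"; // LATIN CAPITAL LETTER SHARP S
        String fullwidthA = "\uFF21";    // FULLWIDTH LATIN CAPITAL LETTER A
        String odos = "\u039F\u0394\u039F\u03A3"; // Greek word ending in capital sigma

        // Full case folding maps both sharp-s letters to "ss".
        // The round trip sends one to "ss" and the other to U+00DF,
        // so the two no longer compare equal.
        System.out.println(upperLower(sharpS).equals(upperLower(capitalSharpS)));

        // NFKC_CF also applies compatibility normalization, so the
        // fullwidth letter would become a plain ASCII "a". The round
        // trip only yields the fullwidth small letter U+FF41.
        System.out.println(upperLower(fullwidthA).equals("a"));

        // Java lowercases a word-final capital sigma to final sigma
        // (U+03C2); case folding would map it to U+03C3 regardless of
        // its position in the word.
        System.out.println(upperLower(odos).endsWith("\u03C2"));
    }
}
```

So the round trip leaves position-dependent sigma forms and
compatibility variants in the index, while the NFKC_CF filter collapses
them to a single form.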
