Re: Unicode case folding

Avi Rosenschein Mon, 21 Feb 2011 10:21:19 -0800

Excellent. Thanks, Robert!

-- Avi


On Mon, Feb 21, 2011 at 19:24, Robert Muir <rcm...@gmail.com> wrote:

> On Mon, Feb 21, 2011 at 12:16 PM, Avi Rosenschein
> <arosensch...@gmail.com> wrote:
> > Is there any analyzer that can do full Unicode case folding (for example,
> > as described at
> > http://www.w3.org/International/wiki/Case_folding#Recommendations_for_Case_Folding
> > )?
>
> Hi, in branch_3x you can use the ICUNormalizer2FilterFactory to do
> this (normalization mode NFKC_CF)
>
>
> http://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/solr/contrib/analysis-extras/src/java/org/apache/solr/analysis/ICUNormalizer2FilterFactory.java
>
> You can simply use this instead of LowerCaseFilter (just set up your
> solr/lib with the solr-analysis-extras.jar, the ICU jar, and Lucene's
> contrib-icu jar).
>
> > If there isn't an analyzer for this, any suggestions on how to roll my
> > own? Should I simply apply String.toUpperCase() followed by .toLowerCase()?
>
> No, I would recommend using the actual full case folding (with
> normalization) instead. This is not the same as uppercase + lowercase.
> For example, it will correctly handle the three forms of Greek sigma
> (Σ, σ, ς).
>
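For illustration, here is a minimal sketch using only the JDK (no ICU), so it runs on a bare Java install. The class and method names (CaseFoldingDemo, lowerPerCodePoint, upperThenLower, nfkcThenLower) are hypothetical. It does NOT implement NFKC_CF; it only shows three inputs where plain lowercasing or toUpperCase().toLowerCase() falls short. The per-code-point lowercasing is a simplified stand-in, not LowerCaseFilter's actual code. Real NFKC_CF folding applies NFKC and maps Σ, σ and ς all to σ, which is what the ICU-based filter above is for.

```java
import java.text.Normalizer;
import java.util.Locale;

public class CaseFoldingDemo {

    // Lowercases one code point at a time with Character.toLowerCase;
    // a simplified stand-in for a plain lowercasing step.
    static String lowerPerCodePoint(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints().forEach(cp -> sb.appendCodePoint(Character.toLowerCase(cp)));
        return sb.toString();
    }

    // The approach asked about: toUpperCase() followed by toLowerCase(),
    // with the locale made explicit (the no-arg versions use the JVM default).
    static String upperThenLower(String s, Locale locale) {
        return s.toUpperCase(locale).toLowerCase(locale);
    }

    // JDK NFKC normalization, then locale-independent lowercasing.
    // Illustrates what the normalization step adds; this is NOT NFKC_CF.
    static String nfkcThenLower(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // 1. A Greek word spelled with capital sigma (U+03A3) vs. final sigma (U+03C2).
        //    U+03A3 lowercases to U+03C3, while U+03C2 is already lowercase and is kept.
        String upper = "\u039F\u0394\u039F\u03A3";
        String lowerFinal = "\u03BF\u03B4\u03BF\u03C2";
        System.out.println(lowerPerCodePoint(upper).equals(lowerPerCodePoint(lowerFinal))); // false

        // 2. Full-width Latin capitals (U+FF21..U+FF23): case mapping keeps them
        //    full-width, so only the NFKC step makes them match ASCII "abc".
        String fullWidth = "\uFF21\uFF22\uFF23";
        System.out.println(upperThenLower(fullWidth, Locale.ROOT).equals("abc")); // false
        System.out.println(nfkcThenLower(fullWidth).equals("abc"));               // true

        // 3. Locale sensitivity: with Turkish rules, ASCII 'I' lowercases to
        //    dotless i (U+0131), so "TITLE" and "title" no longer match.
        Locale turkish = Locale.forLanguageTag("tr");
        System.out.println(upperThenLower("TITLE", turkish).equals(upperThenLower("title", turkish)));         // false
        System.out.println(upperThenLower("TITLE", Locale.ROOT).equals(upperThenLower("title", Locale.ROOT))); // true
    }
}
```

The third case matters because the no-argument String.toUpperCase() and toLowerCase() use the JVM's default locale, whereas default Unicode case folding is locale-independent.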
