Re: Accent insensitive search for greek characters

Shawn Heisey Mon, 16 Oct 2017 15:11:01 -0700

On 10/13/2017 1:28 AM, Chitra wrote:
>    I want to make searches on Greek text accent-insensitive by removing
> accent marks or replacing accented characters with unaccented ones.
>
> E.g., when searching for an accented Greek word such as *πῬοἲὅν*, we
> expect it to match the unaccented equivalent, *προιον*.
>
> Moreover, I do not know much about Greek characters, so I am looking
> for standard rules for performing accent-insensitive search on Greek.
>
> Does *ICUFoldingFilter* solve my case? I have already tried it, and it
> works fine for accented Greek characters. But it is not language
> specific; it has internationalization support for all languages. So I am
> not sure whether it will break my existing language behavior in the index.
>
> Is there any way to make ICUFoldingFilter language specific?


The entire point of the ICU filters is that they are functional across
all of Unicode -- all languages.  As far as I am aware, there is no way
to adjust what ICUFoldingFilter does.  According to the code, it
offloads all work to IBM's ICU library and does not offer any
configurability.

The following filters also exist, with less functionality than the ICU
filter:

https://lucene.apache.org/solr/guide/6_6/filter-descriptions.html#FilterDescriptions-ASCIIFoldingFilter
https://lucene.apache.org/solr/guide/6_6/filter-descriptions.html#FilterDescriptions-LowerCaseFilter

Those filters operate on single characters from the input, which means
they cannot take character context into account like ICU does.  If I am
reading what the ASCII filter does correctly, it may not work for Greek
characters at all -- it says that it folds to the lower range of ASCII,
and that character set doesn't have Greek letters.
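
For reference, the usual Unicode-level recipe for accent-insensitive
matching is to case-fold the text, decompose it (NFD), and drop the
combining marks. The sketch below shows that recipe in Python using only
the standard library. It is not what ICUFoldingFilter runs internally,
which does considerably more, and the function name fold_accents is made
up for this illustration.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Case-fold, decompose (NFD), drop combining marks, recompose (NFC).

    Illustrative helper only; the name is hypothetical and this is not
    the ICUFoldingFilter algorithm.
    """
    decomposed = unicodedata.normalize("NFD", text.casefold())
    # unicodedata.combining() returns a nonzero class for combining marks
    # such as the Greek tonos and dialytika after decomposition.
    kept = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", kept)


if __name__ == "__main__":
    print(fold_accents("Προϊόν"))  # prints: προιον
```

Applying the same function to both the indexed text and the query is
what makes the match accent-insensitive. Because it works on Unicode
properties rather than a Greek-specific table, it will also strip
accents from other scripts, which is the same trade-off raised in the
question.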

Thanks,
Shawn
