We created a new field type for sentences that contain text in Latin and Ancient Greek.
The text can include Greek words with accents.
We want accent-insensitive search. For example, if I search for the word βιβλος,
I want to find the word βίβλος (iota with coronis accent) in the text.
Similarly, if I search for the word βίβλος with an acute accent on the iota, I again
want to find βίβλος with the coronis accent in the text.
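To make "accent-insensitive" concrete, here is a minimal Python sketch of the folding we are after. It is only a rough approximation of ICU-style folding, not Solr code: decompose each character, drop the combining marks, then apply Unicode case folding (which also turns final ς into σ). The function name `fold` is our own, and we assume the "coronis" form is encoded as iota with psili (U+1F30), which has the same shape:

```python
import unicodedata


def fold(text: str) -> str:
    """Return an accent- and case-insensitive key for a Greek or Latin string.

    Rough approximation of ICU-style folding: decompose, drop combining
    marks, then apply Unicode case folding.
    """
    decomposed = unicodedata.normalize("NFD", text)
    # Drop nonspacing marks: tonos/oxia, grave, breathings, koronis, ...
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # casefold() also maps final sigma (ς) to σ, as full Unicode case folding does.
    return unicodedata.normalize("NFC", stripped.casefold())


plain = "βιβλος"            # no accent
tonos = "β\u03afβλος"       # iota with tonos, U+03AF (monotonic acute)
oxia = "β\u1f77βλος"        # iota with oxia, U+1F77 (polytonic acute)
psili = "β\u1f30βλος"       # iota with psili, U+1F30 (assumed "coronis" form)

for word in (plain, tonos, oxia, psili):
    print(word, "->", fold(word))  # every line ends in "-> βιβλοσ"
```

All four spellings end up as the same key, which is the behaviour we want from the field type on both the index and the query side.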
I looked for solutions and found the ASCIIFoldingFilterFactory filter. I added it
to the field type, but it does not handle Greek correctly:
<fieldType name="text_acs" class="solr.TextField" positionIncrementGap="1000">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.ASCIIFoldingFilterFactory" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.GreekStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.GreekStemFilterFactory"/>
  </analyzer>
</fieldType>
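ASCIIFoldingFilterFactory only replaces characters that have an ASCII equivalent; as far as we can tell its mapping table covers Latin letters and some symbols, not the Greek block, so Greek accents pass through unchanged. The function below is a hypothetical, rough stand-in (`ascii_fold_approx` is our own name and does not reproduce the real filter's table) that shows the effect:

```python
import unicodedata


def ascii_fold_approx(text: str) -> str:
    """Hypothetical stand-in for ASCII folding: replace a character by its
    base letter only when that base letter is plain ASCII."""
    out = []
    for ch in text:
        base = unicodedata.normalize("NFD", ch)[0]
        # Latin "é" -> "e"; Greek "ί" has base "ι", which is not ASCII, so it stays.
        out.append(base if base.isascii() else ch)
    return "".join(out)


print(ascii_fold_approx("caf\u00e9"))      # cafe
print(ascii_fold_approx("β\u03afβλος"))    # βίβλος (accent kept)
```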
With the ICUFoldingFilterFactory filter, single-word search works well, but regex
queries and phrase queries, which worked before we switched to ICUFoldingFilterFactory,
no longer work:
<fieldType name="text_acs" class="solr.TextField" positionIncrementGap="1000">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.ICUFoldingFilterFactory" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.GreekStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ICUFoldingFilterFactory" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.GreekStemFilterFactory"/>
  </analyzer>
</fieldType>
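One possible explanation for the regex failure (an assumption on our part, not verified in Solr): a regex query is matched against the terms stored in the index, and with ICUFoldingFilterFactory those terms are stored folded, e.g. βιβλοσ instead of βίβλος, so a pattern written with accents no longer matches anything. A minimal Python sketch of that effect, using a hypothetical in-memory term list and ignoring GreekStemFilterFactory, which may shorten the stored terms further:

```python
import re
import unicodedata


def fold(text: str) -> str:
    # Rough ICU-style folding: decompose, drop combining marks, case-fold.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", stripped.casefold())


# Hypothetical index: terms as they might be stored after folding.
indexed_terms = [fold(w) for w in ("β\u03afβλος", "λόγος")]


def regex_hits(pattern: str) -> list:
    # A regex query has to match a whole term, so use fullmatch.
    return [t for t in indexed_terms if re.fullmatch(pattern, t)]


print(regex_hits("β\u03afβλ.*"))  # accented pattern: []
print(regex_hits("βιβλ.*"))       # folded pattern: ['βιβλοσ']
```

If this is what happens, the regex itself would have to be written against the folded (and stemmed) form of the terms.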
In the text field, the words are stored like this:
<w ana='#n' xml:lang='grc-Grek'>βίβλος</w>
If I search for βιβλος, I find βίβλος (iota with coronis accent) in the text. OK.
If I search for βίβλος with an acute accent on the iota, I again find βίβλος with
the coronis accent in the text. OK.
I also need users to be able to search for a word together with its container
tag w, for example: <w ana='#n'></w>
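To make the last requirement concrete outside Solr: the goal is to match a word only when it sits inside a `w` element with a given `ana` value, still ignoring accents. The sketch below is a hypothetical illustration, not a Solr solution; it assumes the markup is well-formed XML, and the sample sentence, the `<s>` wrapper, the second `w` element and `find_words` are all invented for the example:

```python
import unicodedata
import xml.etree.ElementTree as ET


def fold(text: str) -> str:
    # Rough ICU-style folding: decompose, drop combining marks, case-fold.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", stripped.casefold())


# Hypothetical sentence: the <s> wrapper and the second <w> are made up.
sample = (
    "<s><w ana='#n' xml:lang='grc-Grek'>β\u03afβλος</w> "
    "<w ana='#v' xml:lang='grc-Grek'>γράφω</w></s>"
)


def find_words(xml_text: str, query: str, ana: str) -> list:
    """Return the text of every <w> whose ana attribute equals `ana`
    and whose folded text equals the folded query."""
    root = ET.fromstring(xml_text)
    key = fold(query)
    return [
        w.text
        for w in root.iter("w")
        if w.get("ana") == ana and fold(w.text or "") == key
    ]


print(find_words(sample, "βιβλος", "#n"))  # ['βίβλος']
print(find_words(sample, "βιβλος", "#v"))  # []
```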