Thanks for the answer.
On the analysis page I see that Solr ignores the tag symbols such as <>='# and treats what remains as words (I use StandardTokenizerFactory), so this does not matter if I only have to search within the field value: <w ana='#n' xml:lang='grc-Grek'>βίβλος</w>
I can use a query like this: "w ana n βιβλος"~3
But what if I want this word in a <w> tag that sits inside another tag, <foreign>?
I would like to run queries like these:
<w ana='#n' *>word</w>
<w ana='#adj' *>word</w>
<foreign ana='cdswInter'>*<w ana='#n' *>word</w>*</foreign>
In this last case it is important to match the closing </foreign> as well. I did not find anything useful in the Solr documentation for this kind of tag search.
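
To pin down what the last pattern is meant to match, here is a minimal sketch in Python that applies the same condition to the raw markup outside Solr. It is not a Solr feature or query syntax; the function name words_in_foreign and the sample fragment are made up for illustration, and it compares the word exactly, without any accent folding.

```python
import xml.etree.ElementTree as ET


def words_in_foreign(fragment, word, w_ana, foreign_ana):
    """Return the <w> elements with the given ana and text that sit inside
    a <foreign> element with the given ana (hypothetical helper)."""
    # Wrap the fragment so that several top-level elements still parse.
    root = ET.fromstring("<root>" + fragment + "</root>")
    hits = []
    for foreign in root.iter("foreign"):
        if foreign.get("ana") != foreign_ana:
            continue
        for w in foreign.iter("w"):
            if w.get("ana") == w_ana and (w.text or "") == word:
                hits.append(w)
    return hits


# Made-up sample: the same word once inside <foreign> and once outside it.
word = "βίβλος"
tagged = "<w ana='#n' xml:lang='grc-Grek'>" + word + "</w>"
sample = "<foreign ana='cdswInter'>" + tagged + "</foreign> " + tagged

matches = words_in_foreign(sample, word, "#n", "cdswInter")
print(len(matches))
```

Only the occurrence enclosed by <foreign ana='cdswInter'> ... </foreign> counts as a match here, which is the behaviour the query above is asking for.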

Best,
Valentina

On 06/07/2016 17:27, Erick Erickson wrote:
What do you see if you use the admin/analysis page? That should give
you a clue what's happening here....

Best,
Erick

On Wed, Jul 6, 2016 at 7:04 AM, Valentina Cavazza <valent...@step-net.it> wrote:
We created a new field type. It is used for sentences that contain text in
Latin and Ancient Greek, and the text can include Greek words with accents.
We want to be able to do an accent-insensitive search. For example:
if I search for the word βιβλος, I want to find the word βίβλος with the
iota coronis accent in the text.
Similarly, if I search for the word βίβλος with the iota acute accent, I again
want to find the word βίβλος with the iota coronis accent in the text.
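
The behaviour described above can be sketched in a few lines of Python. This is only an illustration of accent folding by Unicode decomposition, not what any Solr filter does internally; the function name fold_accents is made up, and the choice of iota with psili (U+1F30) as the second accented form is an assumption.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Decompose to NFD, drop combining marks, and lowercase (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


plain = "\u03b2\u03b9\u03b2\u03bb\u03bf\u03c2"  # unaccented iota
acute = "\u03b2\u03af\u03b2\u03bb\u03bf\u03c2"  # iota with tonos (U+03AF)
psili = "\u03b2\u1f30\u03b2\u03bb\u03bf\u03c2"  # iota with psili (U+1F30), assumed form

# All three spellings reduce to the same folded key.
print(fold_accents(acute) == plain, fold_accents(psili) == plain)
```

If the indexed text and the query are both reduced to the same folded key, the three spellings match each other.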
I looked for solutions and found the ASCIIFoldingFilterFactory filter.
I installed that filter, but it does not do the job correctly for Greek:
<fieldType name="text_acs" class="solr.TextField" positionIncrementGap="1000">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.GreekStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.GreekStemFilterFactory"/>
  </analyzer>
</fieldType>
If we use the ICUFoldingFilterFactory filter instead, single-word search works
well, but the regex queries and phrase queries that we used before installing
ICUFoldingFilterFactory no longer work:
<fieldType name="text_acs" class="solr.TextField" positionIncrementGap="1000">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.GreekStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.GreekStemFilterFactory"/>
  </analyzer>
</fieldType>
In the text field, the word is stored like this: <w ana='#n'
xml:lang='grc-Grek'>βίβλος</w>
If I search for the word βιβλος, I find the word βίβλος with the iota coronis
accent in the text. OK.
If I search for the word βίβλος with the iota acute accent, I again find the
word βίβλος with the iota coronis accent in the text. OK.
I also need users to be able to search for the word together with its
containing w tag: <w ana='#n'></w>




--

Valentina Cavazza
*STEP srl*
Tel. 011.98.66.277 / 0121.37.47.27
Fax. 011.98.66.728
E-mail. valent...@step-net.it
Web. www.step-net.it <http://www.step-net.it>
