Re: offsets issues with multiword synonyms since LUCENE_33

Konrad Lötzsch Wed, 15 Aug 2012 00:19:52 -0700

I don't know whether this was discussed previously,

but you can tell the SynonymFilter not to break up your synonyms (breaking them up might be the default). When they are broken up, the parts of a multiword synonym get new word positions. So you could use a KeywordTokenizer to avoid that behaviour:


        <filter class="solr.SynonymFilterFactory"
            synonyms="Synonyms.txt"
            ignoreCase="true"
            expand="false"
            tokenizerFactory="solr.KeywordTokenizerFactory"
        />
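
As a rough sketch only (untested, and assuming the sy_text field type from the quoted message below), the same attribute could be added to the existing synonym filter while keeping its other settings. Note that tokenizerFactory only controls how the entries in synonyms.txt are split, so with this setting "big size" is read as one entry rather than two words:

        <!-- sketch: synonym filter from the quoted sy_text analyzer,
             with only the tokenizerFactory attribute added -->
        <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt"
            ignoreCase="false"
            expand="true"
            tokenizerFactory="solr.KeywordTokenizerFactory"
        />

The field itself is still tokenized by the StandardTokenizer, so whether this gives the highlighting offsets you expect would need to be checked on the analysis page.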

with regards,
konrad.

On 14.08.2012 18:51, Marc Sturlese wrote:

Well, an example would be:
synonyms.txt:
huge,big size

Then I have the docs:
1- The huge fox attacks first
2- The big size fox attacks first

Then if I query for huge, the highlights for each document are:

1- The <strong>huge</strong> <strong>fox</strong> attacks first
2- The <strong>big size</strong> fox attacks first

The analyzer looks like this:
     <fieldType name="sy_text" class="solr.TextField" positionIncrementGap="100">
       <analyzer type="index">
         <tokenizer class="solr.StandardTokenizerFactory"/>
         <filter class="solr.LowerCaseFilterFactory"/>
         <filter class="solr.ASCIIFoldingFilterFactory"/>
         <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
                 ignoreCase="false" expand="true" />
       </analyzer>
       <analyzer type="query">
         <tokenizer class="solr.StandardTokenizerFactory"/>
         <filter class="solr.LowerCaseFilterFactory"/>
         <filter class="solr.ASCIIFoldingFilterFactory"/>
         <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
                 ignoreCase="false" expand="true" />
       </analyzer>
     </fieldType>

This was working with a previous version of Solr (I couldn't make it work with
3.6, 4-alpha, or 4-beta).



--
View this message in context: 
http://lucene.472066.n3.nabble.com/offsets-issues-with-multiword-synonyms-since-LUCENE-33-tp4001195p4001213.html
Sent from the Solr - User mailing list archive at Nabble.com.
