I don't know whether this was discussed previously,
but the SynonymFilter may split multi-word synonyms into separate tokens (which might be the default). In that case, the parts of the synonyms get new word positions. You can tell the SynonymFilter not to break up your synonyms by using a KeywordTokenizer, which avoids that behaviour:

        <filter class="solr.SynonymFilterFactory"
            synonyms="Synonyms.txt"
            ignoreCase="true"
            expand="false"
            tokenizerFactory="solr.KeywordTokenizerFactory"
        />
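
Applied to the sy_text field type quoted below, that might look roughly like the following. This is an untested sketch: I kept the ignoreCase and expand values from your mail and only added tokenizerFactory. As far as I understand, the tokenizerFactory is only used to parse synonyms.txt, so "big size" becomes one single token there; whether that still matches text that the StandardTokenizer has already split into "big" and "size" is something you would have to check. The query analyzer would get the same change.

        <analyzer type="index">
          <tokenizer class="solr.StandardTokenizerFactory"/>
          <filter class="solr.LowerCaseFilterFactory"/>
          <filter class="solr.ASCIIFoldingFilterFactory"/>
          <!-- assumption: same settings as in the quoted mail, plus tokenizerFactory -->
          <filter class="solr.SynonymFilterFactory"
              synonyms="synonyms.txt"
              ignoreCase="false"
              expand="true"
              tokenizerFactory="solr.KeywordTokenizerFactory"
          />
        </analyzer>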

with regards,
konrad.

On 14.08.2012 18:51, Marc Sturlese wrote:
Well, an example would be:
synonyms.txt:
huge,big size

Then I have the docs:
1- The huge fox attacks first
2- The big size fox attacks first

Then if I query for huge, the highlights for each document are:

1- The <strong>huge</strong> <strong>fox</strong> attacks first
2- The <strong>big size</strong> fox attacks first

The analyzer looks like this:
<fieldType name="sy_text" class="solr.TextField" positionIncrementGap="100">
       <analyzer type="index">
         <tokenizer class="solr.StandardTokenizerFactory"/>
         <filter class="solr.LowerCaseFilterFactory"/>
         <filter class="solr.ASCIIFoldingFilterFactory"/>
         <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
                 ignoreCase="false" expand="true" />
       </analyzer>
       <analyzer type="query">
         <tokenizer class="solr.StandardTokenizerFactory"/>
         <filter class="solr.LowerCaseFilterFactory"/>
         <filter class="solr.ASCIIFoldingFilterFactory"/>
         <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
                 ignoreCase="false" expand="true" />
       </analyzer>
     </fieldType>

This was working with a previous version of Solr (couldn't make it work with
3.6, 4-alpha nor 4-beta).



--
View this message in context: 
http://lucene.472066.n3.nabble.com/offsets-issues-with-multiword-synonyms-since-LUCENE-33-tp4001195p4001213.html
Sent from the Solr - User mailing list archive at Nabble.com.
