I don't know whether this was discussed previously,
but unless you tell the SynonymFilter not to break up your synonyms (breaking
them up might be the default), the parts of a multiword synonym get their own
word positions. You could use a KeywordTokenizer to avoid that behaviour:
<filter class="solr.SynonymFilterFactory"
synonyms="Synonyms.txt"
ignoreCase="true"
expand="false"
tokenizerFactory="solr.KeywordTokenizerFactory"
/>
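Applied to your sy_text field type, a sketch of the index analyzer with that option added might look like this. It is untested against 3.6 or 4.x, and whether it fixes the highlighting offsets is an assumption; the query analyzer would get the same tokenizerFactory attribute. One possible catch: with KeywordTokenizerFactory the entry "big size" is presumably kept as a single token, so it may no longer match text that StandardTokenizer splits into "big" and "size". That is worth checking on Solr's analysis screen.

```xml
<analyzer type="index">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.ASCIIFoldingFilterFactory"/>
  <!-- tokenizerFactory controls how the entries in synonyms.txt are tokenized;
       KeywordTokenizerFactory keeps each entry (e.g. "big size") as one token -->
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
          ignoreCase="false" expand="true"
          tokenizerFactory="solr.KeywordTokenizerFactory"/>
</analyzer>
```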
with regards,
konrad.
On 14.08.2012 18:51, Marc Sturlese wrote:
Well, an example would be:
synonyms.txt:
huge,big size
Then I have the docs:
1- The huge fox attacks first
2- The big size fox attacks first
Then, if I query for huge, the highlights for each document are:
1- The <strong>huge</strong> <strong>fox</strong> attacks first
2- The <strong>big size</strong> fox attacks first
The analyzer looks like this:
<fieldType name="sy_text" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.ASCIIFoldingFilterFactory"/>
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
ignoreCase="false" expand="true" />
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.ASCIIFoldingFilterFactory"/>
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
ignoreCase="false" expand="true" />
</analyzer>
</fieldType>
This worked with a previous version of Solr (I couldn't make it work with
3.6, 4-alpha, or 4-beta).
--
View this message in context:
http://lucene.472066.n3.nabble.com/offsets-issues-with-multiword-synonyms-since-LUCENE-33-tp4001195p4001213.html
Sent from the Solr - User mailing list archive at Nabble.com.