Multi word synonyms

Zac Smith Sat, 04 Feb 2012 14:41:23 -0800

Hi

I have seen several questions on this already, but I haven't been able to sort
out my issue. My problem is that multi-word synonyms aren't behaving as I would
expect. I have copied my field type definition at the bottom of this message,
but the relevant synonym filter (used at index time) is here:
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true"
        tokenizerFactory="solr.KeywordTokenizerFactory" />


Say I have synonyms.txt set up like this:
syrup,sugar syrup,stock syrup

When indexing the text 'syrup', the 3 phrases are treated equivalently as 
expected. I can see this in the Index Analyzer as they all occupy the same term 
position.

But if all of the synonyms are phrases, it doesn't work.
For example, synonyms.txt looks like:
simple syrup,sugar syrup,stock syrup

Now when I put the text 'simple syrup' into the Index Analyzer, I can only see
the original term listed. It is not finding the synonyms.
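
One possible explanation, offered as an assumption rather than confirmed
behaviour: the tokenizerFactory attribute controls how each entry in
synonyms.txt is split. With solr.KeywordTokenizerFactory the entry
'simple syrup' would be held as one token, while WhitespaceTokenizerFactory
hands the filter 'simple' and 'syrup' as two separate tokens, so nothing in the
stream ever equals the entry. The single-word entry 'syrup' in the first
example would still match, which fits what the Index Analyzer shows. A minimal
sketch of a variant to compare in the Index Analyzer, with tokenizerFactory
left out so that entries are split on whitespace (understood to be the default):

```xml
<!-- Hypothetical variant for comparison only: tokenizerFactory is omitted,
     on the assumption that synonyms.txt entries are then split on whitespace,
     the same way WhitespaceTokenizerFactory splits the indexed text. -->
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true" />
```

With this variant, a match would presumably emit each synonym as separate
tokens ('sugar', 'syrup') rather than as a single 'sugar syrup' token, so the
term positions are worth checking in the Index Analyzer before relying on it.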

Anyone know how to fix this?

Zac

Field Type definition:
<fieldType name="phrase_searcher" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="true">
        <analyzer type="index">
                <charFilter class="solr.MappingCharFilterFactory"
                            mapping="mapping-ISOLatin1Accent.txt" />
                <tokenizer class="solr.WhitespaceTokenizerFactory" />
                <filter class="solr.SynonymFilterFactory"
                        synonyms="synonyms.txt" ignoreCase="true" expand="true"
                        tokenizerFactory="solr.KeywordTokenizerFactory" />
                <filter class="solr.WordDelimiterFilterFactory"
                        generateWordParts="1" generateNumberParts="1"
                        catenateWords="1" catenateNumbers="1" catenateAll="0"
                        splitOnCaseChange="1" />
                <filter class="solr.LowerCaseFilterFactory" />
                <filter class="solr.KeywordMarkerFilterFactory"
                        protected="protwords.txt" />
                <filter class="solr.PorterStemFilterFactory" />
                <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
        </analyzer>
        <analyzer type="query">
                <charFilter class="solr.MappingCharFilterFactory"
                            mapping="mapping-ISOLatin1Accent.txt" />
                <tokenizer class="solr.WhitespaceTokenizerFactory" />
                <filter class="solr.WordDelimiterFilterFactory"
                        generateWordParts="1" generateNumberParts="1"
                        catenateWords="0" catenateNumbers="0" catenateAll="0"
                        splitOnCaseChange="1" />
                <filter class="solr.LowerCaseFilterFactory" />
                <filter class="solr.KeywordMarkerFilterFactory"
                        protected="protwords.txt" />
                <filter class="solr.PorterStemFilterFactory" />
        </analyzer>
</fieldType>
