Re: Problem with words that are almost similar

Steinar Asbjørnsen Thu, 17 Dec 2009 04:41:40 -0800

On 17 Dec 2009, at 12:42, Shalin Shekhar Mangar wrote:

> 2009/12/17 Steinar Asbjørnsen <steinar...@gmail.com>
> 
>> Hi all.
>> 
>> I have a delicate problem with two words that are spelled rather
>> similarly but have completely different meanings.
>> The actual words are restaurant (as in restaurant) and restaurering (as in
>> restoration).
>> 
>> Solr seems to think these words are similar enough to present hits on both
>> of them in the same search result.
>> Obviously this is not desirable.
>> 
>> Is there a way to take care of such specific cases without disabling Solr
>> functionality for stemming and/or plurals?
>> Or would I need to disable stemming to make this special case disappear?
>> 
>> 
> For specific cases like this, you can add the word to a file and specify it
> in schema, for example:
> 
> <filter class="solr.SnowballPorterFilterFactory" language="English"
> protected="protwords.txt"/>


Thanks, Shalin.

This is the relevant fieldType from my schema.xml file:
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

I added restaurant and restaurering to protwords.txt and restarted Tomcat,
but it made no difference.
Do I need to use the SnowballPorterFilterFactory instead of the
EnglishPorterFilterFactory?
And do I need to reindex the documents?
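
For reference, protwords.txt now looks roughly like this (a sketch, assuming
the usual one-entry-per-line layout; the comment line is only illustrative):

```text
# protwords.txt: words listed here should be left unstemmed
restaurant
restaurering
```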

Steinar
