Re: How to remove stemming from the analyzer - Finding "blah" when searching for "blah*"

Bruno Aranda Thu, 12 Mar 2009 06:09:55 -0700

Thanks for your answer. I am now trying this custom field type:

<fieldType name="textIntact" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="0"
            catenateWords="0" catenateNumbers="0" catenateAll="0"
            expand="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>


It still does not find "blah" when I search with the wildcard query
"blah*". Am I missing something?

Thanks,

Bruno

2009/3/12 Erik Hatcher <e...@ehatchersolutions.com>

> Remove the EnglishPorterFilterFactory from your "text" analyzer
> configuration (both index and query sides).  And reindex all documents.
>
>        Erik
>
>
> On Mar 12, 2009, at 8:28 AM, Bruno Aranda wrote:
>
>> Hi,
>>
>> I am trying to disable stemming from the analyzer, but I am not sure how
>> to do it.
>>
>> For instance, I have a field that contains "blah", but when I search for
>> "blah*" it cannot find it, whereas if I search for "bla*" it does. I was
>> using the text type field, from the example schema.xml. How should I
>> modify it so that stemming is not done and I can find "blah" when I
>> search for "blah*"?
>>
>> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>>   <analyzer type="index">
>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>     <!-- in this example, we will only use synonyms at query time
>>     <filter class="solr.SynonymFilterFactory"
>>             synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
>>     -->
>>     <!-- Case insensitive stop word removal.
>>       add enablePositionIncrements=true in both the index and query
>>       analyzers to leave a 'gap' for more accurate phrase queries.
>>     -->
>>     <filter class="solr.StopFilterFactory"
>>             ignoreCase="true"
>>             words="stopwords.txt"
>>             enablePositionIncrements="true"
>>             />
>>     <filter class="solr.WordDelimiterFilterFactory"
>>             generateWordParts="1" generateNumberParts="1" catenateWords="1"
>>             catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <filter class="solr.EnglishPorterFilterFactory"
>>             protected="protwords.txt"/>
>>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>   </analyzer>
>>   <analyzer type="query">
>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>     <filter class="solr.SynonymFilterFactory"
>>             synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>>     <filter class="solr.StopFilterFactory"
>>             ignoreCase="true"
>>             words="stopwords.txt"
>>             enablePositionIncrements="true"
>>             />
>>     <filter class="solr.WordDelimiterFilterFactory"
>>             generateWordParts="1" generateNumberParts="1" catenateWords="0"
>>             catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <filter class="solr.EnglishPorterFilterFactory"
>>             protected="protwords.txt"/>
>>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>   </analyzer>
>> </fieldType>
>>
>> I have tried using the "textTight" type to no avail. Most of the fields in
>> my documents have this structure:
>>
>> DOC1 field> gene name:brca2
>> DOC2 field> gene name:brca23
>>
>> If I searched for "brca2*" I would like to find both documents. My field
>> values normally contain colons ':' that should be used as stop words.
>>
>> Thank you in advance,
>>
>> Bruno
>>
>
>
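
For reference, here is a rough sketch of what Erik's suggestion would look
like when applied to the "text" type quoted above: the same definition with
the EnglishPorterFilterFactory line dropped from both the index and the query
analyzer. Everything else is copied from the original message as-is, and the
commented-out index-time synonym filter is left out for brevity. This is
untested, and the thread does not establish that stemming is the only reason
"blah*" fails to match, so treat it as a starting point rather than a
confirmed fix.

```xml
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- EnglishPorterFilterFactory removed: no stemming at index time -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- EnglishPorterFilterFactory removed: no stemming at query time -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
```

As Erik notes, all documents have to be reindexed after the change, because
terms already in the index were written with the old analyzer.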
