Re: How to remove stemming from the analyzer - Finding "blah" when searching for "blah*"

Erik Hatcher Thu, 12 Mar 2009 05:32:51 -0700

Remove the EnglishPorterFilterFactory from your "text" analyzer configuration (both index and query sides), and reindex all documents.
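
A minimal sketch of what that change might look like, assuming the rest of the "text" field type from the example schema.xml quoted below is kept exactly as posted. Only the stemmer lines are dropped; every other filter and attribute is copied from the original message, so treat this as an illustration rather than a tested configuration.

```xml
<!-- Sketch only: the quoted "text" type with the EnglishPorterFilterFactory
     line removed from both the index and query analyzers. -->
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- stemmer (EnglishPorterFilterFactory) was here -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- stemmer (EnglishPorterFilterFactory) was here -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
```

Documents indexed before the change still hold stemmed terms, which is why the reindex is needed.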


        Erik


On Mar 12, 2009, at 8:28 AM, Bruno Aranda wrote:

Hi,

I am trying to disable stemming in the analyzer, but I am not sure how to do it.

For instance, I have a field that contains "blah", but when I search for "blah*" it cannot find it, whereas if I search for "bla*" it does. I was using the text type field from the example schema.xml. How should I modify it so that stemming is not done and I can find "blah" when I search for "blah*"?

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">

     <analyzer type="index">
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <!-- in this example, we will only use synonyms at query time
       <filter class="solr.SynonymFilterFactory"
               synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
       -->
       <!-- Case insensitive stop word removal.
         add enablePositionIncrements=true in both the index and query
         analyzers to leave a 'gap' for more accurate phrase queries.
       -->
       <filter class="solr.StopFilterFactory"
               ignoreCase="true"
               words="stopwords.txt"
               enablePositionIncrements="true"
               />
       <filter class="solr.WordDelimiterFilterFactory"
               generateWordParts="1" generateNumberParts="1" catenateWords="1"
               catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.EnglishPorterFilterFactory"
               protected="protwords.txt"/>
       <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
     </analyzer>
     <analyzer type="query">
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>

       <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
               ignoreCase="true" expand="true"/>
       <filter class="solr.StopFilterFactory"
               ignoreCase="true"
               words="stopwords.txt"
               enablePositionIncrements="true"
               />
       <filter class="solr.WordDelimiterFilterFactory"
               generateWordParts="1" generateNumberParts="1" catenateWords="0"
               catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.EnglishPorterFilterFactory"
               protected="protwords.txt"/>
       <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
     </analyzer>
   </fieldType>

I have tried using the "textTight" type to no avail. Most of the fields in my documents have this structure:

DOC1 field> gene name:brca2
DOC2 field> gene name:brca23

If I search for "brca2*" I would like to find both documents. My field values normally contain colons ':' that should be used as stop words.

Thank you in advance,

Bruno
