Re: How to remove stemming from the analyzer - Finding "blah" when searching for "blah*"

Erik Hatcher Thu, 12 Mar 2009 06:44:24 -0700

What is the full query you're issuing to Solr and the corresponding request handler configuration?

Chances are you're using the dismax query parser, which does not support wildcards. Other things to check: be sure you've tied the field to your new textIntact type, and that you're searching that field (see defaultSearchField in schema.xml).


Try something like /solr/select?q=field_name:blah*
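
If the field is not yet bound to the new type, the schema.xml side of those checks might look like the sketch below. The field name field_name is the placeholder from the URL above, and the indexed and stored settings are assumptions, not taken from the actual schema.

```xml
<!-- Sketch only: "field_name" and the indexed/stored flags are placeholders -->
<field name="field_name" type="textIntact" indexed="true" stored="true"/>

<!-- Field searched when the query does not name one explicitly -->
<defaultSearchField>field_name</defaultSearchField>
```

Documents indexed before the type was changed would need to be reindexed for the new analysis to apply.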

        Erik

On Mar 12, 2009, at 9:09 AM, Bruno Aranda wrote:

Thanks for your answer, I am trying now with this custom text field:

<fieldType name="textIntact" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="0"
            catenateWords="0" catenateNumbers="0" catenateAll="0"
            expand="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

And still it does not find "blah" when using the wildcard and searching for "blah*". Am I missing something?

Thanks,

Bruno

2009/3/12 Erik Hatcher <e...@ehatchersolutions.com>

Remove the EnglishPorterFilterFactory from your "text" analyzer configuration (both index and query sides). And reindex all documents.
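
As a rough sketch of that change, the index-side analyzer from the example "text" type quoted below would look like this once the stemming filter is dropped (the commented-out synonym filter is omitted for brevity). The query-side analyzer would need the same removal.

```xml
<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true"
          words="stopwords.txt" enablePositionIncrements="true"/>
  <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1" generateNumberParts="1" catenateWords="1"
          catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <!-- solr.EnglishPorterFilterFactory removed here, so tokens are no longer stemmed -->
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
```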


      Erik


On Mar 12, 2009, at 8:28 AM, Bruno Aranda wrote:

Hi,

I am trying to disable stemming from the analyzer, but I am not sure how to do it.

For instance, I have a field that contains "blah", but when I search for "blah*" it cannot find it, whereas if I search for "bla*" it does. I was using the text type field, from the example schema.xml. How should I modify it so that stemming is not done and I can find "blah" when I search for "blah*"?

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <!-- in this example, we will only use synonyms at query time
     <filter class="solr.SynonymFilterFactory"
             synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
     -->
     <!-- Case insensitive stop word removal.
       add enablePositionIncrements=true in both the index and query
       analyzers to leave a 'gap' for more accurate phrase queries.
     -->
     <filter class="solr.StopFilterFactory"
             ignoreCase="true"
             words="stopwords.txt"
             enablePositionIncrements="true"
             />
     <filter class="solr.WordDelimiterFilterFactory"
             generateWordParts="1" generateNumberParts="1" catenateWords="1"
             catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>

     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
             ignoreCase="true" expand="true"/>
     <filter class="solr.StopFilterFactory"
             ignoreCase="true"
             words="stopwords.txt"
             enablePositionIncrements="true"
             />
     <filter class="solr.WordDelimiterFilterFactory"
             generateWordParts="1" generateNumberParts="1" catenateWords="0"
             catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
 </fieldType>

I have tried using the "textTight" type to no avail. Most of the fields in my documents have this structure:

DOC1 field> gene name:brca2
DOC2 field> gene name:brca23

If I searched for "brca2*" I would like to find both documents. My field values normally contain colons ':' that should be used as stopwords.


Thank you in advance,

Bruno
