Remove the EnglishPorterFilterFactory from your "text" analyzer
configuration (both index and query sides), then reindex all documents.
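
As a rough sketch, assuming the "text" field type quoted below from the example schema.xml and changing nothing else, the definition without the stemmer might look something like this (the comments marking the removed filter are my own additions, not part of the example schema):

```xml
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- EnglishPorterFilterFactory removed here: no stemming at index time -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- EnglishPorterFilterFactory removed here: no stemming at query time -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
```

Terms already in the index were produced by the old analyzer, so they stay stemmed until the documents are indexed again; that is why the reindex is needed.
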
Erik
On Mar 12, 2009, at 8:28 AM, Bruno Aranda wrote:
Hi,
I am trying to disable stemming in the analyzer, but I am not sure how
to do it.
For instance, I have a field that contains "blah", but when I search for
"blah*" it cannot find it, whereas if I search for "bla*" it does. I was
using the "text" field type from the example schema.xml. How should I
modify it so that stemming is not done and I can find "blah" when I
search for "blah*"?
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory"
            synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal.
         add enablePositionIncrements=true in both the index and query
         analyzers to leave a 'gap' for more accurate phrase queries.
    -->
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
I have tried using the "textTight" type to no avail. Most of the fields
in my documents have this structure:
DOC1 field> gene name:brca2
DOC2 field> gene name:brca23
If I search for "brca2*", I would like to find both documents. My field
values normally contain colons (':'), which should be treated as stop
words.
Thank you in advance,
Bruno