Hi,

  I'm having an issue running phrase queries that contain stopwords. It
looks like Solr is ignoring the stopword during search. Here's my search term:

"cannot open device"

When I execute title:"cannot open device", it also brings back titles
such as "Find Open Devices". Here's my field definition for title:

<field name="title" type="adsktext" indexed="true" stored="true"
multiValued="true"/>

<fieldType name="adsktext" class="solr.TextField"
positionIncrementGap="100" autoGeneratePhraseQueries="true">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt" enablePositionIncrements="true"/>
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
ignoreCase="true" expand="true"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
generateNumberParts="1" catenateWords="0" catenateNumbers="1"
catenateAll="0" splitOnCaseChange="0"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
<filter class="solr.PorterStemFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
ignoreCase="true" expand="true"/>
<filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt" enablePositionIncrements="true"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
generateNumberParts="1" catenateWords="0" catenateNumbers="1"
catenateAll="0" splitOnCaseChange="0"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
<filter class="solr.PorterStemFilterFactory"/>
</analyzer>
</fieldType>

Sample text :

<doc>
<field name="id">111!SOLR1000</field>
<field name="name">Solr, the Enterprise Search Server</field>
<field name="title">Find Open Devices</field>
</doc>
<doc>
<field name="id">333!SOLR1002</field>
<field name="name">ElasticSearch Server</field>
<field name="title">Cannot open device</field>
</doc>

I have "cannot" in my stopword list.

The weird part is that when I analyze the phrase in the Solr admin
analysis screen, it shows up as the following three tokens:

cannot open devic

I'm on Solr 4.7, so I'm not sure whether enablePositionIncrements="true"
is making any difference.
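
To make the question concrete, here is a minimal Python sketch of how I
understand a stop filter with position increments to behave. This is not Solr
code: the function names, the one-word stopword set, and the tiny stem table
are all made up for illustration. If "cannot" really is dropped at query time,
the phrase would start with a gap, and a leading gap puts no constraint on the
match, which could explain the "Find Open Devices" hit.

```python
# Hypothetical model only; none of these names come from Solr or Lucene.
STOPWORDS = {"cannot"}  # assumption: mirrors the entry in my stopwords.txt
STEMS = {"device": "devic", "devices": "devic"}  # crude stand-in for Porter stemming


def analyze(text):
    """Return (position, token) pairs, keeping a position gap where a stopword was removed."""
    tokens = []
    for position, raw in enumerate(text.split()):
        token = raw.lower()
        if token in STOPWORDS:
            continue  # token is dropped, but the position counter still advances
        tokens.append((position, STEMS.get(token, token)))
    return tokens


def phrase_matches(query, document):
    """True if the surviving query tokens appear in the document at the same relative positions."""
    query_tokens = analyze(query)
    if not query_tokens:
        return False
    doc_positions = dict(analyze(document))
    last_query_pos = max(position for position, _ in query_tokens)
    shifts = range(-last_query_pos, max(doc_positions, default=0) + 1)
    return any(
        all(doc_positions.get(position + shift) == token for position, token in query_tokens)
        for shift in shifts
    )


print(analyze("cannot open device"))
print(phrase_matches("cannot open device", "Find Open Devices"))
print(phrase_matches("cannot open device", "Cannot open device"))
```

In this toy model both documents match, because only "open" followed by
"devic" is actually checked once the leading stopword is gone.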

Any feedback will be appreciated.

Thanks,
Shamik
