How to figure out whether stopwords are being indexed or not

Pratik Patel Tue, 21 Feb 2017 14:53:16 -0800

I have a field type in my schema to which a stopwords list has been applied.
I have verified that the path of the stopwords file is correct and that it is
being loaded fine in the Solr admin UI. When I analyse these fields using the
"Analysis" tab of the Solr admin UI, I can see that the stopwords are being
filtered out. However, when I query with some of these stopwords, I do get
results back, which makes me think that the stopwords are probably being indexed.


For example, when I run the following query, I do get results back. The word
"and" is in the stopwords list, so I expected no results for this query.

http://localhost:8081/solr/collection1/select?fq=Description_note:*%20and%20*&indent=on&q=*:*&rows=100&start=0&wt=json
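
As a sanity check on what is actually being sent, the URL above can be percent-decoded to see the raw filter query string. The following is only a minimal sketch using Python's standard library; the helper name decode_params is made up for illustration and is not part of Solr or of my setup.

```python
from urllib.parse import urlsplit, parse_qs

# Query URL copied verbatim from the message above.
URL = (
    "http://localhost:8081/solr/collection1/select"
    "?fq=Description_note:*%20and%20*&indent=on&q=*:*&rows=100&start=0&wt=json"
)


def decode_params(url: str) -> dict:
    """Return the URL's query parameters, percent-decoded, as name -> value."""
    pairs = parse_qs(urlsplit(url).query, keep_blank_values=True)
    # Every parameter appears exactly once in this URL, so keep the single value.
    return {name: values[0] for name, values in pairs.items()}


params = decode_params(URL)
print(params["fq"])  # Description_note:* and *
```

The decoded fq is Description_note:* and *, that is, three whitespace-separated pieces rather than a single quoted phrase. How the query parser interprets each piece is a separate matter that this sketch does not settle.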

Does this mean that the word "and" is being indexed and the stopwords are not
being applied?

The following is the field type of the field Description_note:


<fieldType name="text_general" class="solr.TextField"
           positionIncrementGap="100" omitNorms="true">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory" />
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="((?m)[a-z]+)'s" replacement="$1s" />
    <filter class="solr.WordDelimiterFilterFactory" protected="protwords.txt"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.KStemFilterFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" />
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.HTMLStripCharFilterFactory" />
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="((?m)[a-z]+)'s" replacement="$1s" />
    <filter class="solr.WordDelimiterFilterFactory" protected="protwords.txt"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.KStemFilterFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" />
  </analyzer>
</fieldType>
