When search term has two stopwords ('and' and 'a') together, it doesn't work

Guilherme Viteri Tue, 05 Nov 2019 06:14:20 -0800

Hi,

I am running a search against the name field (type text_field). When the 
search term contains both 'and' and 'a', no records are returned. If I remove 
'a', the search works.
For example:
Search Term: lymphoid and a non-lymphoid cell
doesn't work: 
https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true


Search term: lymphoid and non-lymphoid cell
works:
https://dev.reactome.org/content/query?q=lymphoid+and+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true
I am interested in the first result.

schema.xml
 <field name="name" type="text_field" indexed="true" stored="true" omitNorms="false" required="true" multiValued="false"/>

        <fieldType name="text_field" class="solr.TextField" positionIncrementGap="100" omitNorms="false">
            <analyzer type="index">
                <tokenizer class="solr.StandardTokenizerFactory"/>
                <filter class="solr.ClassicFilterFactory"/>
                <filter class="solr.LengthFilterFactory" min="2" max="20"/>
                <filter class="solr.LowerCaseFilterFactory"/>
                <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
            </analyzer>
            <analyzer type="query">
                <tokenizer class="solr.PatternTokenizerFactory" pattern="[^a-zA-Z0-9/._:]"/>
                <filter class="solr.PatternReplaceFilterFactory" pattern="^[/._:]+" replacement=""/>
                <filter class="solr.PatternReplaceFilterFactory" pattern="[/._:]+$" replacement=""/>
                <filter class="solr.PatternReplaceFilterFactory" pattern="[_]" replacement=" "/>
                <filter class="solr.LengthFilterFactory" min="2" max="20"/>
                <filter class="solr.LowerCaseFilterFactory"/>
                <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
            </analyzer>
        </fieldType>

stopwords.txt
#Standard english stop words taken from Lucene's StopAnalyzer
a
b
c
....
an
and
are
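
As a rough cross-check of the configuration above, the query-time analyzer chain can be approximated outside Solr. The Python sketch below is only an approximation under stated assumptions, not Solr's actual implementation: the function name query_analyze is hypothetical, the stop list holds only the entries visible in the stopwords.txt excerpt (the elided ones are not reproduced), and it ignores token positions and any splitting the query parser does before analysis.

```python
import re

# Assumption: only the stop words visible in the excerpt above;
# the entries elided by "...." are deliberately left out.
STOPWORDS = {"a", "b", "c", "an", "and", "are"}


def query_analyze(text, stopwords=STOPWORDS):
    """Approximate the query-time analyzer chain of text_field."""
    # PatternTokenizerFactory: split on anything outside [a-zA-Z0-9/._:],
    # skipping empty tokens.
    tokens = [t for t in re.split(r"[^a-zA-Z0-9/._:]", text) if t]
    result = []
    for tok in tokens:
        tok = re.sub(r"^[/._:]+", "", tok)  # first PatternReplaceFilterFactory
        tok = re.sub(r"[/._:]+$", "", tok)  # second PatternReplaceFilterFactory
        tok = tok.replace("_", " ")         # third PatternReplaceFilterFactory
        if not 2 <= len(tok) <= 20:         # LengthFilterFactory min=2 max=20
            continue
        tok = tok.lower()                   # LowerCaseFilterFactory
        if tok in stopwords:                # StopFilterFactory
            continue
        result.append(tok)
    return result


print(query_analyze("lymphoid and a non-lymphoid cell"))
print(query_analyze("lymphoid and non-lymphoid cell"))
```

In this sketch both search terms reduce to the same token list, because 'a' is shorter than the LengthFilterFactory minimum and 'and' is in the stop list. It does not model the position gaps that removed tokens leave behind; the Analysis screen in the Solr admin UI shows the real per-stage output.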

I am running Solr 6.6.2.

Is there anything I can do to prevent this?

Thanks 
Guilherme
