Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

Guilherme Viteri Tue, 05 Nov 2019 06:48:25 -0800

Thanks.
Haven't I done this here?
  <fieldType name="text_field" class="solr.TextField" positionIncrementGap="100" omitNorms="false">
           <analyzer type="index">
               <tokenizer class="solr.StandardTokenizerFactory"/>
               <filter class="solr.ClassicFilterFactory"/>
               <filter class="solr.LengthFilterFactory" min="2" max="20"/>
               <filter class="solr.LowerCaseFilterFactory"/>
               <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
           </analyzer>



> On 5 Nov 2019, at 14:15, David Hastings <hastings.recurs...@gmail.com> wrote:
> 
> Fwd to another server
> 
> The first thing you should do is remove any reference to stop words and
> never use them, then re-index your data and try it again.
> 
> On Tue, Nov 5, 2019 at 9:14 AM Guilherme Viteri <gvit...@ebi.ac.uk> wrote:
> 
>> Hi,
>> 
>> I am searching on a name field (type text_field), but when the search term
>> contains both 'and' and 'a' it doesn't return any records. If I remove 'a',
>> it works.
>> e.g.
>> Search Term: lymphoid and a non-lymphoid cell
>> doesn't work:
>> https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true
>> 
>> Search term: lymphoid and non-lymphoid cell
>> works:
>> https://dev.reactome.org/content/query?q=lymphoid+and+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true
>> (I am interested in the first result.)
>> 
>> schema.xml
>> <field name="name" type="text_field" indexed="true" stored="true" omitNorms="false" required="true" multiValued="false"/>
>> 
>>        <fieldType name="text_field" class="solr.TextField" positionIncrementGap="100" omitNorms="false">
>>            <analyzer type="index">
>>                <tokenizer class="solr.StandardTokenizerFactory"/>
>>                <filter class="solr.ClassicFilterFactory"/>
>>                <filter class="solr.LengthFilterFactory" min="2" max="20"/>
>>                <filter class="solr.LowerCaseFilterFactory"/>
>>                <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>>            </analyzer>
>>            <analyzer type="query">
>>                <tokenizer class="solr.PatternTokenizerFactory" pattern="[^a-zA-Z0-9/._:]"/>
>>                <filter class="solr.PatternReplaceFilterFactory" pattern="^[/._:]+" replacement=""/>
>>                <filter class="solr.PatternReplaceFilterFactory" pattern="[/._:]+$" replacement=""/>
>>                <filter class="solr.PatternReplaceFilterFactory" pattern="[_]" replacement=" "/>
>>                <filter class="solr.LengthFilterFactory" min="2" max="20"/>
>>                <filter class="solr.LowerCaseFilterFactory"/>
>>                <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>>            </analyzer>
>>        </fieldType>
>> 
>> stopwords.txt
>> #Standard english stop words taken from Lucene's StopAnalyzer
>> a
>> b
>> c
>> ....
>> an
>> and
>> are
>> 
>> Running Solr 6.6.2.
>> 
>> Is there anything I could do to prevent this?
>> 
>> Thanks
>> Guilherme
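
For reference, the suggestion quoted above (drop stop words entirely, then re-index) might look roughly like the following for the text_field type. This is only a sketch: it assumes the rest of the analyzer chain stays exactly as posted, it has not been tested against Solr 6.6.2, and it is not confirmed anywhere in this thread to resolve the problem.

```xml
<!-- Sketch only: the text_field type from this thread with StopFilterFactory
     removed from both analyzers. Untested; a full re-index is needed after
     changing the index-time analyzer. -->
<fieldType name="text_field" class="solr.TextField" positionIncrementGap="100" omitNorms="false">
    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.ClassicFilterFactory"/>
        <filter class="solr.LengthFilterFactory" min="2" max="20"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.PatternTokenizerFactory" pattern="[^a-zA-Z0-9/._:]"/>
        <filter class="solr.PatternReplaceFilterFactory" pattern="^[/._:]+" replacement=""/>
        <filter class="solr.PatternReplaceFilterFactory" pattern="[/._:]+$" replacement=""/>
        <filter class="solr.PatternReplaceFilterFactory" pattern="[_]" replacement=" "/>
        <filter class="solr.LengthFilterFactory" min="2" max="20"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>
```

Note that LengthFilterFactory with min="2" discards single-character tokens, so 'a' would still be dropped at both index and query time in this sketch, while 'and' would now be kept.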
