The first thing you should do is remove every StopFilterFactory reference
from your schema and stop using stop words altogether, then re-index your
data and try the search again.
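
As a rough sketch of what that change could look like, here is the
text_field definition from your message with only the two StopFilterFactory
lines removed. Everything else is copied unchanged, and I have not tested it
against your data, so treat it as a starting point rather than a verified fix.

```xml
<!-- Sketch only: text_field as posted, minus the StopFilterFactory filters. -->
<fieldType name="text_field" class="solr.TextField" positionIncrementGap="100" omitNorms="false">
    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.ClassicFilterFactory"/>
        <!-- Still drops tokens shorter than 2 characters, e.g. "a". -->
        <filter class="solr.LengthFilterFactory" min="2" max="20"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.PatternTokenizerFactory" pattern="[^a-zA-Z0-9/._:]"/>
        <filter class="solr.PatternReplaceFilterFactory" pattern="^[/._:]+" replacement=""/>
        <filter class="solr.PatternReplaceFilterFactory" pattern="[/._:]+$" replacement=""/>
        <filter class="solr.PatternReplaceFilterFactory" pattern="[_]" replacement=" "/>
        <!-- Same length limits as the index analyzer. -->
        <filter class="solr.LengthFilterFactory" min="2" max="20"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>
```

Because the index-time analyzer changes, documents indexed before the edit
keep their old tokens until they are re-indexed. Note also that
LengthFilterFactory with min="2" still discards single-character tokens such
as "a" in both analyzers, so it is worth checking how the failing query is
tokenized on the Analysis screen of the Solr admin UI.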

On Tue, Nov 5, 2019 at 9:14 AM Guilherme Viteri <gvit...@ebi.ac.uk> wrote:

> Hi,
>
> I am searching on a name field (type text_field). When the search term
> contains both 'and' and 'a', no records are returned. If I remove 'a',
> the search works.
> For example:
> Search Term: lymphoid and a non-lymphoid cell
> doesn't work:
> https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true
>
> Search term: lymphoid and non-lymphoid cell
> works:
> https://dev.reactome.org/content/query?q=lymphoid+and+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true
> I am interested in the first result.
>
> schema.xml
>  <field name="name" type="text_field" indexed="true" stored="true"
>         omitNorms="false" required="true" multiValued="false"/>
>
>         <fieldType name="text_field" class="solr.TextField" positionIncrementGap="100" omitNorms="false">
>             <analyzer type="index">
>                 <tokenizer class="solr.StandardTokenizerFactory"/>
>                 <filter class="solr.ClassicFilterFactory"/>
>                 <filter class="solr.LengthFilterFactory" min="2" max="20"/>
>                 <filter class="solr.LowerCaseFilterFactory"/>
>                 <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>             </analyzer>
>             <analyzer type="query">
>                 <tokenizer class="solr.PatternTokenizerFactory" pattern="[^a-zA-Z0-9/._:]"/>
>                 <filter class="solr.PatternReplaceFilterFactory" pattern="^[/._:]+" replacement=""/>
>                 <filter class="solr.PatternReplaceFilterFactory" pattern="[/._:]+$" replacement=""/>
>                 <filter class="solr.PatternReplaceFilterFactory" pattern="[_]" replacement=" "/>
>                 <filter class="solr.LengthFilterFactory" min="2" max="20"/>
>                 <filter class="solr.LowerCaseFilterFactory"/>
>                 <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>             </analyzer>
>         </fieldType>
>
> stopwords.txt
> #Standard english stop words taken from Lucene's StopAnalyzer
> a
> b
> c
> ....
> an
> and
> are
>
> Running Solr 6.6.2.
>
> Is there anything I could do to prevent this?
>
> Thanks
> Guilherme
