Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

Guilherme Viteri Fri, 08 Nov 2019 08:30:13 -0800

Hi Walter and Paras,

I reindexed after removing every reference to StopFilterFactory, and the 
result count went from 121 to nearly 20K, because the search term 
q="Lymphoid and a non-Lymphoid cell" now matches entities such as "IFT A" or 
"Lamin A". So I don't think removing it completely is the way to go for our 
scenario, but I appreciate the suggestion.
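
The jump in result count can be illustrated with a toy model. This is only a sketch, not Solr's analysis chain: it assumes simple whitespace tokenization, any-term (OR) matching, and a stopword set limited to the entries quoted from stopwords.txt in this thread. The function names are hypothetical.

```python
# Toy model only: NOT Solr's analyzers. Assumes whitespace tokenization and
# any-term (OR) matching; function names are hypothetical.
STOPWORDS = {"a", "an", "and", "are"}  # subset quoted from stopwords.txt in the thread


def tokens(text, drop_stopwords):
    """Lowercase, split on whitespace and hyphens, optionally drop stopwords."""
    words = text.lower().replace("-", " ").split()
    return {w for w in words if not (drop_stopwords and w in STOPWORDS)}


def matches(query, doc, drop_stopwords):
    """True when the query and the document share at least one token."""
    return bool(tokens(query, drop_stopwords) & tokens(doc, drop_stopwords))


if __name__ == "__main__":
    query = "Lymphoid and a non-Lymphoid cell"
    for doc in ("Lamin A", "IFT A"):
        print(doc,
              "| stopwords removed:", matches(query, doc, True),
              "| stopwords kept:", matches(query, doc, False))
```

With stopwords kept, the single shared token "a" is enough for "Lamin A" and "IFT A" to match in this model, which is consistent with the increase described above.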


Yes, the response is using fl=*.
I am trying some combinations at the moment, but no success yet.

defType=edismax
q.alt=Lymphoid and a non-Lymphoid cell
Number of results=1599
Quite a considerable increase, although the results are reasonably meaningful.
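
The handler comparison discussed in this thread can be sketched as a small helper that builds the request URLs, so the parsedquery entries in the debug output can be compared side by side. This is a minimal sketch: the host, the core name "mycore", and the function name are assumptions, while the parameters (fl, debug=query, echoParams=all, q.alt) are the ones mentioned in the thread.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical host and core name; replace with the real ones.
SOLR_BASE = "http://localhost:8983/solr/mycore"


def debug_url(handler, query, use_q_alt=False):
    """Build a request URL that asks Solr to return the parsed query.

    handler: request handler path, e.g. "/select" or "/search".
    use_q_alt: send the text as q.alt instead of q.
    """
    params = {
        ("q.alt" if use_q_alt else "q"): query,
        "fl": "*",
        "debug": "query",      # parsed query only, skips the score explanations
        "echoParams": "all",   # echo handler defaults such as qf and bf
        "wt": "json",
    }
    return f"{SOLR_BASE}{handler}?{urlencode(params)}"


if __name__ == "__main__":
    term = "Lymphoid and a non-Lymphoid cell"
    for handler, alt in (("/select", False), ("/search", False), ("/search", True)):
        print(debug_url(handler, term, use_q_alt=alt))
```

Fetching each URL and comparing the parsedquery values should show where the two handlers diverge.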

I am sorry, but I didn't understand what exactly you want me to do with the 
lst (??) and qf and bf.

Thanks, everyone, for your input.


> On 8 Nov 2019, at 06:45, Paras Lehana <paras.leh...@indiamart.com> wrote:
> 
> Hi Guilherme
> 
> By accident, I ended up querying using the default handler (/select) and 
> it worked. 
> 
> You've just found the culprit. Thanks for giving the material I requested. 
> Your analysis chain is working as expected. I don't see any issue in either 
> StopWordFilter or your boosts. I also use a boost of 50 when boosting 
> contextual suggestions (boosting "gold iphone" on a page of iphone) but I 
> take Walter's suggestion and would try to optimize my weights. I agree that 
> we did not research this value of 50 much either (we never faced 
> performance or relevance issues).  
> 
> See the major difference in both the handlers - edismax. I'm pretty sure that 
> your problem lies in the parsing of queries (you can confirm that from 
> parsedquery key in debug of both JSON responses). I hope you have provided 
> the response with fl=*. Replace q with q.alt in your /search handler query 
> and I think you should start getting responses. That's because q.alt uses 
> standard parser. If you want to keep using edismax, I suggest you test the 
> responses after removing some combination of lst (qf, bf) to find what's 
> restricting the documents from coming up. I'm out of office today; otherwise 
> I would certainly have tried analyzing the field values of the document in 
> the /select response and comparing them with qf/bq in the /search handler in 
> solrconfig.xml. Do this for me and you'd certainly find something.  
> 
> On Thu, 7 Nov 2019 at 21:00, Walter Underwood <wun...@wunderwood.org 
> <mailto:wun...@wunderwood.org>> wrote:
> I normally use a weight of 8 for the most important field, like title. Other 
> fields might get a 4 or 2.
> 
> I add a “pf” field with the weights doubled, so that phrase matches have a 
> higher weight.
> 
> The weight of 8 comes from experience at Infoseek and Inktomi, two early web 
> search engines. With different relevance algorithms and totally different 
> evaluation and tuning systems, they settled on weights of 8 and 7.5 for HTML 
> titles. With the two radically different systems getting the same number, 
> I decided that was a property of the documents, not of the search engines.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org <mailto:wun...@wunderwood.org>
> http://observer.wunderwood.org/ <http://observer.wunderwood.org/>  (my blog)
> 
>> On Nov 7, 2019, at 9:03 AM, Guilherme Viteri <gvit...@ebi.ac.uk 
>> <mailto:gvit...@ebi.ac.uk>> wrote:
>> 
>> Hi Wunder,
>> 
>> My indexer takes quite a few hours to run. I am shortening it to run 
>> faster, but I also need to make sure it gives what we are expecting. This 
>> implementation has been there for >4y and is heavily used.
>> 
>>> In your edismax handlers, weights of 20, 50, and 100 are extremely high. I 
>>> don’t think I’ve ever used a weight higher than 16 in a dozen years of 
>>> configuring Solr.
>> I've inherited that implementation and I am really keen to adjust it. What 
>> would you recommend?
>> 
>> Cheers
>> Guilherme
>> 
>>> On 7 Nov 2019, at 14:43, Walter Underwood <wun...@wunderwood.org 
>>> <mailto:wun...@wunderwood.org>> wrote:
>>> 
>>> Thanks for posting the files. Looking at schema.xml, I see that you still 
>>> are using StopFilterFactory. The first advice we gave you was to remove 
>>> that.
>>> 
>>> Remove StopFilterFactory everywhere and reindex.
>>> 
>>> You will continue to have problems matching stopwords until you do that.
>>> 
>>> In your edismax handlers, weights of 20, 50, and 100 are extremely high. I 
>>> don’t think I’ve ever used a weight higher than 16 in a dozen years of 
>>> configuring Solr.
>>> 
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org <mailto:wun...@wunderwood.org>
>>> http://observer.wunderwood.org/ <http://observer.wunderwood.org/>  (my blog)
>>> 
>>>> On Nov 7, 2019, at 6:56 AM, Guilherme Viteri <gvit...@ebi.ac.uk 
>>>> <mailto:gvit...@ebi.ac.uk>> wrote:
>>>> 
>>>> Hi Paras, everyone
>>>> 
>>>> Thank you again for your inputs and suggestions. I am sorry to hear you had 
>>>> trouble with the attachments; I will host them somewhere and share the links. 
>>>> I don't tweak my index. I get the data from the graph database, create the 
>>>> documents as they are and save them to solr.
>>>> 
>>>> So, I am sending the new analysis screen querying the way you suggested. 
>>>> Also the results with params and solr query url.
>>>> 
>>>> During the process of querying what you asked I found something really 
>>>> weird (at least for me). By accident, I ended up querying using the 
>>>> default handler (/select) and it worked. Then, if I use the one I must use, 
>>>> it sadly doesn't work. I am posting both results and I will also post 
>>>> the handlers as well.
>>>> 
>>>> Here is the link with all the files mentioned before
>>>> https://www.dropbox.com/sh/fymfm1q94zum1lx/AADwU1c9EUf2A4d7FtzSKR54a?dl=0
>>>> If the link doesn't work www dot dropbox dot com slash sh slash 
>>>> fymfm1q94zum1lx/AADwU1c9EUf2A4d7FtzSKR54a ? dl equals 0
>>>> 
>>>> Thanks
>>>> 
>>>>> On 7 Nov 2019, at 05:23, Paras Lehana <paras.leh...@indiamart.com 
>>>>> <mailto:paras.leh...@indiamart.com>> wrote:
>>>>> 
>>>>> Hi Guilherme.
>>>>> 
>>>>> I am sending the analysis result and the json result as requested.
>>>>> 
>>>>> 
>>>>> Thanks for the effort. Luckily, I can see your attachments (low quality
>>>>> though).
>>>>> 
>>>>> From the analysis screen, the analysis is working as expected. One of the
>>>>> reasons for query="lymphoid and *a* non-lymphoid cell" not matching
>>>>> document containing "Lymphoid and a non-Lymphoid cell" I can initially
>>>>> think of is: the stopword "a" is probably present in post-analysis either
>>>>> of query or index. Did you tweak your index time analysis after indexing?
>>>>> 
>>>>> Do two things:
>>>>> 
>>>>> 1. Post the analysis screen for index=*"Immunoregulatory
>>>>> interactions between a Lymphoid and a non-Lymphoid cell"* and
>>>>> query=*"lymphoid and a non-lymphoid cell"*. Try hosting the image and
>>>>> providing the link here.
>>>>> 2. Give the same JSON output as you have sent but this time with
>>>>> *"echoParams=all"*. Also, post the exact Solr query url.
>>>>> 
>>>>> 
>>>>> 
>>>>> On Wed, 6 Nov 2019 at 21:07, Erick Erickson <erickerick...@gmail.com 
>>>>> <mailto:erickerick...@gmail.com>> wrote:
>>>>> 
>>>>>> I don’t see the attachments, maybe I deleted old e-mails or some such. 
>>>>>> The
>>>>>> Apache server is fairly aggressive about stripping attachments though, so
>>>>>> it’s also possible they didn’t make it through.
>>>>>> 
>>>>>>> On Nov 6, 2019, at 9:28 AM, Guilherme Viteri <gvit...@ebi.ac.uk 
>>>>>>> <mailto:gvit...@ebi.ac.uk>> wrote:
>>>>>>> 
>>>>>>> Thanks Erick.
>>>>>>> 
>>>>>>>> First, your index and analysis chains are considerably different, this
>>>>>> can easily be a source of problems. In particular, using two different
>>>>>> tokenizers is a huge red flag. I _strongly_ recommend against this unless
>>>>>> you’re totally sure you understand the consequences. Additionally, your 
>>>>>> use
>>>>>> of the length filter is suspicious, especially since your problem 
>>>>>> statement
>>>>>> is about the addition of a single letter term and the min length allowed 
>>>>>> on
>>>>>> that filter is 2. That said, it’s reasonable to suppose that the ’a’ is
>>>>>> filtered out in both cases, but maybe you’ve found something odd about 
>>>>>> the
>>>>>> interactions.
>>>>>>> I will investigate the min length and post the results later.
>>>>>>> 
>>>>>>>> Second, I have no idea what this will do. Are the equal signs typos?
>>>>>> Used by custom code?
>>>>>>> This the url in my application, not solr params. That's the query 
>>>>>>> string.
>>>>>>> 
>>>>>>>> What does “species=“ do? That’s not Solr syntax, so it’s likely that
>>>>>> all the params with an equal-sign are totally ignored unless it’s just a
>>>>>> typo.
>>>>>>> This is part of the application. Species will be used later on in solr
>>>>>> to filter out the result. That's not solr. That my app params.
>>>>>>> 
>>>>>>>> Third, the easiest way to see what’s happening under the covers is to
>>>>>> add “&debug=true” to the query and look at the parsed query. Ignore all 
>>>>>> the
>>>>>> relevance calculations for the nonce, or specify “&debug=query” to skip
>>>>>> that part.
>>>>>>> The two json files i've sent, they are debugQuery=on and the explain tag
>>>>>> is present.
>>>>>>> I will try the searching the way you mentioned.
>>>>>>> 
>>>>>>> Thank for your inputs
>>>>>>> 
>>>>>>> Guilherme
>>>>>>> 
>>>>>>>> On 6 Nov 2019, at 14:14, Erick Erickson <erickerick...@gmail.com 
>>>>>>>> <mailto:erickerick...@gmail.com>>
>>>>>> wrote:
>>>>>>>> 
>>>>>>>> Fwd to another server
>>>>>>>> 
>>>>>>>> First, your index and analysis chains are considerably different, this
>>>>>> can easily be a source of problems. In particular, using two different
>>>>>> tokenizers is a huge red flag. I _strongly_ recommend against this unless
>>>>>> you’re totally sure you understand the consequences. Additionally, your 
>>>>>> use
>>>>>> of the length filter is suspicious, especially since your problem 
>>>>>> statement
>>>>>> is about the addition of a single letter term and the min length allowed 
>>>>>> on
>>>>>> that filter is 2. That said, it’s reasonable to suppose that the ’a’ is
>>>>>> filtered out in both cases, but maybe you’ve found something odd about 
>>>>>> the
>>>>>> interactions.
>>>>>>>> 
>>>>>>>> Second, I have no idea what this will do. Are the equal signs typos?
>>>>>> Used by custom code?
>>>>>>>> 
>>>>>>>> https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true
>>>>>>>> 
>>>>>>>> What does “species=“ do? That’s not Solr syntax, so it’s likely that
>>>>>> all the params with an equal-sign are totally ignored unless it’s just a
>>>>>> typo.
>>>>>>>> 
>>>>>>>> Third, the easiest way to see what’s happening under the covers is to
>>>>>> add “&debug=true” to the query and look at the parsed query. Ignore all 
>>>>>> the
>>>>>> relevance calculations for the nonce, or specify “&debug=query” to skip
>>>>>> that part.
>>>>>>>> 
>>>>>>>> 90% + of the time, the question “why didn’t this query do what I
>>>>>> expect” is answered by looking at the “&debug=query” output and the
>>>>>> analysis page in the admin UI. NOTE: for the analysis page be sure to 
>>>>>> look
>>>>>> at _both_ the query and index output. Also, and very important about the
>>>>>> analysis page (and this is confusing) is that this _assumes_ that what 
>>>>>> you
>>>>>> put in the text boxes have made it through the query parser intact and is
>>>>>> analyzed by the field selected. Consider the search "q=field:word1 
>>>>>> word2".
>>>>>> Now you type “word1 word2” into the analysis text box and it looks like
>>>>>> what you expect. That’s misleading because the query is _parsed_ as
>>>>>> "field:word1 default_search_field:word2”. This is where “&debug=query”
>>>>>> helps.
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> Erick
>>>>>>>> 
>>>>>>>>> On Nov 6, 2019, at 2:36 AM, Paras Lehana <paras.leh...@indiamart.com 
>>>>>>>>> <mailto:paras.leh...@indiamart.com>>
>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> Hi Walter,
>>>>>>>>> 
>>>>>>>>> The solr.StopFilter removes all tokens that are stopwords. Those words
>>>>>> will
>>>>>>>>>> not be in the index, so they can never match a query.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> I think the OP's concern is different results when adding a stopword. 
>>>>>>>>> I
>>>>>>>>> think he's using the filter factory correctly - the query chain
>>>>>> includes
>>>>>>>>> the filter as well so it should remove "a" while querying.
>>>>>>>>> 
>>>>>>>>> *@Guilherme*, please post results for both the query, the document in
>>>>>>>>> result you are concerned about and post full result of analysis screen
>>>>>> (for
>>>>>>>>> both query and index).
>>>>>>>>> 
>>>>>>>>> On Tue, 5 Nov 2019 at 21:38, Walter Underwood <wun...@wunderwood.org 
>>>>>>>>> <mailto:wun...@wunderwood.org>>
>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> No.
>>>>>>>>>> 
>>>>>>>>>> The solr.StopFilter removes all tokens that are stopwords. Those 
>>>>>>>>>> words
>>>>>>>>>> will not be in the index, so they can never match a query.
>>>>>>>>>> 
>>>>>>>>>> 1. Remove the lines with solr.StopFilter from every analysis chain in
>>>>>>>>>> schema.xml.
>>>>>>>>>> 2. Reload the collection, restart Solr, or whatever to read the new
>>>>>> config.
>>>>>>>>>> 3. Reindex all of the documents.
>>>>>>>>>> 
>>>>>>>>>> When indexed with the new analysis chain, the stopwords will not be
>>>>>>>>>> removed and they will be searchable.
>>>>>>>>>> 
>>>>>>>>>> wunder
>>>>>>>>>> Walter Underwood
>>>>>>>>>> wun...@wunderwood.org <mailto:wun...@wunderwood.org>
>>>>>>>>>> http://observer.wunderwood.org/ <http://observer.wunderwood.org/>  
>>>>>>>>>> (my blog)
>>>>>>>>>> 
>>>>>>>>>>> On Nov 5, 2019, at 8:56 AM, Guilherme Viteri <gvit...@ebi.ac.uk 
>>>>>>>>>>> <mailto:gvit...@ebi.ac.uk>>
>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Ok. I am kind a lost now.
>>>>>>>>>>> If I open up the console > analysis and perform it, that's the final
>>>>>>>>>> result.
>>>>>>>>>>> <Screenshot 2019-11-05 at 14.54.16.png>
>>>>>>>>>>> 
>>>>>>>>>>> Your suggestion is: get rid of the <filter stopword.txt> in the
>>>>>>>>>> schema.xml and during index phase replaceAll("in stopwords.txt"," ")
>>>>>> then
>>>>>>>>>> add to solr. Is that correct ?
>>>>>>>>>>> 
>>>>>>>>>>> Thanks David
>>>>>>>>>>> 
>>>>>>>>>>>> On 5 Nov 2019, at 14:48, David Hastings <
>>>>>> hastings.recurs...@gmail.com <mailto:hastings.recurs...@gmail.com>
>>>>>>>>>> <mailto:hastings.recurs...@gmail.com 
>>>>>>>>>> <mailto:hastings.recurs...@gmail.com>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Fwd to another server
>>>>>>>>>>>> 
>>>>>>>>>>>> no,
>>>>>>>>>>>>        <filter class="solr.StopFilterFactory" ignoreCase="true"
>>>>>>>>>>>> words="stopwords.txt"/>
>>>>>>>>>>>> 
>>>>>>>>>>>> is still using stopwords and should be removed, in my opinion of
>>>>>> course,
>>>>>>>>>>>> based on your use case may be different, but i generally axe any
>>>>>>>>>> reference
>>>>>>>>>>>> to them at all
>>>>>>>>>>>> 
>>>>>>>>>>>> On Tue, Nov 5, 2019 at 9:47 AM Guilherme Viteri <gvit...@ebi.ac.uk 
>>>>>>>>>>>> <mailto:gvit...@ebi.ac.uk>
>>>>>>>>>> <mailto:gvit...@ebi.ac.uk <mailto:gvit...@ebi.ac.uk>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>> Haven't I done this here ?
>>>>>>>>>>>>> <fieldType name="text_field" class="solr.TextField"
>>>>>>>>>>>>> positionIncrementGap="100" omitNorms="false" >
>>>>>>>>>>>>>    <analyzer type="index">
>>>>>>>>>>>>>        <tokenizer class="solr.StandardTokenizerFactory"/>
>>>>>>>>>>>>>        <filter class="solr.ClassicFilterFactory"/>
>>>>>>>>>>>>>        <filter class="solr.LengthFilterFactory" min="2"
>>>>>>>>>> max="20"/>
>>>>>>>>>>>>>        <filter class="solr.LowerCaseFilterFactory"/>
>>>>>>>>>>>>>        <filter class="solr.StopFilterFactory" ignoreCase="true"
>>>>>>>>>>>>> words="stopwords.txt"/>
>>>>>>>>>>>>>    </analyzer>
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On 5 Nov 2019, at 14:15, David Hastings <
>>>>>> hastings.recurs...@gmail.com <mailto:hastings.recurs...@gmail.com>
>>>>>>>>>> <mailto:hastings.recurs...@gmail.com 
>>>>>>>>>> <mailto:hastings.recurs...@gmail.com>>>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Fwd to another server
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> The first thing you should do is remove any reference to stop
>>>>>> words
>>>>>>>>>> and
>>>>>>>>>>>>>> never use them, then re-index your data and try it again.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Tue, Nov 5, 2019 at 9:14 AM Guilherme Viteri <
>>>>>> gvit...@ebi.ac.uk <mailto:gvit...@ebi.ac.uk>
>>>>>>>>>> <mailto:gvit...@ebi.ac.uk <mailto:gvit...@ebi.ac.uk>>>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I am performing a search to match a name (text_field), however
>>>>>> this
>>>>>>>>>> term
>>>>>>>>>>>>>>> contains 'and' and 'a' and it doesn't return any records. If i
>>>>>> remove
>>>>>>>>>>>>> 'a'
>>>>>>>>>>>>>>> then it works.
>>>>>>>>>>>>>>> e.g
>>>>>>>>>>>>>>> Search Term: lymphoid and a non-lymphoid cell
>>>>>>>>>>>>>>> doesn't work:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Search term: lymphoid and non-lymphoid cell
>>>>>>>>>>>>>>> works:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> https://dev.reactome.org/content/query?q=lymphoid+and+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> interested in the first result
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> schema.xml
>>>>>>>>>>>>>>> <field name="name"                          type="text_field"
>>>>>>>>>>>>>>> indexed="true"  stored="true"   omitNorms="false"
>>>>>> required="true"
>>>>>>>>>>>>>>> multiValued="false"/>
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>    <analyzer type="query">
>>>>>>>>>>>>>>>        <tokenizer class="solr.PatternTokenizerFactory"
>>>>>>>>>>>>>>> pattern="[^a-zA-Z0-9/._:]"/>
>>>>>>>>>>>>>>>        <filter class="solr.PatternReplaceFilterFactory"
>>>>>>>>>>>>>>> pattern="^[/._:]+" replacement=""/>
>>>>>>>>>>>>>>>        <filter class="solr.PatternReplaceFilterFactory"
>>>>>>>>>>>>>>> pattern="[/._:]+$" replacement=""/>
>>>>>>>>>>>>>>>        <filter class="solr.PatternReplaceFilterFactory"
>>>>>>>>>>>>>>> pattern="[_]" replacement=" "/>
>>>>>>>>>>>>>>>        <filter class="solr.LengthFilterFactory" min="2"
>>>>>>>>>>>>> max="20"/>
>>>>>>>>>>>>>>>        <filter class="solr.LowerCaseFilterFactory"/>
>>>>>>>>>>>>>>>        <filter class="solr.StopFilterFactory"
>>>>>>>>>> ignoreCase="true"
>>>>>>>>>>>>>>> words="stopwords.txt"/>
>>>>>>>>>>>>>>>    </analyzer>
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> <fieldType name="text_field" class="solr.TextField"
>>>>>>>>>>>>>>> positionIncrementGap="100" omitNorms="false" >
>>>>>>>>>>>>>>>    <analyzer type="index">
>>>>>>>>>>>>>>>        <tokenizer class="solr.StandardTokenizerFactory"/>
>>>>>>>>>>>>>>>        <filter class="solr.ClassicFilterFactory"/>
>>>>>>>>>>>>>>>        <filter class="solr.LengthFilterFactory" min="2"
>>>>>>>>>>>>> max="20"/>
>>>>>>>>>>>>>>>        <filter class="solr.LowerCaseFilterFactory"/>
>>>>>>>>>>>>>>>        <filter class="solr.StopFilterFactory"
>>>>>>>>>> ignoreCase="true"
>>>>>>>>>>>>>>> words="stopwords.txt"/>
>>>>>>>>>>>>>>>    </analyzer>
>>>>>>>>>>>>>>>    <analyzer type="query">
>>>>>>>>>>>>>>>        <tokenizer class="solr.PatternTokenizerFactory"
>>>>>>>>>>>>>>> pattern="[^a-zA-Z0-9/._:]"/>
>>>>>>>>>>>>>>>        <filter class="solr.PatternReplaceFilterFactory"
>>>>>>>>>>>>>>> pattern="^[/._:]+" replacement=""/>
>>>>>>>>>>>>>>>        <filter class="solr.PatternReplaceFilterFactory"
>>>>>>>>>>>>>>> pattern="[/._:]+$" replacement=""/>
>>>>>>>>>>>>>>>        <filter class="solr.PatternReplaceFilterFactory"
>>>>>>>>>>>>>>> pattern="[_]" replacement=" "/>
>>>>>>>>>>>>>>>        <filter class="solr.LengthFilterFactory" min="2"
>>>>>>>>>>>>> max="20"/>
>>>>>>>>>>>>>>>        <filter class="solr.LowerCaseFilterFactory"/>
>>>>>>>>>>>>>>>        <filter class="solr.StopFilterFactory"
>>>>>>>>>> ignoreCase="true"
>>>>>>>>>>>>>>> words="stopwords.txt"/>
>>>>>>>>>>>>>>>    </analyzer>
>>>>>>>>>>>>>>> </fieldType>
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> stopwords.txt
>>>>>>>>>>>>>>> #Standard english stop words taken from Lucene's StopAnalyzer
>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>> b
>>>>>>>>>>>>>>> c
>>>>>>>>>>>>>>> ....
>>>>>>>>>>>>>>> an
>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>> are
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Running SolR 6.6.2.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Is there anything I could do to prevent this ?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>> Guilherme
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> --
>>>>>>>>> Regards,
>>>>>>>>> 
>>>>>>>>> *Paras Lehana* [65871]
>>>>>>>>> Development Engineer, Auto-Suggest,
>>>>>>>>> IndiaMART Intermesh Ltd.
>>>>>>>>> 
>>>>>>>>> 8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
>>>>>>>>> Noida, UP, IN - 201303
>>>>>>>>> 
>>>>>>>>> Mob.: +91-9560911996
>>>>>>>>> Work: 01203916600 | Extn:  *8173*
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> IMPORTANT:
>>>>>>>>> NEVER share your IndiaMART OTP/ Password with anyone.
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>>> -- 
>>>>> -- 
>>>>> Regards,
>>>>> 
>>>>> *Paras Lehana* [65871]
>>>>> Development Engineer, Auto-Suggest,
>>>>> IndiaMART Intermesh Ltd.
>>>>> 
>>>>> 8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
>>>>> Noida, UP, IN - 201303
>>>>> 
>>>>> Mob.: +91-9560911996
>>>>> Work: 01203916600 | Extn:  *8173*
>>>>> 
>>>>> -- 
>>>>> IMPORTANT: 
>>>>> NEVER share your IndiaMART OTP/ Password with anyone.
>>>> 
>>> 
>> 
> 
> 
> 
> -- 
> -- 
> Regards,
> 
> Paras Lehana [65871]
> Development Engineer, Auto-Suggest,
> IndiaMART Intermesh Ltd.
> 
> 8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
> Noida, UP, IN - 201303
> 
> Mob.: +91-9560911996 <tel:+91-9560911996>
> Work: 01203916600 | Extn:  8173
> 
> IMPORTANT: 
> NEVER share your IndiaMART OTP/ Password with anyone.
