Hi

> So I don't think removing it completely is the way to go from the scenario
> we have

Removing stopwords is another story. I'm curious to find the reason, assuming that you keep using stopwords. In some cases, stopwords are really necessary.

> Quite a considerable increase

If q.alt is giving you responses, that confirms your stopword filter is working as expected. The problem definitely lies in the edismax configuration.

> I am sorry but I didn't understand what do you want me to do exactly with
> the lst (??) and qf and bf.

What combinations did you try? I was referring to the field-level boosting you have applied in the edismax config.

*Let me explain again:* In your solrconfig.xml, look at your /search request handler. There are many qf and some bq boosts. I want you to remove all of these, check the response again (with q this time, not q.alt), and then add them back one by one while watching for the point where numFound changes drastically.

On Fri, 8 Nov 2019 at 23:47, David Hastings <hastings.recurs...@gmail.com> wrote:

> I use 3 word shingles with stopwords for my MLT ML trainer that worked
> pretty well for such a solution, but for a full index the size became
> prohibitive
>
> On Fri, Nov 8, 2019 at 12:13 PM Walter Underwood <wun...@wunderwood.org>
> wrote:
>
> > If we had IDF for phrases, they would be super effective. The 2X weight is
> > a hack that mostly works.
> >
> > Infoseek had phrase IDF and it was a killer algorithm for relevance.
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/ (my blog)
> >
> > > On Nov 8, 2019, at 11:08 AM, David Hastings <hastings.recurs...@gmail.com> wrote:
> > >
> > > the pf and qf fields are REALLY nice for this
> > >
> > > On Fri, Nov 8, 2019 at 12:02 PM Walter Underwood <wun...@wunderwood.org>
> > > wrote:
> > >
> > >> I always enable phrase searching in edismax for exactly this reason.
> > >>
> > >> Something like:
> > >>
> > >> <str name="qf">title^8 keywords^4 text</str>
> > >> <str name="pf">title^16 keywords^8 text^2</str>
> > >>
> > >> To deal with concepts in queries, a classifier and/or named entity
> > >> extractor can be helpful. If you have a list of concepts (“controlled
> > >> vocabulary”) that includes “Lamin A”, and that shows up in a query, that
> > >> term can be queried against the field matching that vocabulary.
> > >>
> > >> This is how LinkedIn separates people, companies, and places, for example.
> > >>
> > >> wunder
> > >> Walter Underwood
> > >> wun...@wunderwood.org
> > >> http://observer.wunderwood.org/ (my blog)
> > >>
> > >>> On Nov 8, 2019, at 10:48 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> > >>>
> > >>> Look at the “mm” parameter, try setting it to 100%. Although that’s not
> > >>> entirely likely to do what you want either since virtually every doc will
> > >>> have “a” in it. But at least you’d get docs that have both terms.
> > >>>
> > >>> You may also be able to search for things like “Lamin A” _only as a
> > >>> phrase_ and have some luck. But this is a gnarly problem in general. Some
> > >>> people have been able to substitute synonyms and/or shingles to make this
> > >>> work at the expense of a larger index.
> > >>>
> > >>> This is a generic problem with context. “Lamin A” is really a “concept”,
> > >>> not just two words that happen to be near each other. Searching as a phrase
> > >>> is an OOB-but-naive way to try to make it more likely that the ranked
> > >>> results refer to the _concept_ of “Lamin A”. The assumption here is “if
> > >>> these two words appear next to each other, they’re more likely to be what I
> > >>> want”. I say “naive” because “Lamins: A new approach to...” would _also_ be
> > >>> found for a naive phrase search. (I have no idea whether such a title makes
> > >>> sense or not, but you figured that out already)...
> > >>>
> > >>> To do this well you’d have to dive in to NLP/Machine learning.
> > >>>
> > >>> I truly wish we could have the DWIM search algorithm (Do What I Mean)….
> > >>>
> > >>>> On Nov 8, 2019, at 11:29 AM, Guilherme Viteri <gvit...@ebi.ac.uk> wrote:
> > >>>>
> > >>>> HI Walter and Paras
> > >>>>
> > >>>> I indexed it removing all the references to StopWordFilter and I went
> > >>>> from 121 results to near 20K as the search term q="Lymphoid and a
> > >>>> non-Lymphoid cell" is matching entities such as "IFT A" or "Lamin A". So I
> > >>>> don't think removing it completely is the way to go from the scenario we
> > >>>> have, but I appreciate the suggestion…
> > >>>>
> > >>>> Yes the response is using fl=*
> > >>>> I am trying some combinations at the moment, but yet no success.
> > >>>>
> > >>>> defType=edismax
> > >>>> q.alt=Lymphoid and a non-Lymphoid cell
> > >>>> Number of results=1599
> > >>>> Quite a considerable increase, even though reasonable meaningful results.
> > >>>>
> > >>>> I am sorry but I didn't understand what do you want me to do exactly
> > >>>> with the lst (??) and qf and bf.
> > >>>>
> > >>>> Thanks everyone with their inputs
> > >>>>
> > >>>>> On 8 Nov 2019, at 06:45, Paras Lehana <paras.leh...@indiamart.com> wrote:
> > >>>>>
> > >>>>> Hi Guilherme
> > >>>>>
> > >>>>> By accident, I ended up querying the using the default handler
> > >>>>> (/select) and it worked.
> > >>>>>
> > >>>>> You've just found the culprit. Thanks for giving the material I
> > >>>>> requested. Your analysis chain is working as expected. I don't see any
> > >>>>> issue in either StopWordFilter or your boosts. I also use a boost of 50
> > >>>>> when boosting contextual suggestions (boosting "gold iphone" on a page of
> > >>>>> iphone) but I take Walter's suggestion and would try to optimize my
> > >>>>> weights. I agree that this 50 thing was not researched much about by us as
> > >>>>> well (we never faced performance or relevance issues).
> > >>>>>
> > >>>>> See the major difference in both the handlers - edismax. I'm pretty
> > >>>>> sure that your problem lies in the parsing of queries (you can confirm that
> > >>>>> from parsedquery key in debug of both JSON responses). I hope you have
> > >>>>> provided the response with fl=*. Replace q with q.alt in your /search
> > >>>>> handler query and I think you should start getting responses. That's
> > >>>>> because q.alt uses standard parser. If you want to keep using edisMax, I
> > >>>>> suggest you to test the responses removing some combination of lst (qf, bf)
> > >>>>> and find what's restricting the documents to come up. I'm out of office
> > >>>>> today - would have certainly tried analyzing the field values of the
> > >>>>> document in /select request and compare it with qf/bq in solrconfig.xml
> > >>>>> /search. Do this for me and you'd certainly find something.
> > >>>>>
> > >>>>> On Thu, 7 Nov 2019 at 21:00, Walter Underwood <wun...@wunderwood.org> wrote:
> > >>>>> I normally use a weight of 8 for the most important field, like title.
> > >>>>> Other fields might get a 4 or 2.
> > >>>>>
> > >>>>> I add a “pf” field with the weights doubled, so that phrase matches
> > >>>>> have a higher weight.
> > >>>>>
> > >>>>> The weight of 8 comes from experience at Infoseek and Inktomi, two
> > >>>>> early web search engines. With different relevance algorithms and totally
> > >>>>> different evaluation and tuning systems, they settled on weights of 8 and
> > >>>>> 7.5 for HTML titles. With the two radically different systems getting
> > >>>>> the same number, I decided that was a property of the documents, not of the
> > >>>>> search engines.
> > >>>>>
> > >>>>> wunder
> > >>>>> Walter Underwood
> > >>>>> wun...@wunderwood.org
> > >>>>> http://observer.wunderwood.org/ (my blog)
> > >>>>>
> > >>>>>> On Nov 7, 2019, at 9:03 AM, Guilherme Viteri <gvit...@ebi.ac.uk> wrote:
> > >>>>>>
> > >>>>>> Hi Wunder,
> > >>>>>>
> > >>>>>> My indexer takes quite a few hours to be executed I am shortening it
> > >>>>>> to run faster, but I also need to make sure it gives what we are expecting.
> > >>>>>> This implementation's been there for >4y, and massively used.
> > >>>>>>
> > >>>>>>> In your edismax handlers, weights of 20, 50, and 100 are extremely
> > >>>>>>> high. I don’t think I’ve ever used a weight higher than 16 in a dozen years
> > >>>>>>> of configuring Solr.
> > >>>>>> I've inherited that implementation and I am really keen to adequate
> > >>>>>> it, what would you recommend ?
> > >>>>>>
> > >>>>>> Cheers
> > >>>>>> Guilherme
> > >>>>>>
> > >>>>>>> On 7 Nov 2019, at 14:43, Walter Underwood <wun...@wunderwood.org> wrote:
> > >>>>>>>
> > >>>>>>> Thanks for posting the files. Looking at schema.xml, I see that you
> > >>>>>>> still are using StopFilterFactory. The first advice we gave you was to
> > >>>>>>> remove that.
> > >>>>>>>
> > >>>>>>> Remove StopFilterFactory everywhere and reindex.
> > >>>>>>>
> > >>>>>>> You will continue to have problems matching stopwords until you do
> > >>>>>>> that.
> > >>>>>>>
> > >>>>>>> In your edismax handlers, weights of 20, 50, and 100 are extremely
> > >>>>>>> high. I don’t think I’ve ever used a weight higher than 16 in a dozen years
> > >>>>>>> of configuring Solr.
> > >>>>>>> > > >>>>>>> wunder > > >>>>>>> Walter Underwood > > >>>>>>> wun...@wunderwood.org <mailto:wun...@wunderwood.org> > > >>>>>>> http://observer.wunderwood.org/ <http://observer.wunderwood.org/ > > > > >> (my blog) > > >>>>>>> > > >>>>>>>> On Nov 7, 2019, at 6:56 AM, Guilherme Viteri <gvit...@ebi.ac.uk > > >> <mailto:gvit...@ebi.ac.uk>> wrote: > > >>>>>>>> > > >>>>>>>> Hi Paras, everyone > > >>>>>>>> > > >>>>>>>> Thank you again for your inputs and suggestions. I sorry to hear > > >> you had trouble with the attachments I will host it somewhere and > share > > the > > >> links. > > >>>>>>>> I don't tweak my index, I get the data from the graph database, > > >> create a document as they are and save to solr. > > >>>>>>>> > > >>>>>>>> So, I am sending the new analysis screen querying the way you > > >> suggested. Also the results with params and solr query url. > > >>>>>>>> > > >>>>>>>> During the process of querying what you asked I found something > > >> really weird (at least for me). By accident, I ended up querying the > > using > > >> the default handler (/select) and it worked. Then If I use the one I > > must > > >> use, then sadly doesn't work. I am posting both results and I will > also > > >> post the handlers as well. > > >>>>>>>> > > >>>>>>>> Here is the link with all the files mentioned before > > >>>>>>>> > > >> > > > https://www.dropbox.com/sh/fymfm1q94zum1lx/AADwU1c9EUf2A4d7FtzSKR54a?dl=0< > > >> > > > https://www.dropbox.com/sh/fymfm1q94zum1lx/AADwU1c9EUf2A4d7FtzSKR54a?dl=0> > > >> < > > > https://www.dropbox.com/sh/fymfm1q94zum1lx/AADwU1c9EUf2A4d7FtzSKR54a?dl=0 > > >> < > > > https://www.dropbox.com/sh/fymfm1q94zum1lx/AADwU1c9EUf2A4d7FtzSKR54a?dl=0 > > >>>> > > >>>>>>>> If the link doesn't work www dot dropbox dot com slash sh slash > > >> fymfm1q94zum1lx/AADwU1c9EUf2A4d7FtzSKR54a ? 
dl equals 0 > > >>>>>>>> > > >>>>>>>> Thanks > > >>>>>>>> > > >>>>>>>>> On 7 Nov 2019, at 05:23, Paras Lehana < > > paras.leh...@indiamart.com > > >> <mailto:paras.leh...@indiamart.com>> wrote: > > >>>>>>>>> > > >>>>>>>>> Hi Guilherme. > > >>>>>>>>> > > >>>>>>>>> I am sending they analysis result and the json result as > > requested. > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> Thanks for the effort. Luckily, I can see your attachments (low > > >> quality > > >>>>>>>>> though). > > >>>>>>>>> > > >>>>>>>>> From the analysis screen, the analysis is working as expected. > > One > > >> of the > > >>>>>>>>> reasons for query="lymphoid and *a* non-lymphoid cell" not > > matching > > >>>>>>>>> document containing "Lymphoid and a non-Lymphoid cell" I can > > >> initially > > >>>>>>>>> think of is: the stopword "a" is probably present in > > post-analysis > > >> either > > >>>>>>>>> of query or index. Did you tweak your index time analysis after > > >> indexing? > > >>>>>>>>> > > >>>>>>>>> Do two things: > > >>>>>>>>> > > >>>>>>>>> 1. Post the analysis screen for and index=*"Immunoregulatory > > >>>>>>>>> interactions between a Lymphoid and a non-Lymphoid cell"* and > > >>>>>>>>> "query=*"lymphoid > > >>>>>>>>> and a non-lymphoid cell"*. Try hosting the image and providing > > the > > >> link > > >>>>>>>>> here. > > >>>>>>>>> 2. Give the same JSON output as you have sent but this time > with > > >>>>>>>>> *"echoParams=all"*. Also, post the exact Solr query url. > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> On Wed, 6 Nov 2019 at 21:07, Erick Erickson < > > >> erickerick...@gmail.com <mailto:erickerick...@gmail.com>> wrote: > > >>>>>>>>> > > >>>>>>>>>> I don’t see the attachments, maybe I deleted old e-mails or > some > > >> such. The > > >>>>>>>>>> Apache server is fairly aggressive about stripping attachments > > >> though, so > > >>>>>>>>>> it’s also possible they didn’t make it through. 
> > >>>>>>>>>> > > >>>>>>>>>>> On Nov 6, 2019, at 9:28 AM, Guilherme Viteri < > > gvit...@ebi.ac.uk > > >> <mailto:gvit...@ebi.ac.uk>> wrote: > > >>>>>>>>>>> > > >>>>>>>>>>> Thanks Erick. > > >>>>>>>>>>> > > >>>>>>>>>>>> First, your index and analysis chains are considerably > > >> different, this > > >>>>>>>>>> can easily be a source of problems. In particular, using two > > >> different > > >>>>>>>>>> tokenizers is a huge red flag. I _strongly_ recommend against > > >> this unless > > >>>>>>>>>> you’re totally sure you understand the consequences. > > >> Additionally, your use > > >>>>>>>>>> of the length filter is suspicious, especially since your > > problem > > >> statement > > >>>>>>>>>> is about the addition of a single letter term and the min > length > > >> allowed on > > >>>>>>>>>> that filter is 2. That said, it’s reasonable to suppose that > the > > >> ’a’ is > > >>>>>>>>>> filtered out in both cases, but maybe you’ve found something > odd > > >> about the > > >>>>>>>>>> interactions. > > >>>>>>>>>>> I will investigate the min length and post the results later. > > >>>>>>>>>>> > > >>>>>>>>>>>> Second, I have no idea what this will do. Are the equal > signs > > >> typos? > > >>>>>>>>>> Used by custom code? > > >>>>>>>>>>> This the url in my application, not solr params. That's the > > >> query string. > > >>>>>>>>>>> > > >>>>>>>>>>>> What does “species=“ do? That’s not Solr syntax, so it’s > > likely > > >> that > > >>>>>>>>>> all the params with an equal-sign are totally ignored unless > > it’s > > >> just a > > >>>>>>>>>> typo. > > >>>>>>>>>>> This is part of the application. Species will be used later > on > > >> in solr > > >>>>>>>>>> to filter out the result. That's not solr. That my app params. > > >>>>>>>>>>> > > >>>>>>>>>>>> Third, the easiest way to see what’s happening under the > > covers > > >> is to > > >>>>>>>>>> add “&debug=true” to the query and look at the parsed query. 
> > >> Ignore all the > > >>>>>>>>>> relevance calculations for the nonce, or specify > “&debug=query” > > >> to skip > > >>>>>>>>>> that part. > > >>>>>>>>>>> The two json files i've sent, they are debugQuery=on and the > > >> explain tag > > >>>>>>>>>> is present. > > >>>>>>>>>>> I will try the searching the way you mentioned. > > >>>>>>>>>>> > > >>>>>>>>>>> Thank for your inputs > > >>>>>>>>>>> > > >>>>>>>>>>> Guilherme > > >>>>>>>>>>> > > >>>>>>>>>>>> On 6 Nov 2019, at 14:14, Erick Erickson < > > >> erickerick...@gmail.com <mailto:erickerick...@gmail.com>> > > >>>>>>>>>> wrote: > > >>>>>>>>>>>> > > >>>>>>>>>>>> Fwd to another server > > >>>>>>>>>>>> > > >>>>>>>>>>>> First, your index and analysis chains are considerably > > >> different, this > > >>>>>>>>>> can easily be a source of problems. In particular, using two > > >> different > > >>>>>>>>>> tokenizers is a huge red flag. I _strongly_ recommend against > > >> this unless > > >>>>>>>>>> you’re totally sure you understand the consequences. > > >> Additionally, your use > > >>>>>>>>>> of the length filter is suspicious, especially since your > > problem > > >> statement > > >>>>>>>>>> is about the addition of a single letter term and the min > length > > >> allowed on > > >>>>>>>>>> that filter is 2. That said, it’s reasonable to suppose that > the > > >> ’a’ is > > >>>>>>>>>> filtered out in both cases, but maybe you’ve found something > odd > > >> about the > > >>>>>>>>>> interactions. > > >>>>>>>>>>>> > > >>>>>>>>>>>> Second, I have no idea what this will do. Are the equal > signs > > >> typos? > > >>>>>>>>>> Used by custom code? 
> > >>>>>>>>>>>> > > >>>>>>>>>>>>>> > > >>>>>>>>>> > > >> > > > https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true > > >> < > > >> > > > https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true > > >>> > > >>>>>>>>>>>> > > >>>>>>>>>>>> What does “species=“ do? That’s not Solr syntax, so it’s > > likely > > >> that > > >>>>>>>>>> all the params with an equal-sign are totally ignored unless > > it’s > > >> just a > > >>>>>>>>>> typo. > > >>>>>>>>>>>> > > >>>>>>>>>>>> Third, the easiest way to see what’s happening under the > > covers > > >> is to > > >>>>>>>>>> add “&debug=true” to the query and look at the parsed query. > > >> Ignore all the > > >>>>>>>>>> relevance calculations for the nonce, or specify > “&debug=query” > > >> to skip > > >>>>>>>>>> that part. > > >>>>>>>>>>>> > > >>>>>>>>>>>> 90% + of the time, the question “why didn’t this query do > > what I > > >>>>>>>>>> expect” is answered by looking at the “&debug=query” output > and > > >> the > > >>>>>>>>>> analysis page in the admin UI. NOTE: for the analysis page be > > >> sure to look > > >>>>>>>>>> at _both_ the query and index output. Also, and very important > > >> about the > > >>>>>>>>>> analysis page (and this is confusing) is that this _assumes_ > > that > > >> what you > > >>>>>>>>>> put in the text boxes have made it through the query parser > > >> intact and is > > >>>>>>>>>> analyzed by the field selected. Consider the search > > >> "q=field:word1 word2". > > >>>>>>>>>> Now you type “word1 word2” into the analysis text box and it > > >> looks like > > >>>>>>>>>> what you expect. That’s misleading because the query is > _parsed_ > > >> as > > >>>>>>>>>> "field:word1 default_search_field:word2”. This is where > > >> “&debug=query” > > >>>>>>>>>> helps. 
> > >>>>>>>>>>>> > > >>>>>>>>>>>> Best, > > >>>>>>>>>>>> Erick > > >>>>>>>>>>>> > > >>>>>>>>>>>>> On Nov 6, 2019, at 2:36 AM, Paras Lehana < > > >> paras.leh...@indiamart.com <mailto:paras.leh...@indiamart.com>> > > >>>>>>>>>> wrote: > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> Hi Walter, > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> The solr.StopFilter removes all tokens that are stopwords. > > >> Those words > > >>>>>>>>>> will > > >>>>>>>>>>>>>> not be in the index, so they can never match a query. > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> I think the OP's concern is different results when adding a > > >> stopword. I > > >>>>>>>>>>>>> think he's using the filter factory correctly - the query > > chain > > >>>>>>>>>> includes > > >>>>>>>>>>>>> the filter as well so it should remove "a" while querying. > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> *@Guilherme*, please post results for both the query, the > > >> document in > > >>>>>>>>>>>>> result you are concerned about and post full result of > > >> analysis screen > > >>>>>>>>>> (for > > >>>>>>>>>>>>> both query and index). > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> On Tue, 5 Nov 2019 at 21:38, Walter Underwood < > > >> wun...@wunderwood.org <mailto:wun...@wunderwood.org>> > > >>>>>>>>>> wrote: > > >>>>>>>>>>>>> > > >>>>>>>>>>>>>> No. > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> The solr.StopFilter removes all tokens that are stopwords. > > >> Those words > > >>>>>>>>>>>>>> will not be in the index, so they can never match a query. > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> 1. Remove the lines with solr.StopFilter from every > analysis > > >> chain in > > >>>>>>>>>>>>>> schema.xml. > > >>>>>>>>>>>>>> 2. Reload the collection, restart Solr, or whatever to > read > > >> the new > > >>>>>>>>>> config. > > >>>>>>>>>>>>>> 3. Reindex all of the documents. > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> When indexed with the new analysis chain, the stopwords > will > > >> not be > > >>>>>>>>>>>>>> removed and they will be searchable. 
> > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> wunder > > >>>>>>>>>>>>>> Walter Underwood > > >>>>>>>>>>>>>> wun...@wunderwood.org <mailto:wun...@wunderwood.org> > > >>>>>>>>>>>>>> http://observer.wunderwood.org/ < > > >> http://observer.wunderwood.org/> (my blog) > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>>> On Nov 5, 2019, at 8:56 AM, Guilherme Viteri < > > >> gvit...@ebi.ac.uk <mailto:gvit...@ebi.ac.uk>> > > >>>>>>>>>> wrote: > > >>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>> Ok. I am kind a lost now. > > >>>>>>>>>>>>>>> If I open up the console > analysis and perform it, > that's > > >> the final > > >>>>>>>>>>>>>> result. > > >>>>>>>>>>>>>>> <Screenshot 2019-11-05 at 14.54.16.png> > > >>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>> Your suggestion is: get rid of the <filter stopword.txt> > in > > >> the > > >>>>>>>>>>>>>> schema.xml and during index phase replaceAll("in > > >> stopwords.txt"," ") > > >>>>>>>>>> then > > >>>>>>>>>>>>>> add to solr. Is that correct ? > > >>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>> Thanks David > > >>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> On 5 Nov 2019, at 14:48, David Hastings < > > >>>>>>>>>> hastings.recurs...@gmail.com <mailto: > > hastings.recurs...@gmail.com > > >>> > > >>>>>>>>>>>>>> <mailto:hastings.recurs...@gmail.com <mailto: > > >> hastings.recurs...@gmail.com>>> wrote: > > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> Fwd to another server > > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> no, > > >>>>>>>>>>>>>>>> <filter class="solr.StopFilterFactory" > > >> ignoreCase="true" > > >>>>>>>>>>>>>>>> words="stopwords.txt"/> > > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> is still using stopwords and should be removed, in my > > >> opinion of > > >>>>>>>>>> course, > > >>>>>>>>>>>>>>>> based on your use case may be different, but i generally > > >> axe any > > >>>>>>>>>>>>>> reference > > >>>>>>>>>>>>>>>> to them at all > > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> On Tue, Nov 5, 2019 at 9:47 AM Guilherme Viteri < > > >> gvit...@ebi.ac.uk <mailto:gvit...@ebi.ac.uk> > > >>>>>>>>>>>>>> 
<mailto:gvit...@ebi.ac.uk <mailto:gvit...@ebi.ac.uk>>> > > wrote: > > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> Thanks. > > >>>>>>>>>>>>>>>>> Haven't I done this here ? > > >>>>>>>>>>>>>>>>> <fieldType name="text_field" class="solr.TextField" > > >>>>>>>>>>>>>>>>> positionIncrementGap="100" omitNorms="false" > > > >>>>>>>>>>>>>>>>> <analyzer type="index"> > > >>>>>>>>>>>>>>>>> <tokenizer class="solr.StandardTokenizerFactory"/> > > >>>>>>>>>>>>>>>>> <filter class="solr.ClassicFilterFactory"/> > > >>>>>>>>>>>>>>>>> <filter class="solr.LengthFilterFactory" min="2" > > >>>>>>>>>>>>>> max="20"/> > > >>>>>>>>>>>>>>>>> <filter class="solr.LowerCaseFilterFactory"/> > > >>>>>>>>>>>>>>>>> <filter class="solr.StopFilterFactory" > > >> ignoreCase="true" > > >>>>>>>>>>>>>>>>> words="stopwords.txt"/> > > >>>>>>>>>>>>>>>>> </analyzer> > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> On 5 Nov 2019, at 14:15, David Hastings < > > >>>>>>>>>> hastings.recurs...@gmail.com <mailto: > > hastings.recurs...@gmail.com > > >>> > > >>>>>>>>>>>>>> <mailto:hastings.recurs...@gmail.com <mailto: > > >> hastings.recurs...@gmail.com>>> > > >>>>>>>>>>>>>>>>> wrote: > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> Fwd to another server > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> The first thing you should do is remove any reference > to > > >> stop > > >>>>>>>>>> words > > >>>>>>>>>>>>>> and > > >>>>>>>>>>>>>>>>>> never use them, then re-index your data and try it > > again. 
> > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> On Tue, Nov 5, 2019 at 9:14 AM Guilherme Viteri < > > >>>>>>>>>> gvit...@ebi.ac.uk <mailto:gvit...@ebi.ac.uk> > > >>>>>>>>>>>>>> <mailto:gvit...@ebi.ac.uk <mailto:gvit...@ebi.ac.uk>>> > > >>>>>>>>>>>>>>>>> wrote: > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>> Hi, > > >>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>> I am performing a search to match a name > (text_field), > > >> however > > >>>>>>>>>> this > > >>>>>>>>>>>>>> term > > >>>>>>>>>>>>>>>>>>> contains 'and' and 'a' and it doesn't return any > > >> records. If i > > >>>>>>>>>> remove > > >>>>>>>>>>>>>>>>> 'a' > > >>>>>>>>>>>>>>>>>>> then it works. > > >>>>>>>>>>>>>>>>>>> e.g > > >>>>>>>>>>>>>>>>>>> Search Term: lymphoid and a non-lymphoid cell > > >>>>>>>>>>>>>>>>>>> doesn't work: > > >>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>> > > >>>>>>>>>> > > >> > > > https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true > > >> < > > >> > > > https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true > > >>> > > >>>>>>>>>>>>>> < > > >>>>>>>>>>>>>> > > >>>>>>>>>> > > >> > > > https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true > > >> < > > >> > > > https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true > > >>> > > >>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>> < > > >>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>> > > >>>>>>>>>> > > >> > > > https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true > > >> < > > >> > > > 
https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true > > >>> > > >>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>> Search term: lymphoid and non-lymphoid cell > > >>>>>>>>>>>>>>>>>>> works: > > >>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>> > > >>>>>>>>>> > > >> > > > https://dev.reactome.org/content/query?q=lymphoid+and+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true > > >> < > > >> > > > https://dev.reactome.org/content/query?q=lymphoid+and+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true > > >>> > > >>>>>>>>>>>>>>>>>>> < > > >>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>> > > >>>>>>>>>> > > >> > > > https://dev.reactome.org/content/query?q=lymphoid+and+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true > > >> < > > >> > > > https://dev.reactome.org/content/query?q=lymphoid+and+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true > > >>> > > >>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>> interested in the first result > > >>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>> schema.xml > > >>>>>>>>>>>>>>>>>>> <field name="name" > > >> type="text_field" > > >>>>>>>>>>>>>>>>>>> indexed="true" stored="true" omitNorms="false" > > >>>>>>>>>> required="true" > > >>>>>>>>>>>>>>>>>>> multiValued="false"/> > > >>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>> <analyzer type="query"> > > >>>>>>>>>>>>>>>>>>> <tokenizer class="solr.PatternTokenizerFactory" > > >>>>>>>>>>>>>>>>>>> pattern="[^a-zA-Z0-9/._:]"/> > > >>>>>>>>>>>>>>>>>>> <filter class="solr.PatternReplaceFilterFactory" > > >>>>>>>>>>>>>>>>>>> pattern="^[/._:]+" replacement=""/> > > >>>>>>>>>>>>>>>>>>> <filter class="solr.PatternReplaceFilterFactory" > > >>>>>>>>>>>>>>>>>>> pattern="[/._:]+$" replacement=""/> > > >>>>>>>>>>>>>>>>>>> <filter 
class="solr.PatternReplaceFilterFactory" > > >>>>>>>>>>>>>>>>>>> pattern="[_]" replacement=" "/> > > >>>>>>>>>>>>>>>>>>> <filter class="solr.LengthFilterFactory" min="2" > > >>>>>>>>>>>>>>>>> max="20"/> > > >>>>>>>>>>>>>>>>>>> <filter class="solr.LowerCaseFilterFactory"/> > > >>>>>>>>>>>>>>>>>>> <filter class="solr.StopFilterFactory" > > >>>>>>>>>>>>>> ignoreCase="true" > > >>>>>>>>>>>>>>>>>>> words="stopwords.txt"/> > > >>>>>>>>>>>>>>>>>>> </analyzer> > > >>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>> <fieldType name="text_field" class="solr.TextField" > > >>>>>>>>>>>>>>>>>>> positionIncrementGap="100" omitNorms="false" > > > >>>>>>>>>>>>>>>>>>> <analyzer type="index"> > > >>>>>>>>>>>>>>>>>>> <tokenizer > class="solr.StandardTokenizerFactory"/> > > >>>>>>>>>>>>>>>>>>> <filter class="solr.ClassicFilterFactory"/> > > >>>>>>>>>>>>>>>>>>> <filter class="solr.LengthFilterFactory" min="2" > > >>>>>>>>>>>>>>>>> max="20"/> > > >>>>>>>>>>>>>>>>>>> <filter class="solr.LowerCaseFilterFactory"/> > > >>>>>>>>>>>>>>>>>>> <filter class="solr.StopFilterFactory" > > >>>>>>>>>>>>>> ignoreCase="true" > > >>>>>>>>>>>>>>>>>>> words="stopwords.txt"/> > > >>>>>>>>>>>>>>>>>>> </analyzer> > > >>>>>>>>>>>>>>>>>>> <analyzer type="query"> > > >>>>>>>>>>>>>>>>>>> <tokenizer class="solr.PatternTokenizerFactory" > > >>>>>>>>>>>>>>>>>>> pattern="[^a-zA-Z0-9/._:]"/> > > >>>>>>>>>>>>>>>>>>> <filter class="solr.PatternReplaceFilterFactory" > > >>>>>>>>>>>>>>>>>>> pattern="^[/._:]+" replacement=""/> > > >>>>>>>>>>>>>>>>>>> <filter class="solr.PatternReplaceFilterFactory" > > >>>>>>>>>>>>>>>>>>> pattern="[/._:]+$" replacement=""/> > > >>>>>>>>>>>>>>>>>>> <filter class="solr.PatternReplaceFilterFactory" > > >>>>>>>>>>>>>>>>>>> pattern="[_]" replacement=" "/> > > >>>>>>>>>>>>>>>>>>> <filter class="solr.LengthFilterFactory" min="2" > > >>>>>>>>>>>>>>>>> max="20"/> > > >>>>>>>>>>>>>>>>>>> <filter class="solr.LowerCaseFilterFactory"/> > > >>>>>>>>>>>>>>>>>>> <filter class="solr.StopFilterFactory" > > 
>>>>>>>>>>>>>> ignoreCase="true" > > >>>>>>>>>>>>>>>>>>> words="stopwords.txt"/> > > >>>>>>>>>>>>>>>>>>> </analyzer> > > >>>>>>>>>>>>>>>>>>> </fieldType> > > >>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>> stopwords.txt > > >>>>>>>>>>>>>>>>>>> #Standard english stop words taken from Lucene's > > >> StopAnalyzer > > >>>>>>>>>>>>>>>>>>> a > > >>>>>>>>>>>>>>>>>>> b > > >>>>>>>>>>>>>>>>>>> c > > >>>>>>>>>>>>>>>>>>> .... > > >>>>>>>>>>>>>>>>>>> an > > >>>>>>>>>>>>>>>>>>> and > > >>>>>>>>>>>>>>>>>>> are > > >>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>> Running SolR 6.6.2. > > >>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>> Is there anything I could do to prevent this ? > > >>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>> Thanks > > >>>>>>>>>>>>>>>>>>> Guilherme > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>> > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> -- > > >>>>>>>>>>>>> -- > > >>>>>>>>>>>>> Regards, > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> *Paras Lehana* [65871] > > >>>>>>>>>>>>> Development Engineer, Auto-Suggest, > > >>>>>>>>>>>>> IndiaMART Intermesh Ltd. > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> 8th Floor, Tower A, Advant-Navis Business Park, Sector 142, > > >>>>>>>>>>>>> Noida, UP, IN - 201303 > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> Mob.: +91-9560911996 > > >>>>>>>>>>>>> Work: 01203916600 | Extn: *8173* > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> -- > > >>>>>>>>>>>>> IMPORTANT: > > >>>>>>>>>>>>> NEVER share your IndiaMART OTP/ Password with anyone. > > >>>>>>>>>>>> > > >>>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> -- > > >>>>>>>>> -- > > >>>>>>>>> Regards, > > >>>>>>>>> > > >>>>>>>>> *Paras Lehana* [65871] > > >>>>>>>>> Development Engineer, Auto-Suggest, > > >>>>>>>>> IndiaMART Intermesh Ltd. 

--
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn: *8173*
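
P.S. The "remove all qf/bq boosts, then add them back one by one" check described at the top of this mail could be scripted roughly like this. It is only a sketch under assumptions: the core URL, the field names and the boost values below are hypothetical placeholders, not taken from your solrconfig.xml. The only real Solr parameters used are q, defType, qf, bq, rows and debug. Pointing it at /select with defType=edismax is my suggestion so that the defaults configured on /search do not interfere; adjust if your /select handler has defaults of its own.

```python
from urllib.parse import urlencode


def bisect_boost_urls(base_url, query, boosts):
    """Build one edismax request URL per step.

    Step 0 carries no qf/bq at all; each later step adds the next
    entry of `boosts`, a list of ("qf" | "bq", value) pairs.
    """
    urls = []
    for n in range(len(boosts) + 1):
        active = boosts[:n]
        params = [
            ("q", query),
            ("defType", "edismax"),
            ("rows", "0"),        # only numFound matters here
            ("debug", "query"),   # shows the parsed query for each step
        ]
        qf_values = [value for name, value in active if name == "qf"]
        if qf_values:
            # qf is a single space-separated list of field^boost entries
            params.append(("qf", " ".join(qf_values)))
        # bq may be repeated, one parameter per boost query
        params += [("bq", value) for name, value in active if name == "bq"]
        urls.append(base_url + "?" + urlencode(params))
    return urls


if __name__ == "__main__":
    # Hypothetical core URL, fields and boosts: replace with the real ones.
    steps = bisect_boost_urls(
        "http://localhost:8983/solr/mycore/select",
        "lymphoid and a non-lymphoid cell",
        [("qf", "name^8"), ("qf", "summary^4"), ("bq", "type:Pathway^2")],
    )
    for url in steps:
        print(url)
```

Request each URL in order and note numFound in the response. The step at which it changes sharply points at the qf or bq entry worth inspecting, and the parsed query in the debug section should show why.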