Thanks

> Removing stopwords is another story. I'm curious to find the reason
> assuming that you keep on using stopwords. In some cases, stopwords are
> really necessary.

Yes. It has always made sense the way we've been using them.

> If q.alt is giving you responses, it's confirmed that your stopwords filter
> is working as expected. The problem definitely lies in the configuration of
> edismax.

I see.

> *Let me explain again:* In your solrconfig.xml, look at your /search

OK, using q now, I removed all qf, performed the search and got 23 results, with the one I really want at the top. As soon as I add dbId or stId (regardless of the boost, 1.0 or 100.0), I don't get anything (which makes sense). However, if I query name_exact, I get the 23 results again, and unfortunately if I query stId^1.0 name_exact^10.0 I still don't get any results.

In summary:
- without qf - 23 results
- dbId - 0 results
- name_exact - 16 results
- name - 23 results
- dbId^1.0 name_exact^10.0 - 0 results
- 0 results if any other field, stId or dbId (key), is added on top of the name fields (name_exact, etc.).

Definitely lost here! :-/

> On 11 Nov 2019, at 07:59, Paras Lehana <paras.leh...@indiamart.com> wrote:
>
> Hi
>
>> So I don't think removing it completely is the way to go from the scenario
>> we have
>
> Removing stopwords is another story. I'm curious to find the reason
> assuming that you keep on using stopwords. In some cases, stopwords are
> really necessary.
>
>> Quite a considerable increase
>
> If q.alt is giving you responses, it's confirmed that your stopwords filter
> is working as expected. The problem definitely lies in the configuration of
> edismax.
>
>> I am sorry but I didn't understand what do you want me to do exactly with
>> the lst (??) and qf and bf.
>
> What combinations did you try? I was referring to the field-level boosting
> you have applied in edismax config.
>
> *Let me explain again:* In your solrconfig.xml, look at your /search
> request handler. There are many qf and some bq boosts. I want you to remove
> all of these, check response again (with q now) and keep on adding them
> again (one by one) while looking for when the numFound drastically changes.
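
The procedure described here (strip every qf, then add the fields back one at a time while watching numFound) could be scripted roughly as follows. This is only a sketch that builds the request URLs and sends nothing: the host, port and core name are hypothetical, the field list reuses the names mentioned in this message, and it assumes the qf defaults were already removed from the /search handler so that the request parameter is the only source of qf.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: host, port and core name are placeholders.
SOLR_SEARCH_URL = "http://localhost:8983/solr/mycore/search"


def incremental_qf_urls(query, fields, base_url=SOLR_SEARCH_URL):
    """Return (qf, url) pairs: first with no qf, then adding one field per step."""
    steps = []
    for count in range(len(fields) + 1):
        qf = " ".join(fields[:count])
        params = {
            "q": query,
            "defType": "edismax",
            "rows": 0,          # only numFound matters for this check
            "debug": "query",   # ask Solr to include the parsed query
        }
        if qf:
            params["qf"] = qf
        steps.append((qf, base_url + "?" + urlencode(params)))
    return steps


if __name__ == "__main__":
    fields = ["name", "name_exact^10.0", "dbId^1.0"]
    for qf, url in incremental_qf_urls("lymphoid and a non-lymphoid cell", fields):
        print(repr(qf), "->", url)
```

Fetching each URL in order and noting the step at which numFound drops to 0 would point at the field that restricts the result set.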
>
> On Fri, 8 Nov 2019 at 23:47, David Hastings <hastings.recurs...@gmail.com>
> wrote:
>
>> I use 3 word shingles with stopwords for my MLT ML trainer that worked
>> pretty well for such a solution, but for a full index the size became
>> prohibitive
>>
>> On Fri, Nov 8, 2019 at 12:13 PM Walter Underwood <wun...@wunderwood.org>
>> wrote:
>>
>>> If we had IDF for phrases, they would be super effective. The 2X weight is
>>> a hack that mostly works.
>>>
>>> Infoseek had phrase IDF and it was a killer algorithm for relevance.
>>>
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/ (my blog)
>>>
>>>> On Nov 8, 2019, at 11:08 AM, David Hastings <hastings.recurs...@gmail.com> wrote:
>>>>
>>>> the pf and qf fields are REALLY nice for this
>>>>
>>>> On Fri, Nov 8, 2019 at 12:02 PM Walter Underwood <wun...@wunderwood.org>
>>>> wrote:
>>>>
>>>>> I always enable phrase searching in edismax for exactly this reason.
>>>>>
>>>>> Something like:
>>>>>
>>>>> <str name="qf">title^8 keywords^4 text</str>
>>>>> <str name="pf">title^16 keywords^8 text^2</str>
>>>>>
>>>>> To deal with concepts in queries, a classifier and/or named entity
>>>>> extractor can be helpful. If you have a list of concepts (“controlled
>>>>> vocabulary”) that includes “Lamin A”, and that shows up in a query, that
>>>>> term can be queried against the field matching that vocabulary.
>>>>>
>>>>> This is how LinkedIn separates people, companies, and places, for example.
>>>>>
>>>>> wunder
>>>>> Walter Underwood
>>>>> wun...@wunderwood.org
>>>>> http://observer.wunderwood.org/ (my blog)
>>>>>
>>>>>> On Nov 8, 2019, at 10:48 AM, Erick Erickson <erickerick...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> Look at the “mm” parameter, try setting it to 100%. Although that’s not
>>>>>> entirely likely to do what you want either since virtually every doc will
>>>>>> have “a” in it. But at least you’d get docs that have both terms.
>>>>>> >>>>>> you may also be able to search for things like “Lamin A” _only as a >>>>> phrase_ and have some luck. But this is a gnarly problem in general. >>> Some >>>>> people have been able to substitute synonyms and/or shingles to make >>> this >>>>> work at the expense of a larger index. >>>>>> >>>>>> This is a generic problem with context. “Lamin A” is really a >>> “concept”, >>>>> not just two words that happen to be near each other. Searching as a >>> phrase >>>>> is an OOB-but-naive way to try to make it more likely that the ranked >>>>> results refer to the _concept_ of “Lamin A”. The assumption here is >> “if >>>>> these two words appear next to each other, they’re more likely to be >>> what I >>>>> want”. I say “naive” because “Lamins: A new approach to...” would >>> _also_ be >>>>> found for a naive phrase search. (I have no idea whether such a title >>> makes >>>>> sense or not, but you figured that out already)... >>>>>> >>>>>> To do this well you’d have to dive in to NLP/Machine learning. >>>>>> >>>>>> I truly wish we could have the DWIM search algorithm (Do What I >> Mean)…. >>>>>> >>>>>>> On Nov 8, 2019, at 11:29 AM, Guilherme Viteri <gvit...@ebi.ac.uk> >>>>> wrote: >>>>>>> >>>>>>> HI Walter and Paras >>>>>>> >>>>>>> I indexed it removing all the references to StopWordFilter and I >> went >>>>> from 121 results to near 20K as the search term q="Lymphoid and a >>>>> non-Lymphoid cell" is matching entities such as "IFT A" or "Lamin A". >>> So I >>>>> don't think removing it completely is the way to go from the scenario >> we >>>>> have, but I appreciate the suggestion… >>>>>>> >>>>>>> Yes the response is using fl=* >>>>>>> I am trying some combinations at the moment, but yet no success. >>>>>>> >>>>>>> defType=edismax >>>>>>> q.alt=Lymphoid and a non-Lymphoid cell >>>>>>> Number of results=1599 >>>>>>> Quite a considerable increase, even though reasonable meaningful >>>>> results. 
>>>>>>> >>>>>>> I am sorry but I didn't understand what do you want me to do exactly >>>>> with the lst (??) and qf and bf. >>>>>>> >>>>>>> Thanks everyone with their inputs >>>>>>> >>>>>>> >>>>>>>> On 8 Nov 2019, at 06:45, Paras Lehana <paras.leh...@indiamart.com> >>>>> wrote: >>>>>>>> >>>>>>>> Hi Guilherme >>>>>>>> >>>>>>>> By accident, I ended up querying the using the default handler >>>>> (/select) and it worked. >>>>>>>> >>>>>>>> You've just found the culprit. Thanks for giving the material I >>>>> requested. Your analysis chain is working as expected. I don't see any >>>>> issue in either StopWordFilter or your boosts. I also use a boost of >> 50 >>>>> when boosting contextual suggestions (boosting "gold iphone" on a page >>> of >>>>> iphone) but I take Walter's suggestion and would try to optimize my >>>>> weights. I agree that this 50 thing was not researched much about by >> us >>> as >>>>> well (we never faced performance or relevance issues). >>>>>>>> >>>>>>>> See the major difference in both the handlers - edismax. I'm pretty >>>>> sure that your problem lies in the parsing of queries (you can confirm >>> that >>>>> from parsedquery key in debug of both JSON responses). I hope you have >>>>> provided the response with fl=*. Replace q with q.alt in your /search >>>>> handler query and I think you should start getting responses. That's >>>>> because q.alt uses standard parser. If you want to keep using >> edisMax, I >>>>> suggest you to test the responses removing some combination of lst >> (qf, >>> bf) >>>>> and find what's restricting the documents to come up. I'm out of >> office >>>>> today - would have certainly tried analyzing the field values of the >>>>> document in /select request and compare it with qf/bq in >> solrconfig.xml >>>>> /search. Do this for me and you'd certainly find something. 
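
Comparing the parsedquery key in the debug section of the two JSON responses, as suggested above, could be done with a few lines of Python. This is a rough sketch under stated assumptions: both responses were fetched with debug enabled and saved as JSON, and the helper names as well as the file names in the usage note are hypothetical.

```python
import json


def parsed_query(response):
    """Return the 'parsedquery' string from a Solr JSON response fetched with debug on."""
    debug = response.get("debug")
    if debug is None:
        raise ValueError("no 'debug' section; add debug=query or debugQuery=on to the request")
    return debug["parsedquery"]


def compare_parsed_queries(select_response, search_response):
    """Return both parsed queries and whether they are identical."""
    select_pq = parsed_query(select_response)
    search_pq = parsed_query(search_response)
    return {"select": select_pq, "search": search_pq, "same": select_pq == search_pq}


def load_response(path):
    """Load a saved Solr JSON response from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

For example, `compare_parsed_queries(load_response("select.json"), load_response("search.json"))` would show side by side how /select and /search parsed the same q, where the two file names stand for wherever the responses were saved.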
>>>>>>>> >>>>>>>> On Thu, 7 Nov 2019 at 21:00, Walter Underwood < >> wun...@wunderwood.org >>>>> <mailto:wun...@wunderwood.org>> wrote: >>>>>>>> I normally use a weight of 8 for the most important field, like >>> title. >>>>> Other fields might get a 4 or 2. >>>>>>>> >>>>>>>> I add a “pf” field with the weights doubled, so that phrase matches >>>>> have a higher weight. >>>>>>>> >>>>>>>> The weight of 8 comes from experience at Infoseek and Inktomi, two >>>>> early web search engines. With different relevance algorithms and >>> totally >>>>> different evaluation and tuning systems, they settled on weights of 8 >>> and >>>>> 7.5 for HTML titles. With the the two radically different system >> getting >>>>> the same number, I decided that was a property of the documents, not >> of >>> the >>>>> search engines. >>>>>>>> >>>>>>>> wunder >>>>>>>> Walter Underwood >>>>>>>> wun...@wunderwood.org <mailto:wun...@wunderwood.org> >>>>>>>> http://observer.wunderwood.org/ <http://observer.wunderwood.org/> >>>>> (my blog) >>>>>>>> >>>>>>>>> On Nov 7, 2019, at 9:03 AM, Guilherme Viteri <gvit...@ebi.ac.uk >>>>> <mailto:gvit...@ebi.ac.uk>> wrote: >>>>>>>>> >>>>>>>>> Hi Wunder, >>>>>>>>> >>>>>>>>> My indexer takes quite a few hours to be executed I am shortening >> it >>>>> to run faster, but I also need to make sure it gives what we are >>> expecting. >>>>> This implementation's been there for >4y, and massively used. >>>>>>>>> >>>>>>>>>> In your edismax handlers, weights of 20, 50, and 100 are >> extremely >>>>> high. I don’t think I’ve ever used a weight higher than 16 in a dozen >>> years >>>>> of configuring Solr. >>>>>>>>> I've inherited that implementation and I am really keen to >> adequate >>>>> it, what would you recommend ? >>>>>>>>> >>>>>>>>> Cheers >>>>>>>>> Guilherme >>>>>>>>> >>>>>>>>>> On 7 Nov 2019, at 14:43, Walter Underwood <wun...@wunderwood.org >>>>> <mailto:wun...@wunderwood.org>> wrote: >>>>>>>>>> >>>>>>>>>> Thanks for posting the files. 
Looking at schema.xml, I see that >> you >>>>> still are using StopFilterFactory. The first advice we gave you was to >>>>> remove that. >>>>>>>>>> >>>>>>>>>> Remove StopFilterFactory everywhere and reindex. >>>>>>>>>> >>>>>>>>>> You will continue to have problems matching stopwords until you >> do >>>>> that. >>>>>>>>>> >>>>>>>>>> In your edismax handlers, weights of 20, 50, and 100 are >> extremely >>>>> high. I don’t think I’ve ever used a weight higher than 16 in a dozen >>> years >>>>> of configuring Solr. >>>>>>>>>> >>>>>>>>>> wunder >>>>>>>>>> Walter Underwood >>>>>>>>>> wun...@wunderwood.org <mailto:wun...@wunderwood.org> >>>>>>>>>> http://observer.wunderwood.org/ <http://observer.wunderwood.org/ >>> >>>>> (my blog) >>>>>>>>>> >>>>>>>>>>> On Nov 7, 2019, at 6:56 AM, Guilherme Viteri <gvit...@ebi.ac.uk >>>>> <mailto:gvit...@ebi.ac.uk>> wrote: >>>>>>>>>>> >>>>>>>>>>> Hi Paras, everyone >>>>>>>>>>> >>>>>>>>>>> Thank you again for your inputs and suggestions. I sorry to hear >>>>> you had trouble with the attachments I will host it somewhere and >> share >>> the >>>>> links. >>>>>>>>>>> I don't tweak my index, I get the data from the graph database, >>>>> create a document as they are and save to solr. >>>>>>>>>>> >>>>>>>>>>> So, I am sending the new analysis screen querying the way you >>>>> suggested. Also the results with params and solr query url. >>>>>>>>>>> >>>>>>>>>>> During the process of querying what you asked I found something >>>>> really weird (at least for me). By accident, I ended up querying the >>> using >>>>> the default handler (/select) and it worked. Then If I use the one I >>> must >>>>> use, then sadly doesn't work. I am posting both results and I will >> also >>>>> post the handlers as well. 
>>>>>>>>>>> >>>>>>>>>>> Here is the link with all the files mentioned before >>>>>>>>>>> >>>>> >>> >> https://www.dropbox.com/sh/fymfm1q94zum1lx/AADwU1c9EUf2A4d7FtzSKR54a?dl=0< >>>>> >>> >> https://www.dropbox.com/sh/fymfm1q94zum1lx/AADwU1c9EUf2A4d7FtzSKR54a?dl=0> >>>>> < >>> >> https://www.dropbox.com/sh/fymfm1q94zum1lx/AADwU1c9EUf2A4d7FtzSKR54a?dl=0 >>>>> < >>> >> https://www.dropbox.com/sh/fymfm1q94zum1lx/AADwU1c9EUf2A4d7FtzSKR54a?dl=0 >>>>>>> >>>>>>>>>>> If the link doesn't work www dot dropbox dot com slash sh slash >>>>> fymfm1q94zum1lx/AADwU1c9EUf2A4d7FtzSKR54a ? dl equals 0 >>>>>>>>>>> >>>>>>>>>>> Thanks >>>>>>>>>>> >>>>>>>>>>>> On 7 Nov 2019, at 05:23, Paras Lehana < >>> paras.leh...@indiamart.com >>>>> <mailto:paras.leh...@indiamart.com>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> Hi Guilherme. >>>>>>>>>>>> >>>>>>>>>>>> I am sending they analysis result and the json result as >>> requested. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Thanks for the effort. Luckily, I can see your attachments (low >>>>> quality >>>>>>>>>>>> though). >>>>>>>>>>>> >>>>>>>>>>>> From the analysis screen, the analysis is working as expected. >>> One >>>>> of the >>>>>>>>>>>> reasons for query="lymphoid and *a* non-lymphoid cell" not >>> matching >>>>>>>>>>>> document containing "Lymphoid and a non-Lymphoid cell" I can >>>>> initially >>>>>>>>>>>> think of is: the stopword "a" is probably present in >>> post-analysis >>>>> either >>>>>>>>>>>> of query or index. Did you tweak your index time analysis after >>>>> indexing? >>>>>>>>>>>> >>>>>>>>>>>> Do two things: >>>>>>>>>>>> >>>>>>>>>>>> 1. Post the analysis screen for and index=*"Immunoregulatory >>>>>>>>>>>> interactions between a Lymphoid and a non-Lymphoid cell"* and >>>>>>>>>>>> "query=*"lymphoid >>>>>>>>>>>> and a non-lymphoid cell"*. Try hosting the image and providing >>> the >>>>> link >>>>>>>>>>>> here. >>>>>>>>>>>> 2. Give the same JSON output as you have sent but this time >> with >>>>>>>>>>>> *"echoParams=all"*. 
Also, post the exact Solr query url. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Wed, 6 Nov 2019 at 21:07, Erick Erickson < >>>>> erickerick...@gmail.com <mailto:erickerick...@gmail.com>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> I don’t see the attachments, maybe I deleted old e-mails or >> some >>>>> such. The >>>>>>>>>>>>> Apache server is fairly aggressive about stripping attachments >>>>> though, so >>>>>>>>>>>>> it’s also possible they didn’t make it through. >>>>>>>>>>>>> >>>>>>>>>>>>>> On Nov 6, 2019, at 9:28 AM, Guilherme Viteri < >>> gvit...@ebi.ac.uk >>>>> <mailto:gvit...@ebi.ac.uk>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks Erick. >>>>>>>>>>>>>> >>>>>>>>>>>>>>> First, your index and analysis chains are considerably >>>>> different, this >>>>>>>>>>>>> can easily be a source of problems. In particular, using two >>>>> different >>>>>>>>>>>>> tokenizers is a huge red flag. I _strongly_ recommend against >>>>> this unless >>>>>>>>>>>>> you’re totally sure you understand the consequences. >>>>> Additionally, your use >>>>>>>>>>>>> of the length filter is suspicious, especially since your >>> problem >>>>> statement >>>>>>>>>>>>> is about the addition of a single letter term and the min >> length >>>>> allowed on >>>>>>>>>>>>> that filter is 2. That said, it’s reasonable to suppose that >> the >>>>> ’a’ is >>>>>>>>>>>>> filtered out in both cases, but maybe you’ve found something >> odd >>>>> about the >>>>>>>>>>>>> interactions. >>>>>>>>>>>>>> I will investigate the min length and post the results later. >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Second, I have no idea what this will do. Are the equal >> signs >>>>> typos? >>>>>>>>>>>>> Used by custom code? >>>>>>>>>>>>>> This the url in my application, not solr params. That's the >>>>> query string. >>>>>>>>>>>>>> >>>>>>>>>>>>>>> What does “species=“ do? 
That’s not Solr syntax, so it’s >>> likely >>>>> that >>>>>>>>>>>>> all the params with an equal-sign are totally ignored unless >>> it’s >>>>> just a >>>>>>>>>>>>> typo. >>>>>>>>>>>>>> This is part of the application. Species will be used later >> on >>>>> in solr >>>>>>>>>>>>> to filter out the result. That's not solr. That my app params. >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Third, the easiest way to see what’s happening under the >>> covers >>>>> is to >>>>>>>>>>>>> add “&debug=true” to the query and look at the parsed query. >>>>> Ignore all the >>>>>>>>>>>>> relevance calculations for the nonce, or specify >> “&debug=query” >>>>> to skip >>>>>>>>>>>>> that part. >>>>>>>>>>>>>> The two json files i've sent, they are debugQuery=on and the >>>>> explain tag >>>>>>>>>>>>> is present. >>>>>>>>>>>>>> I will try the searching the way you mentioned. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thank for your inputs >>>>>>>>>>>>>> >>>>>>>>>>>>>> Guilherme >>>>>>>>>>>>>> >>>>>>>>>>>>>>> On 6 Nov 2019, at 14:14, Erick Erickson < >>>>> erickerick...@gmail.com <mailto:erickerick...@gmail.com>> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Fwd to another server >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> First, your index and analysis chains are considerably >>>>> different, this >>>>>>>>>>>>> can easily be a source of problems. In particular, using two >>>>> different >>>>>>>>>>>>> tokenizers is a huge red flag. I _strongly_ recommend against >>>>> this unless >>>>>>>>>>>>> you’re totally sure you understand the consequences. >>>>> Additionally, your use >>>>>>>>>>>>> of the length filter is suspicious, especially since your >>> problem >>>>> statement >>>>>>>>>>>>> is about the addition of a single letter term and the min >> length >>>>> allowed on >>>>>>>>>>>>> that filter is 2. That said, it’s reasonable to suppose that >> the >>>>> ’a’ is >>>>>>>>>>>>> filtered out in both cases, but maybe you’ve found something >> odd >>>>> about the >>>>>>>>>>>>> interactions. 
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Second, I have no idea what this will do. Are the equal >> signs >>>>> typos? >>>>>>>>>>>>> Used by custom code? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>> >>> >> https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true >>>>> < >>>>> >>> >> https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true >>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> What does “species=“ do? That’s not Solr syntax, so it’s >>> likely >>>>> that >>>>>>>>>>>>> all the params with an equal-sign are totally ignored unless >>> it’s >>>>> just a >>>>>>>>>>>>> typo. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Third, the easiest way to see what’s happening under the >>> covers >>>>> is to >>>>>>>>>>>>> add “&debug=true” to the query and look at the parsed query. >>>>> Ignore all the >>>>>>>>>>>>> relevance calculations for the nonce, or specify >> “&debug=query” >>>>> to skip >>>>>>>>>>>>> that part. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 90% + of the time, the question “why didn’t this query do >>> what I >>>>>>>>>>>>> expect” is answered by looking at the “&debug=query” output >> and >>>>> the >>>>>>>>>>>>> analysis page in the admin UI. NOTE: for the analysis page be >>>>> sure to look >>>>>>>>>>>>> at _both_ the query and index output. Also, and very important >>>>> about the >>>>>>>>>>>>> analysis page (and this is confusing) is that this _assumes_ >>> that >>>>> what you >>>>>>>>>>>>> put in the text boxes have made it through the query parser >>>>> intact and is >>>>>>>>>>>>> analyzed by the field selected. Consider the search >>>>> "q=field:word1 word2". >>>>>>>>>>>>> Now you type “word1 word2” into the analysis text box and it >>>>> looks like >>>>>>>>>>>>> what you expect. That’s misleading because the query is >> _parsed_ >>>>> as >>>>>>>>>>>>> "field:word1 default_search_field:word2”. 
This is where >>>>> “&debug=query” >>>>>>>>>>>>> helps. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>> Erick >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Nov 6, 2019, at 2:36 AM, Paras Lehana < >>>>> paras.leh...@indiamart.com <mailto:paras.leh...@indiamart.com>> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hi Walter, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> The solr.StopFilter removes all tokens that are stopwords. >>>>> Those words >>>>>>>>>>>>> will >>>>>>>>>>>>>>>>> not be in the index, so they can never match a query. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I think the OP's concern is different results when adding a >>>>> stopword. I >>>>>>>>>>>>>>>> think he's using the filter factory correctly - the query >>> chain >>>>>>>>>>>>> includes >>>>>>>>>>>>>>>> the filter as well so it should remove "a" while querying. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> *@Guilherme*, please post results for both the query, the >>>>> document in >>>>>>>>>>>>>>>> result you are concerned about and post full result of >>>>> analysis screen >>>>>>>>>>>>> (for >>>>>>>>>>>>>>>> both query and index). >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Tue, 5 Nov 2019 at 21:38, Walter Underwood < >>>>> wun...@wunderwood.org <mailto:wun...@wunderwood.org>> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> No. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> The solr.StopFilter removes all tokens that are stopwords. >>>>> Those words >>>>>>>>>>>>>>>>> will not be in the index, so they can never match a query. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 1. Remove the lines with solr.StopFilter from every >> analysis >>>>> chain in >>>>>>>>>>>>>>>>> schema.xml. >>>>>>>>>>>>>>>>> 2. Reload the collection, restart Solr, or whatever to >> read >>>>> the new >>>>>>>>>>>>> config. >>>>>>>>>>>>>>>>> 3. Reindex all of the documents. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> When indexed with the new analysis chain, the stopwords >> will >>>>> not be >>>>>>>>>>>>>>>>> removed and they will be searchable. 
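
To see which tokens survive the filters being discussed, a toy imitation of the filter order in the posted text_field index analyzer (length filter with min=2 and max=20, lowercase, stop filter) may help. This is plain Python and only an approximation: the regular expression is a crude stand-in for the real tokenizer, the function name is made up, and the stopword set contains only the stopwords.txt entries visible in this thread.

```python
import re

# Only the stopwords.txt entries visible in the thread; the real file is longer.
STOPWORDS = {"a", "b", "c", "an", "and", "are"}


def toy_analyze(text, stopwords=STOPWORDS, min_len=2, max_len=20):
    """Rough imitation of the chain: tokenize, length filter, lowercase, stop filter."""
    # Crude stand-in for the tokenizer: runs of letters and digits.
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    # LengthFilter: keep tokens whose length is within [min_len, max_len].
    tokens = [t for t in tokens if min_len <= len(t) <= max_len]
    # LowerCaseFilter.
    tokens = [t.lower() for t in tokens]
    # StopFilter: drop anything listed as a stopword.
    return [t for t in tokens if t not in stopwords]


if __name__ == "__main__":
    print(toy_analyze("Lymphoid and a non-Lymphoid cell"))
    print(toy_analyze("lymphoid and non-lymphoid cell"))
```

In this toy version both inputs reduce to the same tokens, because "a" is already dropped by the length filter and "and" by the stop filter. Whether the real chains behave the same way is what the analysis screen in the admin UI shows.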
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> wunder >>>>>>>>>>>>>>>>> Walter Underwood >>>>>>>>>>>>>>>>> wun...@wunderwood.org <mailto:wun...@wunderwood.org> >>>>>>>>>>>>>>>>> http://observer.wunderwood.org/ < >>>>> http://observer.wunderwood.org/> (my blog) >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Nov 5, 2019, at 8:56 AM, Guilherme Viteri < >>>>> gvit...@ebi.ac.uk <mailto:gvit...@ebi.ac.uk>> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Ok. I am kind a lost now. >>>>>>>>>>>>>>>>>> If I open up the console > analysis and perform it, >> that's >>>>> the final >>>>>>>>>>>>>>>>> result. >>>>>>>>>>>>>>>>>> <Screenshot 2019-11-05 at 14.54.16.png> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Your suggestion is: get rid of the <filter stopword.txt> >> in >>>>> the >>>>>>>>>>>>>>>>> schema.xml and during index phase replaceAll("in >>>>> stopwords.txt"," ") >>>>>>>>>>>>> then >>>>>>>>>>>>>>>>> add to solr. Is that correct ? >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thanks David >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On 5 Nov 2019, at 14:48, David Hastings < >>>>>>>>>>>>> hastings.recurs...@gmail.com <mailto: >>> hastings.recurs...@gmail.com >>>>>> >>>>>>>>>>>>>>>>> <mailto:hastings.recurs...@gmail.com <mailto: >>>>> hastings.recurs...@gmail.com>>> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Fwd to another server >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> no, >>>>>>>>>>>>>>>>>>> <filter class="solr.StopFilterFactory" >>>>> ignoreCase="true" >>>>>>>>>>>>>>>>>>> words="stopwords.txt"/> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> is still using stopwords and should be removed, in my >>>>> opinion of >>>>>>>>>>>>> course, >>>>>>>>>>>>>>>>>>> based on your use case may be different, but i generally >>>>> axe any >>>>>>>>>>>>>>>>> reference >>>>>>>>>>>>>>>>>>> to them at all >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Tue, Nov 5, 2019 at 9:47 AM Guilherme Viteri < >>>>> gvit...@ebi.ac.uk <mailto:gvit...@ebi.ac.uk> >>>>>>>>>>>>>>>>> <mailto:gvit...@ebi.ac.uk <mailto:gvit...@ebi.ac.uk>>> >>> 
wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Thanks. >>>>>>>>>>>>>>>>>>>> Haven't I done this here ? >>>>>>>>>>>>>>>>>>>> <fieldType name="text_field" class="solr.TextField" >>>>>>>>>>>>>>>>>>>> positionIncrementGap="100" omitNorms="false" > >>>>>>>>>>>>>>>>>>>> <analyzer type="index"> >>>>>>>>>>>>>>>>>>>> <tokenizer class="solr.StandardTokenizerFactory"/> >>>>>>>>>>>>>>>>>>>> <filter class="solr.ClassicFilterFactory"/> >>>>>>>>>>>>>>>>>>>> <filter class="solr.LengthFilterFactory" min="2" >>>>>>>>>>>>>>>>> max="20"/> >>>>>>>>>>>>>>>>>>>> <filter class="solr.LowerCaseFilterFactory"/> >>>>>>>>>>>>>>>>>>>> <filter class="solr.StopFilterFactory" >>>>> ignoreCase="true" >>>>>>>>>>>>>>>>>>>> words="stopwords.txt"/> >>>>>>>>>>>>>>>>>>>> </analyzer> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On 5 Nov 2019, at 14:15, David Hastings < >>>>>>>>>>>>> hastings.recurs...@gmail.com <mailto: >>> hastings.recurs...@gmail.com >>>>>> >>>>>>>>>>>>>>>>> <mailto:hastings.recurs...@gmail.com <mailto: >>>>> hastings.recurs...@gmail.com>>> >>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Fwd to another server >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> The first thing you should do is remove any reference >> to >>>>> stop >>>>>>>>>>>>> words >>>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>>>>>> never use them, then re-index your data and try it >>> again. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Tue, Nov 5, 2019 at 9:14 AM Guilherme Viteri < >>>>>>>>>>>>> gvit...@ebi.ac.uk <mailto:gvit...@ebi.ac.uk> >>>>>>>>>>>>>>>>> <mailto:gvit...@ebi.ac.uk <mailto:gvit...@ebi.ac.uk>>> >>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> I am performing a search to match a name >> (text_field), >>>>> however >>>>>>>>>>>>> this >>>>>>>>>>>>>>>>> term >>>>>>>>>>>>>>>>>>>>>> contains 'and' and 'a' and it doesn't return any >>>>> records. 
If i >>>>>>>>>>>>> remove >>>>>>>>>>>>>>>>>>>> 'a' >>>>>>>>>>>>>>>>>>>>>> then it works. >>>>>>>>>>>>>>>>>>>>>> e.g >>>>>>>>>>>>>>>>>>>>>> Search Term: lymphoid and a non-lymphoid cell >>>>>>>>>>>>>>>>>>>>>> doesn't work: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>> >>> >> https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true >>>>> < >>>>> >>> >> https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true >>>>>> >>>>>>>>>>>>>>>>> < >>>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>> >>> >> https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true >>>>> < >>>>> >>> >> https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true >>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> < >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>> >>> >> https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true >>>>> < >>>>> >>> >> https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true >>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Search term: lymphoid and non-lymphoid cell >>>>>>>>>>>>>>>>>>>>>> works: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>> >>> >> https://dev.reactome.org/content/query?q=lymphoid+and+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true >>>>> < >>>>> >>> >> https://dev.reactome.org/content/query?q=lymphoid+and+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true >>>>>> >>>>>>>>>>>>>>>>>>>>>> < 
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>> >>> >> https://dev.reactome.org/content/query?q=lymphoid+and+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true >>>>> < >>>>> >>> >> https://dev.reactome.org/content/query?q=lymphoid+and+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true >>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> interested in the first result >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> schema.xml >>>>>>>>>>>>>>>>>>>>>> <field name="name" >>>>> type="text_field" >>>>>>>>>>>>>>>>>>>>>> indexed="true" stored="true" omitNorms="false" >>>>>>>>>>>>> required="true" >>>>>>>>>>>>>>>>>>>>>> multiValued="false"/> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> <analyzer type="query"> >>>>>>>>>>>>>>>>>>>>>> <tokenizer class="solr.PatternTokenizerFactory" >>>>>>>>>>>>>>>>>>>>>> pattern="[^a-zA-Z0-9/._:]"/> >>>>>>>>>>>>>>>>>>>>>> <filter class="solr.PatternReplaceFilterFactory" >>>>>>>>>>>>>>>>>>>>>> pattern="^[/._:]+" replacement=""/> >>>>>>>>>>>>>>>>>>>>>> <filter class="solr.PatternReplaceFilterFactory" >>>>>>>>>>>>>>>>>>>>>> pattern="[/._:]+$" replacement=""/> >>>>>>>>>>>>>>>>>>>>>> <filter class="solr.PatternReplaceFilterFactory" >>>>>>>>>>>>>>>>>>>>>> pattern="[_]" replacement=" "/> >>>>>>>>>>>>>>>>>>>>>> <filter class="solr.LengthFilterFactory" min="2" >>>>>>>>>>>>>>>>>>>> max="20"/> >>>>>>>>>>>>>>>>>>>>>> <filter class="solr.LowerCaseFilterFactory"/> >>>>>>>>>>>>>>>>>>>>>> <filter class="solr.StopFilterFactory" >>>>>>>>>>>>>>>>> ignoreCase="true" >>>>>>>>>>>>>>>>>>>>>> words="stopwords.txt"/> >>>>>>>>>>>>>>>>>>>>>> </analyzer> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> <fieldType name="text_field" class="solr.TextField" >>>>>>>>>>>>>>>>>>>>>> positionIncrementGap="100" omitNorms="false" > >>>>>>>>>>>>>>>>>>>>>> <analyzer type="index"> >>>>>>>>>>>>>>>>>>>>>> <tokenizer >> class="solr.StandardTokenizerFactory"/> >>>>>>>>>>>>>>>>>>>>>> <filter 
class="solr.ClassicFilterFactory"/> >>>>>>>>>>>>>>>>>>>>>> <filter class="solr.LengthFilterFactory" min="2" >>>>>>>>>>>>>>>>>>>> max="20"/> >>>>>>>>>>>>>>>>>>>>>> <filter class="solr.LowerCaseFilterFactory"/> >>>>>>>>>>>>>>>>>>>>>> <filter class="solr.StopFilterFactory" >>>>>>>>>>>>>>>>> ignoreCase="true" >>>>>>>>>>>>>>>>>>>>>> words="stopwords.txt"/> >>>>>>>>>>>>>>>>>>>>>> </analyzer> >>>>>>>>>>>>>>>>>>>>>> <analyzer type="query"> >>>>>>>>>>>>>>>>>>>>>> <tokenizer class="solr.PatternTokenizerFactory" >>>>>>>>>>>>>>>>>>>>>> pattern="[^a-zA-Z0-9/._:]"/> >>>>>>>>>>>>>>>>>>>>>> <filter class="solr.PatternReplaceFilterFactory" >>>>>>>>>>>>>>>>>>>>>> pattern="^[/._:]+" replacement=""/> >>>>>>>>>>>>>>>>>>>>>> <filter class="solr.PatternReplaceFilterFactory" >>>>>>>>>>>>>>>>>>>>>> pattern="[/._:]+$" replacement=""/> >>>>>>>>>>>>>>>>>>>>>> <filter class="solr.PatternReplaceFilterFactory" >>>>>>>>>>>>>>>>>>>>>> pattern="[_]" replacement=" "/> >>>>>>>>>>>>>>>>>>>>>> <filter class="solr.LengthFilterFactory" min="2" >>>>>>>>>>>>>>>>>>>> max="20"/> >>>>>>>>>>>>>>>>>>>>>> <filter class="solr.LowerCaseFilterFactory"/> >>>>>>>>>>>>>>>>>>>>>> <filter class="solr.StopFilterFactory" >>>>>>>>>>>>>>>>> ignoreCase="true" >>>>>>>>>>>>>>>>>>>>>> words="stopwords.txt"/> >>>>>>>>>>>>>>>>>>>>>> </analyzer> >>>>>>>>>>>>>>>>>>>>>> </fieldType> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> stopwords.txt >>>>>>>>>>>>>>>>>>>>>> #Standard english stop words taken from Lucene's >>>>> StopAnalyzer >>>>>>>>>>>>>>>>>>>>>> a >>>>>>>>>>>>>>>>>>>>>> b >>>>>>>>>>>>>>>>>>>>>> c >>>>>>>>>>>>>>>>>>>>>> .... >>>>>>>>>>>>>>>>>>>>>> an >>>>>>>>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>>>>>>> are >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Running SolR 6.6.2. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Is there anything I could do to prevent this ? 
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>>>>>>> Guilherme
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> *Paras Lehana* [65871]
>>>>>>>>>>>>>>>> Development Engineer, Auto-Suggest,
>>>>>>>>>>>>>>>> IndiaMART Intermesh Ltd.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
>>>>>>>>>>>>>>>> Noida, UP, IN - 201303
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Mob.: +91-9560911996
>>>>>>>>>>>>>>>> Work: 01203916600 | Extn: *8173*
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> IMPORTANT:
>>>>>>>>>>>>>>>> NEVER share your IndiaMART OTP/ Password with anyone.