Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-20 Thread Guilherme Viteri
Hi, Alright, after trying and trying, I have managed to isolate the fields that are causing the search to fail. Now, all the fields are "" are breaking up my search. I changed the id-StrField to And finally now it

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-18 Thread Guilherme Viteri
Hi, > Have you tried reindexing the documents and compare the results? No issues > if you cannot do that - let's try something else. I was going through the > whole mail and your files. You had said: Yes, but since it hasn't worked as suggested, I kept as you suggested. > As soon as I add dbId or

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-17 Thread Paras Lehana
Hi Guilherme, Have you tried reindexing the documents and compare the results? No issues if you cannot do that - let's try something else. I was going through the whole mail and your files. You had said: As soon as I add dbId or stId (regardless the boost, 1.0 or 100.0), then I > don't get anythi

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-14 Thread Guilherme Viteri
Hi Paras, no worries. No, I didn’t find anything. This is annoying now... Yes! They do contain dbId. Absolutely all my docs contain dbId, and it is actually my key, if you check the schema.xml again. Cheers Guilherme > On 15 Nov 2019, at 05:37, Paras Lehana wrote: > > > Hey Guilherme, > > I w

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-14 Thread Paras Lehana
Hey Guilherme, I was a bit busy for the past few days and couldn't read your mail. So, did you find anything? Anyway, as I had expected, the culprit is definitely among the qf fields. Do the documents concerned contain dbId? I suggest you cross-check the fields in your document with those impacting

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-12 Thread Guilherme Viteri
What I can't understand is: I search for the exact term - "Immunoregulatory interactions between a Lymphoid and a non-Lymphoid cell" and If i search "I search for the exact term - Immunoregulatory interactions between a Lymphoid and non-Lymphoid cell" then it works > On 11 Nov 2019, at 12:24,

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-11 Thread Guilherme Viteri
Thanks > Removing stopwords is another story. I'm curious to find the reason > assuming that you keep on using stopwords. In some cases, stopwords are > really necessary. Yes. It always make sense the way we've been using. > If q.alt is giving you responses, it's confirmed that your stopwords filt

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-10 Thread Paras Lehana
Hi So I don't think removing it completely is the way to go from the scenario > we have Removing stopwords is another story. I'm curious to find the reason assuming that you keep on using stopwords. In some cases, stopwords are really necessary. Quite a considerable increase If q.alt is givi

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-08 Thread David Hastings
I use 3 word shingles with stopwords for my MLT ML trainer that worked pretty well for such a solution, but for a full index the size became prohibitive On Fri, Nov 8, 2019 at 12:13 PM Walter Underwood wrote: > If we had IDF for phrases, they would be super effective. The 2X weight is > a hack t

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-08 Thread Walter Underwood
If we had IDF for phrases, they would be super effective. The 2X weight is a hack that mostly works. Infoseek had phrase IDF and it was a killer algorithm for relevance. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Nov 8, 2019, at 11:08 AM, David

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-08 Thread David Hastings
the pf and qf fields are REALLY nice for this On Fri, Nov 8, 2019 at 12:02 PM Walter Underwood wrote: > I always enable phrase searching in edismax for exactly this reason. > > Something like: > >title^16 keywords^8 text^2 > > To deal with concepts in queries, a classifier and/or named e

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-08 Thread Walter Underwood
I always enable phrase searching in edismax for exactly this reason. Something like: title^16 keywords^8 text^2 To deal with concepts in queries, a classifier and/or named entity extractor can be helpful. If you have a list of concepts (“controlled vocabulary”) that includes “Lamin A”,

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-08 Thread Walter Underwood
But when you change it to AND, a single misspelling means zero results. That is usually not helpful. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Nov 8, 2019, at 10:43 AM, David Hastings > wrote: > > is your default operator OR? > change it to

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-08 Thread Erick Erickson
Look at the “mm” parameter, try setting it to 100%. Although that’s not entirely likely to do what you want either, since virtually every doc will have “a” in it. But at least you’d get docs that have both terms. You may also be able to search for things like “Lamin A” _only as a phrase_ and hav
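To make the “mm” (minimum should match) suggestion above concrete, here is a minimal toy model in Python. It is only a sketch: it treats a document as a set of terms and rounds a percentage down to a whole number of required clauses. The function names and the sample documents are hypothetical, and it does not reproduce how Solr actually scores or parses queries.

```python
def required_matches(num_clauses, mm_percent):
    """Number of query clauses that must match, rounding the percentage down."""
    return num_clauses * mm_percent // 100


def doc_matches(doc_terms, query_terms, mm_percent):
    """True if enough query terms occur in the document's term set (toy model)."""
    hits = sum(1 for term in query_terms if term in doc_terms)
    return hits >= required_matches(len(query_terms), mm_percent)


# Hypothetical documents, modelled as sets of indexed terms.
lamin_doc = {"lamin", "a"}
ift_doc = {"ift", "a"}
query = ["lamin", "a"]

print(doc_matches(lamin_doc, query, 100))  # both terms present
print(doc_matches(ift_doc, query, 100))    # only "a" present
print(doc_matches(ift_doc, query, 50))     # one of two terms is enough
```

With mm at 100% only the document containing both terms survives, while a lower value lets any document containing “a” through, which is the trade-off described in the message above.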

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-08 Thread Guilherme Viteri
OR OR explicit edismax *:* name ... > On 8 Nov 2019, at 16:43, David Hastings wrote: > > is your default operator OR? > change it to AND > > > On Fri, Nov 8, 2019 at

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-08 Thread David Hastings
is your default operator OR? change it to AND On Fri, Nov 8, 2019 at 11:30 AM Guilherme Viteri wrote: > HI Walter and Paras > > I indexed it removing all the references to StopWordFilter and I went from > 121 results to near 20K as the search term q="Lymphoid and a non-Lymphoid > cell" is match

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-08 Thread Guilherme Viteri
Hi Walter and Paras, I indexed it after removing all the references to StopWordFilter, and I went from 121 results to nearly 20K, as the search term q="Lymphoid and a non-Lymphoid cell" is matching entities such as "IFT A" or "Lamin A". So I don't think removing it completely is the way to go from the sc

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-07 Thread Paras Lehana
Hi Guilherme By accident, I ended up querying using the default handler (/select) > and it worked. You've just found the culprit. Thanks for giving the material I requested. Your analysis chain is working as expected. I don't see any issue in either StopWordFilter or your boosts. I also use

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-07 Thread Walter Underwood
I normally use a weight of 8 for the most important field, like title. Other fields might get a 4 or 2. I add a “pf” field with the weights doubled, so that phrase matches have a higher weight. The weight of 8 comes from experience at Infoseek and Inktomi, two early web search engines. With di
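The weighting scheme described above (8 for the most important field, 4 or 2 for others, and a “pf” list with the weights doubled) can be sketched as request parameters. This is a minimal illustration in Python, not anyone's actual handler configuration: the field names title, keywords and text are borrowed from another message in this thread, and the helper name `edismax_params` is hypothetical.

```python
from urllib.parse import urlencode


def edismax_params(query, field_weights, phrase_factor=2):
    """Build edismax parameters where pf weights are qf weights times a factor."""
    qf = " ".join(f"{field}^{weight}" for field, weight in field_weights.items())
    pf = " ".join(
        f"{field}^{weight * phrase_factor}" for field, weight in field_weights.items()
    )
    return {"defType": "edismax", "q": query, "qf": qf, "pf": pf}


# Assumed fields and weights, for illustration only.
params = edismax_params(
    "lymphoid and a non-lymphoid cell",
    {"title": 8, "keywords": 4, "text": 2},
)
query_string = urlencode(params)  # ready to append to a /select URL
print(params["qf"])
print(params["pf"])
```

Keeping the phrase weights as a fixed multiple of the term weights means a change to one field's importance carries over to its phrase boost automatically.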

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-07 Thread Guilherme Viteri
Hi Wunder, My indexer takes quite a few hours to run. I am shortening it to run faster, but I also need to make sure it gives what we are expecting. This implementation has been there for >4 years and is heavily used. > In your edismax handlers, weights of 20, 50, and 100 are extremely high. I

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-07 Thread David Hastings
Ha, funny enough i still use qf/pf boosts starting at 100 and go down, gives me room to add boosting to more fields but not equal. maybe excessive but haven't noticed a performance issue On Thu, Nov 7, 2019 at 9:44 AM Walter Underwood wrote: > Thanks for posting the files. Looking at schema.xml

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-07 Thread Walter Underwood
Thanks for posting the files. Looking at schema.xml, I see that you still are using StopFilterFactory. The first advice we gave you was to remove that. Remove StopFilterFactory everywhere and reindex. You will continue to have problems matching stopwords until you do that. In your edismax handl

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-07 Thread Guilherme Viteri
Hi Paras, everyone. Thank you again for your input and suggestions. I'm sorry to hear you had trouble with the attachments; I will host them somewhere and share the links. I don't tweak my index: I get the data from the graph database, create the documents as they are and save them to Solr. So, I am sendin

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-06 Thread Paras Lehana
Hi Guilherme. I am sending the analysis result and the JSON result as requested. Thanks for the effort. Luckily, I can see your attachments (low quality though). From the analysis screen, the analysis is working as expected. One of the reasons for query="lymphoid and *a* non-lymphoid cell" no

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-06 Thread Erick Erickson
I don’t see the attachments, maybe I deleted old e-mails or some such. The Apache server is fairly aggressive about stripping attachments though, so it’s also possible they didn’t make it through. > On Nov 6, 2019, at 9:28 AM, Guilherme Viteri wrote: > > Thanks Erick. > >> First, your index a

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-06 Thread Guilherme Viteri
Thanks Erick. > First, your index and analysis chains are considerably different, this can > easily be a source of problems. In particular, using two different tokenizers > is a huge red flag. I _strongly_ recommend against this unless you’re totally > sure you understand the consequences. Addi

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-06 Thread Erick Erickson
First, your index and analysis chains are considerably different, this can easily be a source of problems. In particular, using two different tokenizers is a huge red flag. I _strongly_ recommend against this unless you’re totally sure you understand the consequences. Additionally, your use of t

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-05 Thread Paras Lehana
Hi Walter, The solr.StopFilter removes all tokens that are stopwords. Those words will > not be in the index, so they can never match a query. I think the OP's concern is different results when adding a stopword. I think he's using the filter factory correctly - the query chain includes the filt

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-05 Thread Walter Underwood
No. The solr.StopFilter removes all tokens that are stopwords. Those words will not be in the index, so they can never match a query. 1. Remove the lines with solr.StopFilter from every analysis chain in schema.xml. 2. Reload the collection, restart Solr, or whatever to read the new config. 3.
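The point that a stop filter drops stopword tokens before they reach the index, so they can never match, can be illustrated with a toy analyzer. This is a sketch under loud assumptions: whitespace tokenization, lowercasing, and a two-word stopword list are all simplifications chosen for the example, and none of it is Solr's implementation or the poster's actual stopwords.txt.

```python
# Assumed stopword list, for illustration only.
STOPWORDS = {"and", "a"}


def analyze(text, stopwords=STOPWORDS):
    """Lowercase, split on whitespace, and drop stopword tokens (toy model)."""
    return [token for token in text.lower().split() if token not in stopwords]


indexed_tokens = analyze("Lymphoid and a non-Lymphoid cell")
print(indexed_tokens)

# A query term that is a stopword has nothing to match in the indexed tokens.
print("a" in indexed_tokens)
```

In this model the terms “and” and “a” are simply absent from the token list, so any query clause that still requires them cannot be satisfied by that field.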

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-05 Thread Guilherme Viteri
Ok. I am kind of lost now. If I open up the console > analysis and perform it, that's the final result. Your suggestion is: get rid of the in the schema.xml and during index phase replaceAll("in stopwords.txt"," ") then add to solr. Is that correct? Thanks David > On 5 Nov 2019, at 14:48, D

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-05 Thread David Hastings
no, is still using stopwords and should be removed, in my opinion of course, based on your use case may be different, but i generally axe any reference to them at all On Tue, Nov 5, 2019 at 9:47 AM Guilherme Viteri wrote: > Thanks. > Haven't I done this here ? >positionIncre

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-05 Thread Guilherme Viteri
Thanks. Haven't I done this here ? > On 5 Nov 2019, at 14:15, David Hastings wrote: > > Fwd to another server > > The first thing you should do is remove any reference to stop words and >

Re: When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-05 Thread David Hastings
The first thing you should do is remove any reference to stop words and never use them, then re-index your data and try it again. On Tue, Nov 5, 2019 at 9:14 AM Guilherme Viteri wrote: > Hi, > > I am performing a search to match a name (text_field), however this term > contains 'and' and 'a' and

When search term has two stopwords ('and' and 'a') together, it doesn't work

2019-11-05 Thread Guilherme Viteri
Hi, I am performing a search to match a name (text_field), however this term contains 'and' and 'a' and it doesn't return any records. If I remove 'a' then it works. e.g. Search Term: lymphoid and a non-lymphoid cell doesn't work: https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymph