Ha, funnily enough, I still use qf/pf boosts starting at 100 and going down. That gives me room to boost more fields without giving any two of them equal weight. Maybe excessive, but I haven't noticed a performance issue.
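For readers following the boost discussion, the qf/pf weights live in the edismax handler's defaults. The actual handlers were only shared via Dropbox, so this is a minimal sketch under stated assumptions: the handler name /search and the fields synonyms and summary are hypothetical placeholders, and only name comes from this thread. The weights are kept at or below the 16 Walter mentions.

```xml
<!-- Hypothetical sketch: the handler name and the synonyms/summary fields are
     placeholders; only the "name" field comes from this thread. -->
<requestHandler name="/search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- Query fields with relative weights, all kept at 16 or below -->
    <str name="qf">name^16 synonyms^8 summary^2</str>
    <!-- Phrase boost on the main field -->
    <str name="pf">name^16</str>
    <!-- Echo every parameter in the response header, as Paras asked for -->
    <str name="echoParams">all</str>
  </lst>
</requestHandler>
```

As far as I understand edismax scoring, the ratios between weights are what matter for ordering: multiplying every qf and pf weight by the same factor (100/50/20 versus 10/5/2) should leave the ranking essentially unchanged as long as no other score components such as bq or bf are involved, which is why both styles can work in practice.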
On Thu, Nov 7, 2019 at 9:44 AM Walter Underwood <wun...@wunderwood.org> wrote:
> Thanks for posting the files. Looking at schema.xml, I see that you are still using StopFilterFactory. The first advice we gave you was to remove that.
>
> Remove StopFilterFactory everywhere and reindex.
>
> You will continue to have problems matching stopwords until you do that.
>
> In your edismax handlers, weights of 20, 50, and 100 are extremely high. I don’t think I’ve ever used a weight higher than 16 in a dozen years of configuring Solr.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/ (my blog)
>
> > On Nov 7, 2019, at 6:56 AM, Guilherme Viteri <gvit...@ebi.ac.uk> wrote:
> >
> > Hi Paras, everyone,
> >
> > Thank you again for your inputs and suggestions. I'm sorry to hear you had trouble with the attachments; I will host them somewhere and share the links.
> > I don't tweak my index: I get the data from the graph database, create the documents as they are, and save them to Solr.
> >
> > So, I am sending the new analysis screen, querying the way you suggested, along with the results with params and the Solr query URL.
> >
> > While running the queries you asked for, I found something really weird (at least to me). By accident, I ended up querying using the default handler (/select), and it worked. If I use the handler I must use, it sadly doesn't work. I am posting both results, and I will also post the handlers.
> >
> > Here is the link with all the files mentioned before:
> > https://www.dropbox.com/sh/fymfm1q94zum1lx/AADwU1c9EUf2A4d7FtzSKR54a?dl=0
> >
> > If the link doesn't work: www dot dropbox dot com slash sh slash fymfm1q94zum1lx/AADwU1c9EUf2A4d7FtzSKR54a ? dl equals 0
> >
> > Thanks
> >
> >> On 7 Nov 2019, at 05:23, Paras Lehana <paras.leh...@indiamart.com> wrote:
> >>
> >> Hi Guilherme.
> >>
> >> > I am sending the analysis result and the JSON result as requested.
> >>
> >> Thanks for the effort. Luckily, I can see your attachments (low quality, though).
> >>
> >> From the analysis screen, the analysis is working as expected. One reason I can initially think of for query="lymphoid and *a* non-lymphoid cell" not matching a document containing "Lymphoid and a non-Lymphoid cell" is that the stopword "a" is probably still present after analysis on either the query or the index side. Did you tweak your index-time analysis after indexing?
> >>
> >> Do two things:
> >>
> >> 1. Post the analysis screen for index=*"Immunoregulatory interactions between a Lymphoid and a non-Lymphoid cell"* and query=*"lymphoid and a non-lymphoid cell"*. Try hosting the image and providing the link here.
> >> 2. Give the same JSON output as you have sent, but this time with *"echoParams=all"*. Also, post the exact Solr query URL.
> >>
> >> On Wed, 6 Nov 2019 at 21:07, Erick Erickson <erickerick...@gmail.com> wrote:
> >>
> >>> I don’t see the attachments; maybe I deleted old e-mails or some such. The Apache server is fairly aggressive about stripping attachments, though, so it’s also possible they didn’t make it through.
> >>>
> >>>> On Nov 6, 2019, at 9:28 AM, Guilherme Viteri <gvit...@ebi.ac.uk> wrote:
> >>>>
> >>>> Thanks Erick.
> >>>>
> >>>>> First, your index and analysis chains are considerably different; this can easily be a source of problems. In particular, using two different tokenizers is a huge red flag. I _strongly_ recommend against this unless you’re totally sure you understand the consequences. Additionally, your use of the length filter is suspicious, especially since your problem statement is about the addition of a single-letter term and the min length allowed on that filter is 2.
> >>>>> That said, it’s reasonable to suppose that the ‘a’ is filtered out in both cases, but maybe you’ve found something odd about the interactions.
> >>>> I will investigate the min length and post the results later.
> >>>>
> >>>>> Second, I have no idea what this will do. Are the equal signs typos? Used by custom code?
> >>>> This is the URL in my application, not Solr params. That's the query string.
> >>>>
> >>>>> What does “species=” do? That’s not Solr syntax, so it’s likely that all the params with an equal sign are totally ignored, unless it’s just a typo.
> >>>> This is part of the application. Species will be used later on in Solr to filter the results. That's not Solr; those are my app's params.
> >>>>
> >>>>> Third, the easiest way to see what’s happening under the covers is to add “&debug=true” to the query and look at the parsed query. Ignore all the relevance calculations for the nonce, or specify “&debug=query” to skip that part.
> >>>> The two JSON files I've sent were produced with debugQuery=on, and the explain tag is present.
> >>>> I will try searching the way you mentioned.
> >>>>
> >>>> Thanks for your inputs,
> >>>>
> >>>> Guilherme
> >>>>
> >>>>> On 6 Nov 2019, at 14:14, Erick Erickson <erickerick...@gmail.com> wrote:
> >>>>>
> >>>>> First, your index and analysis chains are considerably different; this can easily be a source of problems. In particular, using two different tokenizers is a huge red flag. I _strongly_ recommend against this unless you’re totally sure you understand the consequences. Additionally, your use of the length filter is suspicious, especially since your problem statement is about the addition of a single-letter term and the min length allowed on that filter is 2.
> >>>>> That said, it’s reasonable to suppose that the ‘a’ is filtered out in both cases, but maybe you’ve found something odd about the interactions.
> >>>>>
> >>>>> Second, I have no idea what this will do. Are the equal signs typos? Used by custom code?
> >>>>>
> >>>>>>> https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true
> >>>>>
> >>>>> What does “species=” do? That’s not Solr syntax, so it’s likely that all the params with an equal sign are totally ignored, unless it’s just a typo.
> >>>>>
> >>>>> Third, the easiest way to see what’s happening under the covers is to add “&debug=true” to the query and look at the parsed query. Ignore all the relevance calculations for the nonce, or specify “&debug=query” to skip that part.
> >>>>>
> >>>>> 90%+ of the time, the question “why didn’t this query do what I expect” is answered by looking at the “&debug=query” output and the analysis page in the admin UI. NOTE: for the analysis page, be sure to look at _both_ the query and index output. Also, and very important about the analysis page (this is confusing): it _assumes_ that what you put in the text boxes has made it through the query parser intact and is analyzed by the field selected. Consider the search "q=field:word1 word2". Now you type “word1 word2” into the analysis text box, and it looks like what you expect. That’s misleading, because the query is _parsed_ as "field:word1 default_search_field:word2". This is where “&debug=query” helps.
> >>>>>
> >>>>> Best,
> >>>>> Erick
> >>>>>
> >>>>>> On Nov 6, 2019, at 2:36 AM, Paras Lehana <paras.leh...@indiamart.com> wrote:
> >>>>>>
> >>>>>> Hi Walter,
> >>>>>>
> >>>>>>> The solr.StopFilter removes all tokens that are stopwords.
Those > words > >>> will > >>>>>>> not be in the index, so they can never match a query. > >>>>>> > >>>>>> > >>>>>> I think the OP's concern is different results when adding a > stopword. I > >>>>>> think he's using the filter factory correctly - the query chain > >>> includes > >>>>>> the filter as well so it should remove "a" while querying. > >>>>>> > >>>>>> *@Guilherme*, please post results for both the query, the document > in > >>>>>> result you are concerned about and post full result of analysis > screen > >>> (for > >>>>>> both query and index). > >>>>>> > >>>>>> On Tue, 5 Nov 2019 at 21:38, Walter Underwood < > wun...@wunderwood.org> > >>> wrote: > >>>>>> > >>>>>>> No. > >>>>>>> > >>>>>>> The solr.StopFilter removes all tokens that are stopwords. Those > words > >>>>>>> will not be in the index, so they can never match a query. > >>>>>>> > >>>>>>> 1. Remove the lines with solr.StopFilter from every analysis chain > in > >>>>>>> schema.xml. > >>>>>>> 2. Reload the collection, restart Solr, or whatever to read the new > >>> config. > >>>>>>> 3. Reindex all of the documents. > >>>>>>> > >>>>>>> When indexed with the new analysis chain, the stopwords will not be > >>>>>>> removed and they will be searchable. > >>>>>>> > >>>>>>> wunder > >>>>>>> Walter Underwood > >>>>>>> wun...@wunderwood.org > >>>>>>> http://observer.wunderwood.org/ (my blog) > >>>>>>> > >>>>>>>> On Nov 5, 2019, at 8:56 AM, Guilherme Viteri <gvit...@ebi.ac.uk> > >>> wrote: > >>>>>>>> > >>>>>>>> Ok. I am kind a lost now. > >>>>>>>> If I open up the console > analysis and perform it, that's the > final > >>>>>>> result. > >>>>>>>> <Screenshot 2019-11-05 at 14.54.16.png> > >>>>>>>> > >>>>>>>> Your suggestion is: get rid of the <filter stopword.txt> in the > >>>>>>> schema.xml and during index phase replaceAll("in stopwords.txt"," > ") > >>> then > >>>>>>> add to solr. Is that correct ? 
> >>>>>>>>
> >>>>>>>> Thanks, David
> >>>>>>>>
> >>>>>>>>> On 5 Nov 2019, at 14:48, David Hastings <hastings.recurs...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>> No,
> >>>>>>>>> <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
> >>>>>>>>>
> >>>>>>>>> is still using stopwords and should be removed. That's my opinion, of course, and your use case may be different, but I generally axe any reference to them at all.
> >>>>>>>>>
> >>>>>>>>> On Tue, Nov 5, 2019 at 9:47 AM Guilherme Viteri <gvit...@ebi.ac.uk> wrote:
> >>>>>>>>>
> >>>>>>>>>> Thanks.
> >>>>>>>>>> Haven't I done this here?
> >>>>>>>>>> <fieldType name="text_field" class="solr.TextField" positionIncrementGap="100" omitNorms="false">
> >>>>>>>>>>   <analyzer type="index">
> >>>>>>>>>>     <tokenizer class="solr.StandardTokenizerFactory"/>
> >>>>>>>>>>     <filter class="solr.ClassicFilterFactory"/>
> >>>>>>>>>>     <filter class="solr.LengthFilterFactory" min="2" max="20"/>
> >>>>>>>>>>     <filter class="solr.LowerCaseFilterFactory"/>
> >>>>>>>>>>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
> >>>>>>>>>>   </analyzer>
> >>>>>>>>>>
> >>>>>>>>>>> On 5 Nov 2019, at 14:15, David Hastings <hastings.recurs...@gmail.com> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> The first thing you should do is remove any reference to stop words and never use them, then reindex your data and try it again.
> >>>>>>>>>>>
> >>>>>>>>>>> On Tue, Nov 5, 2019 at 9:14 AM Guilherme Viteri <gvit...@ebi.ac.uk> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Hi,
> >>>>>>>>>>>>
> >>>>>>>>>>>> I am performing a search to match a name (text_field); however, this term contains 'and' and 'a', and it doesn't return any records. If I remove 'a', then it works.
> >>>>>>>>>>>> E.g.
> >>>>>>>>>>>> Search term: lymphoid and a non-lymphoid cell
> >>>>>>>>>>>> doesn't work:
> >>>>>>>>>>>> https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true
> >>>>>>>>>>>>
> >>>>>>>>>>>> Search term: lymphoid and non-lymphoid cell
> >>>>>>>>>>>> works:
> >>>>>>>>>>>> https://dev.reactome.org/content/query?q=lymphoid+and+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true
> >>>>>>>>>>>> I am interested in the first result.
> >>>>>>>>>>>>
> >>>>>>>>>>>> schema.xml
> >>>>>>>>>>>> <field name="name" type="text_field" indexed="true" stored="true" omitNorms="false" required="true" multiValued="false"/>
> >>>>>>>>>>>>
> >>>>>>>>>>>> <fieldType name="text_field" class="solr.TextField" positionIncrementGap="100" omitNorms="false">
> >>>>>>>>>>>>   <analyzer type="index">
> >>>>>>>>>>>>     <tokenizer class="solr.StandardTokenizerFactory"/>
> >>>>>>>>>>>>     <filter class="solr.ClassicFilterFactory"/>
> >>>>>>>>>>>>     <filter class="solr.LengthFilterFactory" min="2" max="20"/>
> >>>>>>>>>>>>     <filter class="solr.LowerCaseFilterFactory"/>
> >>>>>>>>>>>>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
> >>>>>>>>>>>>   </analyzer>
> >>>>>>>>>>>>   <analyzer type="query">
> >>>>>>>>>>>>     <tokenizer class="solr.PatternTokenizerFactory" pattern="[^a-zA-Z0-9/._:]"/>
> >>>>>>>>>>>>     <filter class="solr.PatternReplaceFilterFactory" pattern="^[/._:]+" replacement=""/>
> >>>>>>>>>>>>     <filter class="solr.PatternReplaceFilterFactory" pattern="[/._:]+$" replacement=""/>
> >>>>>>>>>>>>     <filter class="solr.PatternReplaceFilterFactory" pattern="[_]" replacement=" "/>
> >>>>>>>>>>>>     <filter class="solr.LengthFilterFactory" min="2" max="20"/>
> >>>>>>>>>>>>     <filter class="solr.LowerCaseFilterFactory"/>
> >>>>>>>>>>>>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
> >>>>>>>>>>>>   </analyzer>
> >>>>>>>>>>>> </fieldType>
> >>>>>>>>>>>>
> >>>>>>>>>>>> stopwords.txt
> >>>>>>>>>>>> #Standard english stop words taken from Lucene's StopAnalyzer
> >>>>>>>>>>>> a
> >>>>>>>>>>>> b
> >>>>>>>>>>>> c
> >>>>>>>>>>>> ....
> >>>>>>>>>>>> an
> >>>>>>>>>>>> and
> >>>>>>>>>>>> are
> >>>>>>>>>>>>
> >>>>>>>>>>>> Running Solr 6.6.2.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Is there anything I could do to prevent this?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks
> >>>>>>>>>>>> Guilherme
> >>>>>>
> >>>>>> --
> >>>>>> --
> >>>>>> Regards,
> >>>>>>
> >>>>>> *Paras Lehana* [65871]
> >>>>>> Development Engineer, Auto-Suggest,
> >>>>>> IndiaMART Intermesh Ltd.
> >>>>>>
> >>>>>> 8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
> >>>>>> Noida, UP, IN - 201303
> >>>>>>
> >>>>>> Mob.: +91-9560911996
> >>>>>> Work: 01203916600 | Extn: *8173*
> >>>>>>
> >>>>>> --
> >>>>>> IMPORTANT:
> >>>>>> NEVER share your IndiaMART OTP/ Password with anyone.
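Pulling the advice in this thread together (drop StopFilterFactory, use one tokenizer for both index and query, and stop the length filter from discarding single-letter terms), one possible text_field is sketched below. It is an untested sketch that assumes the StandardTokenizerFactory from the index chain suits both sides; if the identifier handling in the PatternTokenizerFactory query chain matters for this data, use that chain on both sides instead.

```xml
<!-- Sketch only, not a tested fix: combines the suggestions from this thread.
     Reload the collection and reindex all documents after changing it. -->
<fieldType name="text_field" class="solr.TextField"
           positionIncrementGap="100" omitNorms="false">
  <!-- No type attribute: the same analyzer runs at index and query time -->
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Kept from the original index chain -->
    <filter class="solr.ClassicFilterFactory"/>
    <!-- min="1" keeps single-letter terms such as "a" (the original used min="2") -->
    <filter class="solr.LengthFilterFactory" min="1" max="20"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- solr.StopFilterFactory intentionally omitted so stopwords are indexed and searchable -->
  </analyzer>
</fieldType>
```

After reloading and reindexing, the Analysis screen should show "a" surviving on both the index and the query side for "lymphoid and a non-lymphoid cell"; &debug=query remains the way to confirm what the query parser actually does with the full query.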