Hi Paras, everyone,

Thank you again for your inputs and suggestions. I am sorry to hear you had trouble with the attachments; I will host them somewhere and share the links. I don't tweak my index: I get the data from the graph database, create the documents as they are and save them to Solr.
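For anyone who wants to reproduce the request asked for in this thread (the same query with echoParams=all, plus the debug=query output Erick suggested), here is a minimal sketch of how the URL could be built. The base URL, the core name `reactome` and the handler name `/search` are placeholders I made up for illustration, not the real deployment values.

```python
from urllib.parse import urlencode

# Assumptions (not from the thread): a local Solr and a core called "reactome".
SOLR_BASE = "http://localhost:8983/solr"
CORE = "reactome"


def build_query_url(handler: str, q: str) -> str:
    """Return a Solr query URL that echoes all params and includes the parsed query."""
    params = {
        "q": q,
        "echoParams": "all",  # also shows params injected by the handler definition
        "debug": "query",     # parsed query only, without the relevance explanation
        "wt": "json",
    }
    return f"{SOLR_BASE}/{CORE}/{handler.lstrip('/')}?{urlencode(params)}"


if __name__ == "__main__":
    q = "lymphoid and a non-lymphoid cell"
    # "/search" is a hypothetical name standing in for the custom handler.
    for handler in ("/select", "/search"):
        print(build_query_url(handler, q))
```

Running the same query once per handler and comparing the echoed params and the parsed query side by side should show what each handler adds to the request.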
So, I am sending the new analysis screen, queried the way you suggested, along with the results with params and the Solr query URL.

While running the queries you asked for, I found something really weird (at least to me). By accident, I ended up querying with the default handler (/select) and it worked. If I use the handler I am supposed to use, then sadly it doesn't work. I am posting both results, and I will post the handlers as well.

Here is the link with all the files mentioned above:
https://www.dropbox.com/sh/fymfm1q94zum1lx/AADwU1c9EUf2A4d7FtzSKR54a?dl=0

If the link doesn't work: www dot dropbox dot com slash sh slash fymfm1q94zum1lx/AADwU1c9EUf2A4d7FtzSKR54a ? dl equals 0

Thanks

> On 7 Nov 2019, at 05:23, Paras Lehana <paras.leh...@indiamart.com> wrote:
>
> Hi Guilherme.
>
> I am sending the analysis result and the json result as requested.
>
> Thanks for the effort. Luckily, I can see your attachments (low quality though).
>
> From the analysis screen, the analysis is working as expected. One of the reasons for query="lymphoid and *a* non-lymphoid cell" not matching document containing "Lymphoid and a non-Lymphoid cell" I can initially think of is: the stopword "a" is probably present in post-analysis either of query or index. Did you tweak your index time analysis after indexing?
>
> Do two things:
>
> 1. Post the analysis screen for index=*"Immunoregulatory interactions between a Lymphoid and a non-Lymphoid cell"* and query=*"lymphoid and a non-lymphoid cell"*. Try hosting the image and providing the link here.
> 2. Give the same JSON output as you have sent but this time with *"echoParams=all"*. Also, post the exact Solr query url.
>
> On Wed, 6 Nov 2019 at 21:07, Erick Erickson <erickerick...@gmail.com> wrote:
>
>> I don’t see the attachments, maybe I deleted old e-mails or some such.
>> The Apache server is fairly aggressive about stripping attachments though, so it’s also possible they didn’t make it through.
>>
>>> On Nov 6, 2019, at 9:28 AM, Guilherme Viteri <gvit...@ebi.ac.uk> wrote:
>>>
>>> Thanks Erick.
>>>
>>>> First, your index and analysis chains are considerably different, this can easily be a source of problems. In particular, using two different tokenizers is a huge red flag. I _strongly_ recommend against this unless you’re totally sure you understand the consequences. Additionally, your use of the length filter is suspicious, especially since your problem statement is about the addition of a single letter term and the min length allowed on that filter is 2. That said, it’s reasonable to suppose that the ’a’ is filtered out in both cases, but maybe you’ve found something odd about the interactions.
>>> I will investigate the min length and post the results later.
>>>
>>>> Second, I have no idea what this will do. Are the equal signs typos? Used by custom code?
>>> This is the URL in my application, not Solr params. That's the query string.
>>>
>>>> What does “species=“ do? That’s not Solr syntax, so it’s likely that all the params with an equal-sign are totally ignored unless it’s just a typo.
>>> This is part of the application. Species will be used later on in Solr to filter the results. That's not Solr; those are my app params.
>>>
>>>> Third, the easiest way to see what’s happening under the covers is to add “&debug=true” to the query and look at the parsed query. Ignore all the relevance calculations for the nonce, or specify “&debug=query” to skip that part.
>>> The two JSON files I sent were run with debugQuery=on, and the explain tag is present.
>>> I will try searching the way you mentioned.
>>>
>>> Thanks for your inputs
>>>
>>> Guilherme
>>>
>>>> On 6 Nov 2019, at 14:14, Erick Erickson <erickerick...@gmail.com> wrote:
>>>>
>>>> Fwd to another server
>>>>
>>>> First, your index and analysis chains are considerably different, this can easily be a source of problems. In particular, using two different tokenizers is a huge red flag. I _strongly_ recommend against this unless you’re totally sure you understand the consequences. Additionally, your use of the length filter is suspicious, especially since your problem statement is about the addition of a single letter term and the min length allowed on that filter is 2. That said, it’s reasonable to suppose that the ’a’ is filtered out in both cases, but maybe you’ve found something odd about the interactions.
>>>>
>>>> Second, I have no idea what this will do. Are the equal signs typos? Used by custom code?
>>>>
>>>>>> https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true
>>>>
>>>> What does “species=“ do? That’s not Solr syntax, so it’s likely that all the params with an equal-sign are totally ignored unless it’s just a typo.
>>>>
>>>> Third, the easiest way to see what’s happening under the covers is to add “&debug=true” to the query and look at the parsed query. Ignore all the relevance calculations for the nonce, or specify “&debug=query” to skip that part.
>>>>
>>>> 90% + of the time, the question “why didn’t this query do what I expect” is answered by looking at the “&debug=query” output and the analysis page in the admin UI. NOTE: for the analysis page be sure to look at _both_ the query and index output. Also, and very important about the analysis page (and this is confusing) is that this _assumes_ that what you put in the text boxes have made it through the query parser intact and is analyzed by the field selected. Consider the search "q=field:word1 word2". Now you type “word1 word2” into the analysis text box and it looks like what you expect. That’s misleading because the query is _parsed_ as "field:word1 default_search_field:word2". This is where “&debug=query” helps.
>>>>
>>>> Best,
>>>> Erick
>>>>
>>>>> On Nov 6, 2019, at 2:36 AM, Paras Lehana <paras.leh...@indiamart.com> wrote:
>>>>>
>>>>> Hi Walter,
>>>>>
>>>>>> The solr.StopFilter removes all tokens that are stopwords. Those words will
>>>>>> not be in the index, so they can never match a query.
>>>>>
>>>>> I think the OP's concern is different results when adding a stopword. I think he's using the filter factory correctly - the query chain includes the filter as well so it should remove "a" while querying.
>>>>>
>>>>> *@Guilherme*, please post results for both the query, the document in result you are concerned about and post full result of analysis screen (for both query and index).
>>>>>
>>>>> On Tue, 5 Nov 2019 at 21:38, Walter Underwood <wun...@wunderwood.org> wrote:
>>>>>
>>>>>> No.
>>>>>>
>>>>>> The solr.StopFilter removes all tokens that are stopwords. Those words will not be in the index, so they can never match a query.
>>>>>>
>>>>>> 1. Remove the lines with solr.StopFilter from every analysis chain in schema.xml.
>>>>>> 2. Reload the collection, restart Solr, or whatever to read the new config.
>>>>>> 3. Reindex all of the documents.
>>>>>>
>>>>>> When indexed with the new analysis chain, the stopwords will not be removed and they will be searchable.
>>>>>>
>>>>>> wunder
>>>>>> Walter Underwood
>>>>>> wun...@wunderwood.org
>>>>>> http://observer.wunderwood.org/ (my blog)
>>>>>>
>>>>>>> On Nov 5, 2019, at 8:56 AM, Guilherme Viteri <gvit...@ebi.ac.uk> wrote:
>>>>>>>
>>>>>>> Ok. I am kind of lost now.
>>>>>>> If I open up the console > analysis and perform it, that's the final result.
>>>>>>> <Screenshot 2019-11-05 at 14.54.16.png>
>>>>>>>
>>>>>>> Your suggestion is: get rid of the <filter stopword.txt> in the schema.xml and during index phase replaceAll("in stopwords.txt"," ") then add to solr. Is that correct ?
>>>>>>>
>>>>>>> Thanks David
>>>>>>>
>>>>>>>> On 5 Nov 2019, at 14:48, David Hastings <hastings.recurs...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Fwd to another server
>>>>>>>>
>>>>>>>> no,
>>>>>>>> <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>>>>>>>>
>>>>>>>> is still using stopwords and should be removed, in my opinion of course, based on your use case may be different, but i generally axe any reference to them at all
>>>>>>>>
>>>>>>>> On Tue, Nov 5, 2019 at 9:47 AM Guilherme Viteri <gvit...@ebi.ac.uk> wrote:
>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>> Haven't I done this here ?
>>>>>>>>> <fieldType name="text_field" class="solr.TextField" positionIncrementGap="100" omitNorms="false" >
>>>>>>>>>   <analyzer type="index">
>>>>>>>>>     <tokenizer class="solr.StandardTokenizerFactory"/>
>>>>>>>>>     <filter class="solr.ClassicFilterFactory"/>
>>>>>>>>>     <filter class="solr.LengthFilterFactory" min="2" max="20"/>
>>>>>>>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>>>>>>>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>>>>>>>>>   </analyzer>
>>>>>>>>>
>>>>>>>>>> On 5 Nov 2019, at 14:15, David Hastings <hastings.recurs...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Fwd to another server
>>>>>>>>>>
>>>>>>>>>> The first thing you should do is remove any reference to stop words and never use them, then re-index your data and try it again.
>>>>>>>>>>
>>>>>>>>>> On Tue, Nov 5, 2019 at 9:14 AM Guilherme Viteri <gvit...@ebi.ac.uk> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I am performing a search to match a name (text_field), however this term contains 'and' and 'a' and it doesn't return any records. If i remove 'a' then it works.
>>>>>>>>>>> e.g
>>>>>>>>>>> Search Term: lymphoid and a non-lymphoid cell
>>>>>>>>>>> doesn't work:
>>>>>>>>>>> https://dev.reactome.org/content/query?q=lymphoid+and+a+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true
>>>>>>>>>>>
>>>>>>>>>>> Search term: lymphoid and non-lymphoid cell
>>>>>>>>>>> works:
>>>>>>>>>>> https://dev.reactome.org/content/query?q=lymphoid+and+non-lymphoid+cell&species=Homo+sapiens&species=Entries+without+species&cluster=true
>>>>>>>>>>> interested in the first result
>>>>>>>>>>>
>>>>>>>>>>> schema.xml
>>>>>>>>>>> <field name="name" type="text_field" indexed="true" stored="true" omitNorms="false" required="true" multiValued="false"/>
>>>>>>>>>>>
>>>>>>>>>>> <analyzer type="query">
>>>>>>>>>>>   <tokenizer class="solr.PatternTokenizerFactory" pattern="[^a-zA-Z0-9/._:]"/>
>>>>>>>>>>>   <filter class="solr.PatternReplaceFilterFactory" pattern="^[/._:]+" replacement=""/>
>>>>>>>>>>>   <filter class="solr.PatternReplaceFilterFactory" pattern="[/._:]+$" replacement=""/>
>>>>>>>>>>>   <filter class="solr.PatternReplaceFilterFactory" pattern="[_]" replacement=" "/>
>>>>>>>>>>>   <filter class="solr.LengthFilterFactory" min="2" max="20"/>
>>>>>>>>>>>   <filter class="solr.LowerCaseFilterFactory"/>
>>>>>>>>>>>   <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>>>>>>>>>>> </analyzer>
>>>>>>>>>>>
>>>>>>>>>>> <fieldType name="text_field" class="solr.TextField" positionIncrementGap="100" omitNorms="false" >
>>>>>>>>>>>   <analyzer type="index">
>>>>>>>>>>>     <tokenizer class="solr.StandardTokenizerFactory"/>
>>>>>>>>>>>     <filter class="solr.ClassicFilterFactory"/>
>>>>>>>>>>>     <filter class="solr.LengthFilterFactory" min="2" max="20"/>
>>>>>>>>>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>>>>>>>>>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>>>>>>>>>>>   </analyzer>
>>>>>>>>>>>   <analyzer type="query">
>>>>>>>>>>>     <tokenizer class="solr.PatternTokenizerFactory" pattern="[^a-zA-Z0-9/._:]"/>
>>>>>>>>>>>     <filter class="solr.PatternReplaceFilterFactory" pattern="^[/._:]+" replacement=""/>
>>>>>>>>>>>     <filter class="solr.PatternReplaceFilterFactory" pattern="[/._:]+$" replacement=""/>
>>>>>>>>>>>     <filter class="solr.PatternReplaceFilterFactory" pattern="[_]" replacement=" "/>
>>>>>>>>>>>     <filter class="solr.LengthFilterFactory" min="2" max="20"/>
>>>>>>>>>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>>>>>>>>>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>>>>>>>>>>>   </analyzer>
>>>>>>>>>>> </fieldType>
>>>>>>>>>>>
>>>>>>>>>>> stopwords.txt
>>>>>>>>>>> #Standard english stop words taken from Lucene's StopAnalyzer
>>>>>>>>>>> a
>>>>>>>>>>> b
>>>>>>>>>>> c
>>>>>>>>>>> ....
>>>>>>>>>>> an
>>>>>>>>>>> and
>>>>>>>>>>> are
>>>>>>>>>>>
>>>>>>>>>>> Running SolR 6.6.2.
>>>>>>>>>>>
>>>>>>>>>>> Is there anything I could do to prevent this ?
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>> Guilherme
>>>>>
>>>>> --
>>>>> Regards,
>>>>>
>>>>> *Paras Lehana* [65871]
>>>>> Development Engineer, Auto-Suggest,
>>>>> IndiaMART Intermesh Ltd.
>>>>>
>>>>> 8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
>>>>> Noida, UP, IN - 201303
>>>>>
>>>>> Mob.: +91-9560911996
>>>>> Work: 01203916600 | Extn: *8173*
>>>>>
>>>>> --
>>>>> IMPORTANT:
>>>>> NEVER share your IndiaMART OTP/ Password with anyone.
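
The index and query analyzers quoted in the original message can be approximated in a few lines of Python to see which tokens each chain keeps. This is only a rough sketch, not Solr's implementation: the regex standing in for StandardTokenizer is a simplification, ClassicFilter is left out, token positions are not modelled, and the stopword set is just the excerpt of stopwords.txt shown in the thread.

```python
import re

# Excerpt only: the thread shows stopwords.txt as "a, b, c, ...., an, and, are".
STOPWORDS = {"a", "b", "c", "an", "and", "are"}


def length_filter(tokens, min_len=2, max_len=20):
    """Mimic LengthFilterFactory min="2" max="20"."""
    return [t for t in tokens if min_len <= len(t) <= max_len]


def stop_filter(tokens):
    """Mimic StopFilterFactory with ignoreCase="true"."""
    return [t for t in tokens if t.lower() not in STOPWORDS]


def index_chain(text):
    # Simplified stand-in for StandardTokenizer: runs of letters and digits.
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    tokens = length_filter(tokens)
    tokens = [t.lower() for t in tokens]
    return stop_filter(tokens)


def query_chain(text):
    # PatternTokenizer in split mode: the pattern matches the delimiters.
    tokens = [t for t in re.split(r"[^a-zA-Z0-9/._:]", text) if t]
    # The three PatternReplaceFilter steps, in schema order.
    tokens = [re.sub(r"^[/._:]+", "", t) for t in tokens]
    tokens = [re.sub(r"[/._:]+$", "", t) for t in tokens]
    tokens = [re.sub(r"[_]", " ", t) for t in tokens]
    tokens = length_filter(tokens)
    tokens = [t.lower() for t in tokens]
    return stop_filter(tokens)


if __name__ == "__main__":
    doc = "Immunoregulatory interactions between a Lymphoid and a non-Lymphoid cell"
    print(index_chain(doc))
    print(query_chain("lymphoid and a non-lymphoid cell"))
```

Under these assumptions both chains drop "a" (length filter) and "and" (stop filter), so the remaining query tokens all appear in the indexed document. That matches what the analysis screen showed and suggests the difference lies elsewhere, for example in how the request handler parses the query.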