Also, if helpful, here is our solrconfig.xml: https://github.com/VEuPathDB/SolrDeployment/blob/master/configsets/site-search/conf/solrconfig.xml
Thanks again, from a Solr Newbie,
steve

-----Original Message-----
From: Fischer, Stephen <sfisc...@pennmedicine.upenn.edu>
Sent: Thursday, February 13, 2020 7:52 AM
To: solr-user@lucene.apache.org
Subject: RE: [External] Re: wildcards match end-of-word?

Folks,

I am seeing very strange (bad) wildcard behavior (solr 8).

"kinase" finds hits as expected.
"kin*ase" and "kin*se" find 0 results.
"kinase*" matches only values like "kinase," and "kinase-" but not "kinase"

I have done the analysis as Erick suggested (thanks!) but it is not helping me understand why we'd have this problem.

I have put together 12 screenshots from the Solr web UI that show in detail:
- the queries I ran to get the results above
- various analyses trying to understand why
- the schema for the fieldType in question

https://docs.google.com/presentation/d/10fIAesqkTnvmJBFaerEhnqWhSiaEvVW7u9jE1nX564Q/edit?usp=sharing

thanks,
steve

-----Original Message-----
From: Sotiris Fragkiskos <sfra...@gmail.com>
Sent: Thursday, February 13, 2020 4:03 AM
To: solr-user@lucene.apache.org
Subject: [External] Re: wildcards match end-of-word?

Hi Erick,
thanks very much for this information, it was immensely useful, I always had the same question!
I'm now seeing the Analysis page and finally I don't have to rely on an external online stemmer to see what solr *probably* stemmed the term to!!
But I still can't make the asterisk and question mark work inside the term, even in the earlier parts of it.
e.g. tr?ining
I would expect it to match train. But it doesn't.
PSF at the end just shows t | ain
every line before that actually shows t | aining (ST,SF,SF,LCF,EPF,SKMF)
Am I doing something very wrong??

thanks again!
Sotiri

On Wed, Feb 12, 2020 at 1:44 PM Erick Erickson <erickerick...@gmail.com> wrote:

> Steve:
>
> You _really_ want to get acquainted with the admin UI/Analysis page ;).
> Choose a core/collection and you should see the choice. It shows you
> exactly what transformations your data goes through. If you hover over
> the light gray pairs of letters, you’ll get a tooltip showing you what
> part of your analysis chain is responsible for a particular change. I
> un-check the “verbose” box 95% of the time BTW.
>
> The critical bit is that what comes out of the end of the analysis
> pipe are the tokens that are actually _in_ the index. From there,
> problems like this make more sense.
>
> My bet is that, as Walter says, you have a stemmer in the analysis
> chain and the actual token in the index is “kinas” so of course
> “kinase*” won’t be found. By adding OR kinase to the query, that token
> is stemmed to “kinas” and matches.
>
> Also, adding &debug=query to your URL will show you what the query
> looks like after parsing and analysis, also a major tool for figuring
> out what’s really happening.
>
> Wildcards are not stemmed, which can lead to surprising results.
> There’s no perfect answer here. Let’s claim wildcards _were_ stemmed.
> Then you’d have to try to explain why “running*” returned a doc with
> only “run” or “runner” or “runs” or... in it, but searching for
> “runnin*” did not, due to the stemmer not recognizing it as a
> stemmable word.
>
> Finally, one of my personal hot buttons is wildcards in general.
> They’re very often over-used because people are used to simple search
> capabilities.
> Something about “if your only tool is a hammer, every problem looks
> like a nail”. That gets into training users too though...
>
> Best,
> Erick
>
> > On Feb 11, 2020, at 9:24 PM, Fischer, Stephen <sfisc...@pennmedicine.upenn.edu> wrote:
> >
> > Hi,
> >
> > I am a solr newbie. I was surprised to discover that a search for
> > kinase* returned fewer results than kinase.
> >
> > Then I read the wildcard documentation<https://lucene.apache.org/solr/guide/6_6/the-standard-query-parser.html#TheStandardQueryParser-WildcardSearches>,
> > and saw why. kinase* will not match the word "kinase".
> >
> > Our end-users won't expect this behavior. Presumably the solution
> > would be for them (actually us, on their behalf), to use
> > kinase* OR kinase.
> >
> > But that is kind of a hack.
> >
> > Is there a way we can configure solr to have wildcards match on
> > end-of-word?
> >
> > Thanks,
> > Steve
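
For anyone reading this thread later, here is a toy Python sketch of the behavior Erick describes: plain terms are stemmed before lookup, wildcard terms are compared against the indexed tokens as typed. This is not Solr code. `toy_stem` is a hypothetical stand-in for whatever stemmer is in the analysis chain (it only drops one trailing "e"), and `index_tokens`, `term_matches` and `expand_wildcard` are made-up helper names for illustration.

```python
from fnmatch import fnmatchcase


def toy_stem(token):
    """Hypothetical stand-in for a real stemmer: drop one trailing 'e'."""
    return token[:-1] if token.endswith("e") else token


def index_tokens(text):
    """Index-time analysis in miniature: lowercase, split on whitespace, stem."""
    return {toy_stem(word) for word in text.lower().split()}


def term_matches(term, indexed):
    """Plain terms are stemmed before lookup; wildcard terms are matched as typed."""
    if "*" in term or "?" in term:
        return any(fnmatchcase(token, term) for token in indexed)
    return toy_stem(term.lower()) in indexed


def expand_wildcard(term):
    """Build the 'kinase* OR kinase' workaround for a term ending in '*'."""
    if term.endswith("*") and len(term) > 1:
        return f"{term} OR {term[:-1]}"
    return term


indexed = index_tokens("protein kinase")
print(sorted(indexed))                    # ['kinas', 'protein']
print(term_matches("kinase", indexed))    # True: stemmed to "kinas" before lookup
print(term_matches("kinase*", indexed))   # False: no indexed token starts with "kinase"
print(term_matches("kin*ase", indexed))   # False: the indexed token is "kinas"
print(term_matches("kinas*", indexed))    # True: the pattern fits the stemmed token
print(expand_wildcard("kinase*"))         # kinase* OR kinase
```

The expanded string is just the hack discussed above; sending it as the query with &debug=query added, as Erick suggests, should show how Solr actually parses each half.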